CN110991279A - Document image analysis and recognition method and system - Google Patents

Document image analysis and recognition method and system

Info

Publication number: CN110991279A (granted as CN110991279B)
Application number: CN201911143272.6A
Authority: CN (China)
Prior art keywords: document image, network, recognition, prediction, terminal
Legal status: Granted; Active
Other languages: Chinese (zh)
Inventors: 豆浩斌, 陈博, 朱风云, 庞在虎
Original and current assignee: Beijing Lingban Future Technology Co ltd
Application filed by Beijing Lingban Future Technology Co ltd; priority to CN201911143272.6A

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 30/00: Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V 30/40: Document-oriented image-based pattern recognition
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00: Computing arrangements based on biological models
    • G06N 3/02: Neural networks
    • G06N 3/04: Architecture, e.g. interconnection topology
    • G06N 3/045: Combinations of networks
    • G06V 10/00: Arrangements for image or video recognition or understanding
    • G06V 10/20: Image preprocessing
    • G06V 10/26: Segmentation of patterns in the image field; cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; detection of occlusion
    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D: CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D 10/00: Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Multimedia (AREA)
  • Biophysics (AREA)
  • General Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Health & Medical Sciences (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Character Discrimination (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a document image analysis and recognition system, which comprises: a user operation end, an interaction center, a process control end, a machine engine management end, a manual labeling management end, a machine terminal cluster and a manual terminal cluster. The user operation end, the process control end, the machine engine management end and the manual labeling management end are respectively connected to the interaction center; the machine engine management end is connected with the machine terminal cluster; and the manual labeling management end is connected with the manual terminal cluster. In addition, the invention also discloses a document image analysis and recognition method. The system combines the efficiency of the machine with the accuracy of manual work, providing users with simple operation steps and reliable processing results; at the same time, the man-machine coupling mode teaches the machine over continuous iterations, so that machine performance is gradually enhanced and the degree of manual participation is reduced.

Description

Document image analysis and recognition method and system
Technical Field
The invention relates to the technical field of document image analysis and recognition, and in particular to a document image analysis and recognition method and system.
Background
Optical Character Recognition (OCR) is a technology that optically scans the characters of a paper document into a pixel-lattice image file and then converts the characters in the image into a text format through recognition software, for further editing and processing by word-processing software.
Document Image Analysis and Recognition (DIAR) is a technology that analyzes the physical and logical structure of a document image by computer vision methods, and locates and recognizes each element inside the document (such as text, tables, images and graphs), thereby forming a complete description of the document.
A distributed software system is a software system that supports distributed processing, executing tasks on multiple processors interconnected by a communication network.
In the prior art, the prototype of document image analysis and recognition technology is traditional optical character recognition, which mainly processes and recognizes the text portions of a document image. With the gradual improvement of computer software and hardware capabilities and people's demand for higher-level, more comprehensive document image processing, more related technologies have been studied in depth, such as page segmentation, layout analysis and chart analysis, realizing complete analysis and description of a document image at different levels and better supporting high-level functions such as document retrieval, abstract generation and knowledge extraction. Current document image analysis and recognition systems typically include the following processing steps:
1. Image preprocessing, including noise removal and distortion correction, to obtain a regular, easy-to-process document image;
2. Page segmentation, i.e., dividing the document image into a number of homogeneous regions such as text, graphics, images and tables;
3. Analysis of the hierarchical structure of the document image, including the relative positions and spatial layout of the physical layer, and the semantic labels of the logical layer, such as headers, footers, titles, chapters, paragraphs and icons;
4. Chart analysis: a chart is a structured, visual and information-dense mode of presentation; chart analysis extracts the structural information a chart presents by analyzing its internal structure;
5. Text localization and recognition, i.e., determining the position and textual content of text in the document; depending on the processing algorithm, this can be divided into localization and recognition of text lines versus single characters;
6. Structural description and format conversion of the document: the parsed document structure is described, stored and transmitted in a specific format, and can be converted into common document formats such as MS Word, PDF and HTML.
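The six steps above form a pipeline in which each stage consumes the previous stage's output. A minimal sketch, with stub stages and an assumed dict-based document model (none of these names come from the patent):

```python
# Illustrative sketch of the six-stage processing pipeline described above.
# Stage names and the dict-based document model are assumptions for
# demonstration; the patent does not prescribe an API.

def preprocess(image):
    # 1. Noise removal / distortion correction (stub: pass through).
    return {"image": image, "regions": [], "structure": {}, "text": []}

def segment_pages(doc):
    # 2. Split the page into homogeneous regions (text / figure / table).
    doc["regions"] = [{"type": "text", "bbox": (0, 0, 100, 40)},
                      {"type": "table", "bbox": (0, 50, 100, 90)}]
    return doc

def analyze_structure(doc):
    # 3. Physical layout plus logical labels (title, paragraph, ...).
    doc["structure"] = {"title": doc["regions"][0]}
    return doc

def analyze_charts(doc):
    # 4. Extract the internal structure of chart/table regions.
    doc["tables"] = [r for r in doc["regions"] if r["type"] == "table"]
    return doc

def detect_and_recognize_text(doc):
    # 5. Locate text lines and recognize their content (stub result).
    doc["text"] = ["example line"]
    return doc

def export(doc, fmt="html"):
    # 6. Describe the parsed structure in a target format.
    return {"format": fmt, "content": doc["text"]}

def run_pipeline(image):
    doc = preprocess(image)
    for stage in (segment_pages, analyze_structure,
                  analyze_charts, detect_and_recognize_text):
        doc = stage(doc)
    return export(doc)
```

Each stage's final result becomes the initial condition of the next, which is exactly the property the man-machine coupled flow later exploits.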
However, the inventors have found through research that document image analysis and recognition systems in the prior art mainly have the following problems: first, incomplete functionality, providing only a few of the document image analysis and recognition functions and recognizing only a few types of objects, so that a complete description of the hierarchical structure of the document image cannot be formed; second, low accuracy, with no guarantee of high recognition accuracy on document images of poor quality and complex layout; and third, a lack of mature manual proofreading tools and services, resulting in a poor user experience.
In addition, due to cost or efficiency constraints, the prior art cannot provide a complete processing flow and lacks some processing steps, so a complete description of the document information is difficult to obtain; moreover, prior art schemes only provide a software tool and lack functions such as subsequent proofreading and verification, requiring the user to solve these problems separately and increasing the difficulty of use.
Disclosure of Invention
In view of the above problems, the present invention provides a complete solution for document image analysis and recognition, generating a complete description of the hierarchical structure of the document image and ensuring the efficiency and accuracy of the whole processing flow through a distributed man-machine coupling approach.
Based on this, a document image analysis and recognition method is especially provided, which comprises:
step 1, a message communication end of a document image analysis and recognition system receives a task initiating message sent by a user operation end, and the document image analysis and recognition system starts a document image analysis and recognition processing task;
step 2, acquiring a document image to be processed, inputting the document image to be processed into the document image analysis and recognition system, and acquiring basic information of the document to be processed;
step 3, performing page segmentation on the document image to be processed, simultaneously generating segmentation tasks of all page images in the document image to be processed in a message queue mode, sending the segmentation tasks to a machine engine terminal for executing the tasks through a machine engine management terminal, forwarding an initial page segmentation result obtained after the preprocessing of the machine engine terminal to a manual annotation terminal, and returning a final page segmentation result after the manual annotation to a process control terminal of the document image analysis and identification system for updating the page segmentation result;
step 4, obtaining initial information of table analysis processing after page segmentation processing is completed, simultaneously generating all table analysis tasks of the document image to be processed by adopting a message queue method, sending the table analysis tasks to a machine engine terminal for executing the tasks through a machine engine management terminal, forwarding an initial table analysis result obtained after the preprocessing of the machine engine terminal to a manual labeling terminal, and returning a final table analysis result after manual correction by a manual labeling operator to a process control terminal of the document image analysis and identification system for updating the table analysis result;
step 5, obtaining initial information of text detection after page segmentation processing and form analysis processing, simultaneously generating all text detection tasks of a document image to be processed in a message queue mode, sending the text detection tasks to a machine engine terminal executing the tasks through a machine engine management terminal, forwarding an initial text detection result obtained after preprocessing of the machine engine terminal to a manual labeling terminal, and returning a final text detection result after manual proofreading by a manual labeling person to a process control terminal of the document image analysis and recognition system for updating the text detection result;
step 6, obtaining initial information of text recognition after text detection is completed, generating all text recognition tasks of the document image to be processed in a message queue mode, sending the text recognition tasks to a machine engine terminal executing the tasks through a machine engine management terminal, forwarding an initial text recognition result obtained after the preprocessing of the machine engine terminal to a manual labeling terminal, and returning a final text recognition result after the manual correction of the manual labeling operator to a process control terminal of the document image analysis and recognition system for updating the text recognition result;
and 7, when the tasks of page segmentation, table analysis, text detection and text recognition of the document image to be processed are all completed, integrating the labeling results of different levels by the document image analysis and recognition system, and exporting the electronic document file.
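Steps 3 through 6 all follow the same pattern: tasks are generated in a message queue, a machine engine terminal produces an initial result, a manual labeling terminal corrects it, and the final result returns to the process control end. A minimal sketch of that loop with Python's standard `queue` module (all names and the confidence field are illustrative, not from the patent):

```python
import queue

def machine_engine(task):
    # Machine pre-processing: produce an initial (possibly imperfect) result.
    return {"task": task, "result": f"initial:{task}", "confidence": 0.7}

def manual_annotation(initial):
    # Human annotator reviews and corrects the machine's initial result.
    corrected = initial["result"].replace("initial", "final")
    return {"task": initial["task"], "result": corrected}

def process_step(task_names):
    # Process control end: enqueue all tasks for this step, route machine
    # output to manual correction, and collect the final results.
    tasks = queue.Queue()
    for name in task_names:
        tasks.put(name)
    finals = []
    while not tasks.empty():
        task = tasks.get()
        initial = machine_engine(task)      # machine engine terminal
        final = manual_annotation(initial)  # manual labeling terminal
        finals.append(final)                # returned to process control end
    return finals
```

In the real system the queue, engine and annotator live on different network nodes; here they are collapsed into one process purely to show the data flow.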
In one embodiment, the acquiring the image of the document to be processed comprises acquiring a page image of the document to be processed by scanning or photographing, and recording basic information of the document to be processed, wherein the basic information comprises a name, an author, a publishing institution and a publishing date.
In one embodiment, the machine engine terminal runs a document image analysis and recognition model based on a deep neural network, determines the output of the model required to be called according to the current document image analysis and recognition processing steps, and returns the processing result to the machine engine management terminal.
In one embodiment, the deep neural network-based document image analysis and recognition model comprises an input layer, a feature extraction network, a multitask prediction network and a multitask output layer; the input layer is connected to the feature extraction network, the feature extraction network is connected to the multitask prediction network, and the multitask prediction network is connected to the multitask output layer;
the input layer receives an input page image, wherein the input page image is a page image in a document to be processed currently; the feature extraction network is a stacked multilayer convolutional neural network; the multi-task prediction network is a multi-layer prediction network which is specially used for corresponding tasks and is respectively constructed aiming at different prediction tasks; and the multitask output layer outputs output results of different prediction networks.
In one embodiment, the feature extraction network is a stacked multilayer convolutional neural network in which each convolutional layer applies a nonlinear mapping to the output of the previous layer; through multiple nonlinear mappings, the input page image is represented and described, and the representative features are extracted and output. The representative features of the page image acquired through the feature extraction network are shared features, common to multiple prediction tasks; the multiple prediction tasks comprise page segmentation, table analysis, text detection and text recognition;
the multi-task prediction network comprises a page segmentation prediction network, a table analysis prediction network, a text detection prediction network and a text recognition prediction network, and is respectively used for realizing the prediction tasks of page segmentation, table analysis, text detection and text recognition; the page segmentation prediction network, the table analysis prediction network, the text detection prediction network and the text recognition prediction network share input features, namely different prediction networks share representation features output by a feature extraction network; the multi-task prediction network determines different prediction network structures according to different prediction tasks respectively;
the multitask output layer comprises a page segmentation result output by the page segmentation prediction network, a table analysis result output by the table analysis prediction network, a text detection result output by the text detection prediction network and a text recognition result output by the text recognition prediction network.
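The shared-feature idea, run the backbone once and feed the same representation to all four task heads, can be sketched in plain Python (the feature extractor and head implementations are stubs; only the data flow is illustrated):

```python
def extract_features(page_image):
    # Shared feature extractor (stands in for the stacked CNN backbone):
    # computed once per page, then reused by all four prediction heads.
    return [sum(row) for row in page_image]

def page_segmentation_head(feats):
    return {"task": "page_segmentation", "n": len(feats)}

def table_analysis_head(feats):
    return {"task": "table_analysis", "n": len(feats)}

def text_detection_head(feats):
    return {"task": "text_detection", "n": len(feats)}

def text_recognition_head(feats):
    return {"task": "text_recognition", "n": len(feats)}

HEADS = (page_segmentation_head, table_analysis_head,
         text_detection_head, text_recognition_head)

def multitask_forward(page_image):
    feats = extract_features(page_image)  # one backbone pass, shared
    return [head(feats) for head in HEADS]
```

Because the backbone runs once rather than four times, adding a task costs only its head, which is the efficiency argument the patent makes for shared features.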
In addition, in order to solve the technical problems in the prior art, a document image analysis and recognition system is particularly provided, comprising a user operation end, an interaction center, a process control end, a machine engine management end, a manual labeling management end, a machine terminal cluster and a manual terminal cluster;
the user operation end, the process control end, the machine engine management end and the manual labeling management end are respectively connected to the interaction center; the machine engine management end is connected with the machine terminal cluster; the manual labeling management end is connected with the manual terminal cluster;
the interaction center comprises a data storage end and a message communication end; the data storage terminal is used for storing data uploaded by a user, a result after document image analysis and identification processing and data required by interaction between different modules and terminals in the document image analysis and identification system; the message communication terminal is used for establishing and completing message communication among all modules and terminals in the document image analysis and recognition system;
the user operation end is used for the user to perform system login, data uploading, task initiation, progress checking, result downloading and recharging payment operations;
the process control end is used for controlling a man-machine coupled document image analysis and recognition processing process and storing key data in the document image analysis and recognition processing process;
the machine terminal cluster comprises a plurality of machine engine terminals; the machine engine management end is used for managing and scheduling the machine engine terminals, determining operation steps by receiving information sent by the process control end and issuing operation tasks to corresponding execution terminals according to the running state of each current machine engine terminal; after the operation is finished, returning a corresponding message to the process control end;
the manual terminal cluster comprises a plurality of manual labeling terminals; the manual labeling management end is used for managing and scheduling the manual labeling terminals, determining operation steps by receiving information sent by the process control end, and issuing operation tasks to the corresponding execution terminals according to the running states of the current manual labeling terminals; when the operation is finished, a corresponding message is returned to the process control end.
In one embodiment, the man-machine coupled document image analysis and recognition processing flow comprises that the flow control terminal receives a task initiation message of a user operation terminal through the message communication terminal, so as to start a document image analysis and recognition processing flow; the process control end acquires the completed step of the current task, determines the next processing step and sends the step to a machine engine management end or a manual labeling management end through a message communication end; and after the processing flow is finished, the flow control end sends a message to the user operation end, so that the user operation end updates the current task finishing state.
In one embodiment, the machine terminal cluster includes a plurality of machine engine terminals; all machine engine terminals in the system are numbered uniformly, and are managed and allocated uniformly by a machine engine management end; the machine engine terminal runs a document image analysis and recognition model based on a deep neural network, determines the output of the model to be called according to the current document image analysis and recognition processing step, and returns the processing result to the machine engine management end;
the manual terminal cluster comprises a plurality of manual marking terminals, the manual marking terminals correspond to manual marking personnel, all the manual marking personnel and the corresponding manual marking terminals are numbered uniformly, and the manual marking terminals are managed and allocated uniformly by a manual marking management end; and the manual labeling operator checks and modifies the current labeling result in the manual labeling terminal, determines an operation page required to be called by the manual labeling terminal according to the current processing step, and returns the labeling result to the manual labeling management terminal.
In one embodiment, the deep neural network-based document image analysis and recognition model comprises an input layer, a feature extraction network, a multitask prediction network and a multitask output layer; the input layer is connected to the feature extraction network, the feature extraction network is connected to the multitask prediction network, and the multitask prediction network is connected to the multitask output layer;
the input layer receives an input page image, wherein the input page image is a page image in a document to be processed currently; the feature extraction network is a stacked multilayer convolutional neural network; the multi-task prediction network is a multi-layer prediction network which is specially used for corresponding tasks and is respectively constructed aiming at different prediction tasks; and the multitask output layer outputs output results of different prediction networks.
In one embodiment, the feature extraction network is a stacked multilayer convolutional neural network in which each convolutional layer applies a nonlinear mapping to the output of the previous layer; through multiple nonlinear mappings, the input page image is represented and described, and the representative features are extracted and output. The representative features of the page image acquired through the feature extraction network are shared features, common to multiple prediction tasks; the multiple prediction tasks comprise page segmentation, table analysis, text detection and text recognition;
the multi-task prediction network comprises a page segmentation prediction network, a table analysis prediction network, a text detection prediction network and a text recognition prediction network, and is respectively used for realizing the prediction tasks of page segmentation, table analysis, text detection and text recognition; the page segmentation prediction network, the table analysis prediction network, the text detection prediction network and the text recognition prediction network share input features, namely different prediction networks share representation features output by a feature extraction network; the multi-task prediction network determines different prediction network structures according to different prediction tasks respectively;
the multitask output layer comprises a page segmentation result output by the page segmentation prediction network, a table analysis result output by the table analysis prediction network, a text detection result output by the text detection prediction network and a text recognition result output by the text recognition prediction network.
Compared with the prior art, the embodiment of the invention has the following beneficial effects:
In the document image analysis and recognition method disclosed by the invention, each step has a process of machine initial judgment followed by manual verification, and the final result of each step serves as the initial condition of the next step, so that the whole processing system combines the efficiency of the machine with the accuracy of manual work. In the document image analysis and recognition system disclosed by the invention, human annotators and machines are distributed across a plurality of network nodes, organically integrated and communicating through the process control end, the data storage end and the message communication end, and the system is finally provided to users as a distributed network service, giving them simple operation steps and reliable processing results. The man-machine coupling mode also has a teaching effect on the machine over continuous iterations, so that machine performance is gradually enhanced and the degree of manual participation is reduced.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly introduced below, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and for those skilled in the art, other schematic diagrams can be obtained according to the drawings without creative efforts;
wherein:
FIG. 1 is a schematic diagram of a deep neural network-based document image analysis and recognition model in the present invention;
FIG. 2 is a schematic diagram of a man-machine depth-coupled distributed document image analysis and recognition system according to the present invention;
FIG. 3 is a flowchart of a document image analysis and recognition method of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
In the invention, a document image analysis and recognition model based on a deep neural network is first constructed; the model has multitask output and can simultaneously output the results of several different processing stages. To avoid the increase in model complexity and computation caused by multitask output, the model adopts a shared-feature approach to improve operating efficiency.
The deep neural network comprises fully connected layers, convolutional layers, recurrent connection layers, pooling layers and normalization layers.
the full connection layer is used for connecting each output node with all input nodes, so that the integral transformation of input characteristics is realized; the all-connection layer can be expressed as Fc (x; 'ch _ i,' ch _ o, g (∙)) = g (Wx + b), where x ∈ R [ ("ch _ i × 1) is the input feature vector, W ∈ R [ (" ch _ o × "ch _ i) is the weight, b ∈ R [ (" ch _ o × 1) is the offset, ch _ i is the number of channels that are input features, ch _ o is the number of channels that are output features, g (∙) is the activation function; the types of the activation function comprise five types of Linear, Sigmoid, Tanh, ReLU and SoftMax.
The convolutional layer realizes a local transformation of the input features through locally shared connections. It can be expressed as Conv(x; h_k, w_k, ch_i, ch_o, sx_k, sy_k, g(·)) = g(W ∗ x + b), where ∗ is the convolution operator, x ∈ R^(h_i × w_i × ch_i) is the input feature map, W ∈ R^(h_k × w_k × ch_i × ch_o) is the convolution kernel weight, b is the bias, ch_i is the number of channels of the input features, ch_o is the number of channels of the output features, h_k and w_k are the kernel height and width, sx_k is the horizontal stride of the convolution kernel, sy_k is the vertical stride, and g(·) is the activation function.
The recurrent connection layer feeds the output of the network back as its input, realizing feature extraction and transformation of serialized signals. It can be expressed as Rnn(x_t, h_(t-1); ch_i, ch_o, g(·)) = g(Wx_t + Uh_(t-1) + b), where x_t ∈ R^(ch_i × 1) is the input feature vector at time t, h_(t-1) ∈ R^(ch_o × 1) is the output feature vector at the previous time step, W ∈ R^(ch_o × ch_i) and U ∈ R^(ch_o × ch_o) are the mapping weights of the current input feature and the previous output feature respectively, b ∈ R^(ch_o × 1) is the bias, ch_i is the number of channels of the input features, ch_o is the number of channels of the output features, and g(·) is the activation function.
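A single step of the recurrent formula, h_t = g(W x_t + U h_(t-1) + b), can be sketched in NumPy as follows; tanh is chosen here as an illustrative default activation:

```python
import numpy as np

def rnn_step(x_t, h_prev, W, U, b, g=np.tanh):
    # One recurrent step: h_t = g(W x_t + U h_{t-1} + b).
    # Shapes: x_t is (ch_i,), h_prev is (ch_o,),
    # W is (ch_o, ch_i), U is (ch_o, ch_o), b is (ch_o,).
    return g(W @ x_t + U @ h_prev + b)
```

Calling this in a loop over t, feeding each output back as h_prev, gives the serialized-signal processing the text describes.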
The pooling layer comes in two types: the maximum pooling layer and the average pooling layer. A pooling layer can be expressed as Pool(x; h_k, w_k, sx_k, sy_k) and mainly realizes the aggregation of local regions of the input features: the maximum pooling layer takes the maximum value of each local region, and the average pooling layer takes its average. Here h_k is the height of the pooled local region, w_k is its width, sx_k is the horizontal stride of the pooling window, and sy_k is the vertical stride.
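Pool(x; h_k, w_k, sx_k, sy_k) can be sketched with explicit loops in NumPy; the valid-windows-only (no padding) convention below is an assumption, since the patent does not state one:

```python
import numpy as np

def pool2d(x, h_k, w_k, sx_k, sy_k, mode="max"):
    # Slide an h_k x w_k window over x with strides (sy_k, sx_k),
    # reducing each local region to its max or its mean.
    H, W = x.shape
    out_h = (H - h_k) // sy_k + 1
    out_w = (W - w_k) // sx_k + 1
    reduce_fn = np.max if mode == "max" else np.mean
    out = np.empty((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            win = x[i * sy_k:i * sy_k + h_k, j * sx_k:j * sx_k + w_k]
            out[i, j] = reduce_fn(win)
    return out
```

On the 2 x 2 input [[1, 2], [3, 4]] a single 2 x 2 window yields 4 under max pooling and 2.5 under average pooling.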
The normalization layer realizes a normalized transformation of the input features by methods such as mean-variance normalization or batch normalization; in particular, batch normalization can be adopted to realize the normalized transformation of the input features.
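A minimal sketch of batch normalization over a batch of feature vectors; gamma, beta and eps are the usual learned scale, shift and numerical-stability constant, and the default values below are illustrative:

```python
import numpy as np

def batch_norm(x, gamma=1.0, beta=0.0, eps=1e-5):
    # Normalize each feature channel to zero mean and unit variance
    # over the batch axis, then apply scale (gamma) and shift (beta).
    mu = x.mean(axis=0)
    var = x.var(axis=0)
    return gamma * (x - mu) / np.sqrt(var + eps) + beta
```

Mean-variance normalization is the same transform without the learned gamma/beta parameters.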
As shown in FIG. 1, the deep neural network-based document image analysis and recognition model comprises an input layer 11, a feature extraction network 12, a multitask prediction network 13 and a multitask output layer 14; the input layer 11 is connected to the feature extraction network 12, the feature extraction network 12 is connected to the multitask prediction network 13, and the multitask prediction network 13 is connected to the multitask output layer 14;
the input layer 11 receives an input page image, wherein the input page image is a page image in a document to be processed currently;
the feature extraction network 12 is a multilayer convolutional neural network in which each layer applies a nonlinear mapping to the output of the previous layer; through multiple nonlinear mappings, an effective representation and description of the input page image is realized, and the shared features are extracted and output;
specifically, the feature extraction network 12 employs a convolutional neural network comprising 13 convolutional layers and 1 pooling layer; the first layer of the feature extraction network is a convolutional layer with parameters Conv(5,5,1,16,1,1,ReLU), followed by a max pooling layer with parameters Pool(3,3,2,2); subsequently connected are 6 residual modules, each of which adds a cross-layer connection directly from input to output on top of the normal sequential layer connections; each residual module is composed of 2 convolutional layers, where the 2 convolutional layers of the first residual module have parameters Conv(3,3,16,32,2,2,ReLU) and Conv(3,3,32,32,1,1,ReLU), those of the second residual module Conv(3,3,32,64,2,2,ReLU) and Conv(3,3,64,64,1,1,ReLU), those of the third residual module Conv(3,3,64,128,2,2,ReLU) and Conv(3,3,128,128,1,1,ReLU), those of the fourth residual module Conv(3,3,128,256,2,2,ReLU) and Conv(3,3,256,256,1,1,ReLU), those of the fifth residual module Conv(3,3,256,512,2,2,ReLU) and Conv(3,3,512,512,1,1,ReLU), and those of the sixth residual module Conv(3,3,512,1024,2,2,ReLU) and Conv(3,3,1024,1024,1,1,ReLU); the shared features are the representative features of the page image acquired through the feature extraction network 12, and these representative features are shared by multiple prediction tasks; the multiple prediction tasks comprise page segmentation, table analysis, text detection and text recognition;
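The backbone above can be traced without any deep-learning framework by computing feature-map sizes layer by layer. The sketch below assumes 'same' padding (the patent does not state the padding scheme), so each stride-2 layer halves the spatial size while the residual modules double the channel count:

```python
def conv_out(size, k, s, pad_same=True):
    """Output size of a convolution/pooling step with kernel k and stride s
    under the assumed 'same' padding: ceil(size / s)."""
    return (size + s - 1) // s if pad_same else (size - k) // s + 1

def feature_shapes(h, w):
    """Trace (height, width, channels) through Conv(5,5,1,16,1,1) ->
    Pool(3,3,2,2) -> 6 residual modules, whose first convolution has stride 2."""
    h, w = conv_out(h, 5, 1), conv_out(w, 5, 1)      # first convolution, stride 1
    h, w = conv_out(h, 3, 2), conv_out(w, 3, 2)      # max pooling, stride 2
    ch, shapes = 16, []
    for _ in range(6):                               # six residual modules
        h, w, ch = conv_out(h, 3, 2), conv_out(w, 3, 2), ch * 2
        shapes.append((h, w, ch))
    return shapes

print(feature_shapes(512, 512))
# [(128, 128, 32), (64, 64, 64), (32, 32, 128), (16, 16, 256), (8, 8, 512), (4, 4, 1024)]
```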
in particular, in order to realize a multi-scale feature description of the input image, the output features of different layers of the feature extraction network are scaled to a uniform size by image interpolation, and a convolutional layer with parameters Conv(1,1,ch_i,ch_o,1,1,ReLU) is added to convert the number of output channels to a specific size, where ch_i is the number of input feature channels and ch_o is the number of output feature channels; for the output features of the 6 residual modules in the feature extraction network, the output of each residual module is converted to ch_o=32 output channels, and the results are then concatenated along the channel dimension to give 6×32=192 channels, finally yielding the shared features;
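A toy NumPy sketch of this fusion step (the random feature maps, nearest-neighbour interpolation, and 64×64 target size are illustrative assumptions; the patent only specifies "image interpolation" and the 1×1 convolution):

```python
import numpy as np

def resize_nearest(x, out_h, out_w):
    """Nearest-neighbour interpolation of a (C, H, W) feature map."""
    _, H, W = x.shape
    rows = np.arange(out_h) * H // out_h
    cols = np.arange(out_w) * W // out_w
    return x[:, rows][:, :, cols]

def project_1x1(x, weight):
    """1x1 convolution: a per-pixel linear map from ch_i to ch_o channels."""
    return np.einsum("oc,chw->ohw", weight, x)

rng = np.random.default_rng(0)
# Toy stand-ins for the 6 residual-module outputs at successively halved scales
feats = [rng.standard_normal((ch, 64 // 2**i, 64 // 2**i))
         for i, ch in enumerate([32, 64, 128, 256, 512, 1024])]
target_h, target_w = 64, 64
fused = np.concatenate(
    [project_1x1(resize_nearest(f, target_h, target_w),
                 rng.standard_normal((32, f.shape[0])))   # ch_o = 32 per scale
     for f in feats], axis=0)
print(fused.shape)   # (192, 64, 64) -- 6 scales x 32 channels
```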
the multi-task prediction network 13 is a task-specific multi-layer prediction network which is respectively constructed for different tasks, and comprises a page segmentation prediction network, a table analysis prediction network, a text detection prediction network and a text recognition prediction network which are respectively used for realizing prediction tasks of page segmentation, table analysis, text detection and text recognition; the page segmentation prediction network, the table analysis prediction network, the text detection prediction network and the text recognition prediction network share the same input features, namely share the shared features output by the feature extraction network; determining different prediction network structures of the tasks according to respective characteristics of the tasks;
in particular, the multi-layer predictive network of the multitasking predictive network 13 shares the same input features.
Wherein the page segmentation prediction network comprises one convolutional layer Conv(3,3,192,5,1,1,SoftMax), where ch_o=5 indicates that page regions comprise 5 classes: background, text, image, table and division line;
the table analysis prediction network predicts the position and orientation of table lines, adopting one convolutional layer with structure Conv(3,3,192,2,1,1,Sigmoid), where ch_o=2 represents the 2 predicted values for table-line position and orientation;
the text detection prediction network predicts the position and orientation information of text lines; the adopted convolutional layer can be expressed as Conv(3,3,192,6,1,1,Sigmoid), where ch_o=6 represents 6 predicted values: the probability score, the four border positions (top, bottom, left and right) and the overall orientation of the text line.
The text recognition prediction network first converts the features of the corresponding text line regions into sequence features of uniform dimension through a spatial transformer network, according to the position and orientation information of the text lines; the sequence relationship is then modeled by adding a recurrent network Rnn(192,256,Tanh), and finally a convolutional layer with structure Conv(1,1,256,CharNum,1,1,SoftMax) is added to obtain the final prediction result, where ch_o=CharNum is the number of character categories to be recognized;
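The four task heads described above can be summarized in a small configuration table; the tuple layout (kh, kw, ch_i, ch_o, sx, sy, activation) follows the Conv notation used in this document, and the dictionary itself is a hypothetical sketch, not code from the patent:

```python
# The four prediction heads, each consuming the 192-channel shared features
# (the text recognition head runs after Rnn(192, 256, Tanh)).
HEADS = {
    #                    (kh, kw, ch_i, ch_o,      sx, sy, activation)
    "page_segmentation": (3,  3,  192,  5,         1,  1,  "SoftMax"),  # 5 region classes
    "table_parsing":     (3,  3,  192,  2,         1,  1,  "Sigmoid"),  # line position + orientation
    "text_detection":    (3,  3,  192,  6,         1,  1,  "Sigmoid"),  # score, 4 borders, orientation
    "text_recognition":  (1,  1,  256,  "CharNum", 1,  1,  "SoftMax"),  # character classes
}

def output_channels(head):
    """ch_o of a head, i.e. how many values it predicts per location."""
    return HEADS[head][3]

print(output_channels("page_segmentation"))   # 5
```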
wherein the multi-task output layer 14 outputs the results of the different task prediction networks, that is, the page segmentation prediction network outputs the page segmentation result, the table parsing prediction network outputs the table parsing result, the text detection prediction network outputs the text detection result, and the text recognition prediction network outputs the text recognition result; an output result is either a final result or an intermediate result of the prediction task, an intermediate result being post-processed to obtain the final result; constraint relationships exist between the different output results.
During operation, the shared features only need to be calculated once and are cached in the data storage end. Given the shared features, the different task prediction networks are relatively independent; the task prediction network to be run is determined according to the task message from the process control end, and the corresponding prediction result is obtained.
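The compute-once-and-cache behaviour can be sketched as follows (class and function names are hypothetical; the real system stores the features in the data storage end 221 rather than in memory):

```python
class SharedFeatureCache:
    """Run the expensive feature extraction once per page; reuse it for every task."""
    def __init__(self, extract_fn):
        self.extract_fn = extract_fn
        self._cache = {}
        self.extractions = 0          # counts actual feature-extraction runs

    def features(self, page_id, image):
        if page_id not in self._cache:
            self._cache[page_id] = self.extract_fn(image)
            self.extractions += 1
        return self._cache[page_id]

cache = SharedFeatureCache(extract_fn=lambda img: sum(img))  # stand-in extractor
for task in ["segment", "parse_table", "detect_text", "recognize_text"]:
    feats = cache.features("page-1", [1, 2, 3])              # all tasks share one page
print(cache.extractions)   # 1 -- computed once, reused by all four tasks
```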
In order to sufficiently train the deep neural network-based document image analysis and recognition model, an automatic document image generation method based on program synthesis is adopted, which generates images together with the annotation information for the multiple outputs required by model training. The document image analysis and recognition model constructed with this automatic generation method, and the machine engine running the model, can quickly and completely analyze the input document image to be processed.
In the invention, a distributed document image analysis and recognition system based on man-machine depth coupling is shown in FIG. 2.
The document image analysis and recognition system comprises a user operation end 21, an interaction center 22, a process control end 23, a machine engine management end 24, a manual labeling management end 25, a machine terminal cluster 26 and a manual terminal cluster 27;
the user operation end 21, the process control end 23, the machine engine management end 24, and the manual labeling management end 25 are respectively connected to the interaction center 22; the machine engine management end 24 is connected with the machine terminal cluster 26; the manual labeling management terminal 25 is connected with the manual terminal cluster 27;
the interaction center 22 comprises a data storage end 221 and a message communication end 222; the data storage end 221 is used for storing data uploaded by a user, processed results and data required by interaction between different modules and terminals; the message communication terminal 222 is used for establishing and completing message communication between each module and terminal in the document image analysis and recognition system;
the user operation end 21 is used for the user to perform operations of system login, data uploading, task initiation, progress checking, result downloading, recharging and paying;
the process control end 23 is a processing center of the document image analysis and recognition system, and is used for controlling a human-computer coupled document image analysis and recognition processing process and storing key data in the processing process;
specifically, the process control end 23 receives a task initiation message from the user operation end 21 through the message communication end 222, so as to start a document image analysis and identification processing flow; the process control end 23 obtains the completion step of the current task, determines the next processing step, and sends the step to the machine engine management end 24 or the manual labeling management end 25 through the message communication end 222; after the whole processing flow is completed, the flow control end 23 sends a message to the user operation end 21, so that the user operation end 21 updates the current task completion state;
wherein the machine terminal cluster 26 comprises a plurality of machine engine terminals; all machine engine terminals are numbered uniformly and are managed and allocated uniformly by the machine engine management end 24; each machine engine terminal runs the deep neural network-based document image analysis and recognition model, determines which model output to call according to the current processing step, and returns the processing result to the machine engine management end;
the machine engine management end 24 is configured to manage and schedule the machine engine terminals, determine the next operation by receiving information sent by the process control end 23, and issue operation tasks to the corresponding execution terminals according to the current running state of each machine engine terminal; when an operation is finished, a corresponding message is returned to the process control end 23;
wherein the artificial terminal cluster 27 comprises a plurality of manual labeling terminals; the manual labeling terminals correspond to manual labeling personnel, and all labeling personnel and their corresponding terminals are numbered uniformly and are managed and allocated uniformly by the manual labeling management end 25; the labeling personnel check and modify the current labeling result in the manual labeling terminal, the operation page to be called is determined according to the current processing step, and the labeling result is returned to the manual labeling management end 25;
the manual labeling management end 25 is configured to manage and schedule the manual labeling terminals, determine the next operation by receiving information sent by the process control end 23, and issue operation tasks to the corresponding execution terminals according to the current running state of each manual labeling terminal; when an operation is completed, a corresponding message is returned to the process control end 23.
As shown in fig. 3, in order to improve the stability of the system, the present invention further provides a deeply human-machine-coupled document image analysis and recognition processing method, enabling the system to combine the efficiency of the machine with the accuracy of the human. The document image analysis and recognition processing method includes the following steps:
step 1, a message communication end receives a task initiating message of a user operation end, so that a document image analysis and recognition processing task is started;
step 2, acquiring a document image to be processed, inputting the document image to be processed into the document image analysis and recognition system, and acquiring basic information of the document to be processed;
specifically, acquiring an image of a document to be processed includes acquiring a page image of the document to be processed by scanning or photographing, and recording basic information of the document to be processed, where the basic information includes a name, an author, a publishing organization, and a publishing date;
step 3, performing page segmentation on the document to be processed: segmentation tasks for all page images are generated in a message queue and sent to the machine engine; the initial result produced by the machine engine is forwarded to the manual labeling system, and the manually proofread result is returned to the process control end to update the page segmentation result;
step 4, after page segmentation is completed, obtaining the initial information for table analysis: all table analysis tasks are generated using the message queue method, preprocessed by the machine engine and then manually proofread in sequence, and the final result is returned to update the table analysis result;
step 5, after the page segmentation and the form analysis are completed, initial information of text detection can be obtained, all text detection tasks are generated in a message queue mode, and are subjected to machine engine preprocessing and manual proofreading in sequence, and a final result is returned to update a text detection result;
step 6, obtaining initial information of text recognition after text detection is completed, generating all text recognition tasks in a message queue mode, sequentially performing machine engine preprocessing and manual proofreading, and returning a final result to update a text recognition result;
step 7, when all processing tasks of the document image to be processed are finished, integrating the labeling results of the different levels and exporting an electronic document file in a specific format.
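Steps 3 through 6 follow the same pattern: tasks are generated in a message queue, a machine pass produces an initial result, a human pass verifies it, and the verified result becomes the initial condition for the next stage. A minimal sketch of this loop (all names hypothetical):

```python
from collections import deque

STAGES = ["page_segmentation", "table_parsing", "text_detection", "text_recognition"]

def machine_predict(stage, page, prior):
    # prior holds the verified result of the previous stage (the initial condition)
    return f"{stage}:machine({page})"

def human_verify(result):
    # manual proofreading of the machine's initial result
    return result + ":verified"

def process_document(pages):
    results = {page: None for page in pages}
    for stage in STAGES:                   # stages run in the fixed order above
        queue = deque(pages)               # message-queue style task generation
        while queue:
            page = queue.popleft()
            initial = machine_predict(stage, page, results[page])
            results[page] = human_verify(initial)
    return results

out = process_document(["p1", "p2"])
print(out["p1"])   # text_recognition:machine(p1):verified
```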
The process control end receives a task initiating message of a user operation end through the message communication end, so that a document image analysis and recognition processing process is started; the process control end acquires the completed step of the current task, determines the next processing step and sends the step to the machine engine management end 24 or the manual labeling management end 25 through the message communication end; after the processing flow is completed, the flow control end sends a message to the user operation end 21, so that the user operation end 21 updates the current task completion state.
Wherein the machine terminal cluster 26 comprises a plurality of machine engine terminals; all machine engine terminals in the system are numbered uniformly, and are managed and allocated uniformly by a machine engine management end 24; the machine engine terminal runs a document image analysis and recognition model based on a deep neural network, determines the output of the model to be called according to the current document image analysis and recognition processing steps, and returns the processing result to the machine engine management terminal 24;
the manual terminal cluster 27 comprises a plurality of manual labeling terminals, the manual labeling terminals correspond to manual labeling personnel, all the manual labeling personnel and the corresponding manual labeling terminals are numbered in a unified manner, and the manual labeling management terminal 25 manages and allocates the manual labeling terminals in a unified manner; the manual labeling operator checks and modifies the current labeling result in the manual labeling terminal, determines an operation page required to be called by the manual labeling terminal according to the current processing step, and returns the labeling result to the manual labeling management terminal 25.
As shown in fig. 1, the deep neural network-based document image analysis and recognition model includes an input layer 11, a feature extraction network 12, a multitask prediction network 13 and a multitask output layer 14; the input layer 11 is connected to the feature extraction network 12, the feature extraction network 12 is connected to the multitask prediction network 13, and the multitask prediction network 13 is connected to the multitask output layer 14;
the input layer 11 receives an input page image, wherein the input page image is a page image in a document to be processed currently; the feature extraction network is a stacked multilayer convolutional neural network; the multi-task prediction network is a multi-layer prediction network which is specially used for corresponding tasks and is respectively constructed aiming at different prediction tasks; and the multitask output layer outputs output results of different prediction networks.
The feature extraction network 12 is a stacked multilayer convolutional neural network, each layer of convolutional neural network is a nonlinear mapping output by a previous layer of convolutional neural network, the input page image is represented and described through multiple times of nonlinear mapping, and the representation features are extracted and output; the representation characteristics of the page images acquired through the characteristic extraction network are shared characteristics which are shared by various prediction tasks; the multiple prediction tasks comprise page segmentation, table analysis, text detection and text recognition.
The multi-task prediction network 13 comprises a page segmentation prediction network, a table analysis prediction network, a text detection prediction network and a text recognition prediction network, and is respectively used for realizing the prediction tasks of page segmentation, table analysis, text detection and text recognition; the page segmentation prediction network, the table analysis prediction network, the text detection prediction network and the text recognition prediction network share input features, namely different prediction networks share representation features output by a feature extraction network; the multi-task prediction network 13 determines its different prediction network structures according to different prediction tasks, respectively.
The multitask output layer 14 includes a page partition result output by the page partition prediction network, a table parsing result output by the table parsing prediction network, a text detection result output by the text detection prediction network, and a text recognition result output by the text recognition prediction network.
In the human-machine deeply coupled document image analysis and recognition method provided by the invention, human-machine deep coupling means that every step of the document image analysis and recognition process consists of an initial machine judgment followed by manual verification, and the final processing result of each step serves as the initial condition of the next step.
Compared with the prior art, the embodiment of the invention has the following beneficial effects:
the invention adopts a deep neural network model based on multi-task shared features as the machine engine of the document image analysis and recognition system; through its multiple outputs, the model can provide analysis and recognition results for every hierarchy of the document image. In actual operation, the shared features of a page image are cached in the system, and all hierarchical tasks executed on that page image can load the cached features directly, without repeated calculation.
The above examples are only intended to illustrate the technical solution of the present invention, but not to limit it; although the present invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; such modifications and substitutions do not depart from the spirit and scope of the corresponding technical solutions.

Claims (10)

1. A document image analysis and recognition method is characterized by comprising the following steps:
step 1, a message communication end of a document image analysis and recognition system receives a task initiating message sent by a user operation end, and the document image analysis and recognition system starts a document image analysis and recognition processing task;
step 2, acquiring a document image to be processed, inputting the document image to be processed into the document image analysis and recognition system, and acquiring basic information of the document to be processed;
step 3, performing page segmentation on the document image to be processed, simultaneously generating segmentation tasks of all page images in the document image to be processed in a message queue mode, sending the segmentation tasks to a machine engine terminal for executing the tasks through a machine engine management terminal, forwarding an initial page segmentation result obtained after the preprocessing of the machine engine terminal to a manual annotation terminal, and returning a final page segmentation result after the manual annotation to a process control terminal of the document image analysis and identification system for updating the page segmentation result;
step 4, obtaining initial information of table analysis processing after page segmentation processing is completed, simultaneously generating all table analysis tasks of the document image to be processed by adopting a message queue method, sending the table analysis tasks to a machine engine terminal for executing the tasks through a machine engine management terminal, forwarding an initial table analysis result obtained after the preprocessing of the machine engine terminal to a manual labeling terminal, and returning a final table analysis result after manual correction by a manual labeling operator to a process control terminal of the document image analysis and identification system for updating the table analysis result;
step 5, obtaining initial information of text detection after page segmentation processing and form analysis processing, simultaneously generating all text detection tasks of a document image to be processed in a message queue mode, sending the text detection tasks to a machine engine terminal executing the tasks through a machine engine management terminal, forwarding an initial text detection result obtained after preprocessing of the machine engine terminal to a manual labeling terminal, and returning a final text detection result after manual proofreading by a manual labeling person to a process control terminal of the document image analysis and recognition system for updating the text detection result;
step 6, obtaining initial information of text recognition after text detection is completed, generating all text recognition tasks of the document image to be processed in a message queue mode, sending the text recognition tasks to a machine engine terminal executing the tasks through a machine engine management terminal, forwarding an initial text recognition result obtained after the preprocessing of the machine engine terminal to a manual labeling terminal, and returning a final text recognition result after the manual correction of the manual labeling operator to a process control terminal of the document image analysis and recognition system for updating the text recognition result;
step 7, when the tasks of page segmentation, table analysis, text detection and text recognition of the document image to be processed are all completed, integrating the labeling results of different levels by the document image analysis and recognition system, and exporting the electronic document file.
2. The document image analysis and recognition method of claim 1,
acquiring the image of the document to be processed comprises acquiring a page image of the document to be processed in a scanning or photographing mode, and recording basic information of the document to be processed, wherein the basic information comprises a name, an author, a publishing organization and a publishing date.
3. The document image analysis and recognition method of claim 1,
the machine engine terminal runs a document image analysis and recognition model based on a deep neural network, determines the output of the model required to be called according to the current document image analysis and recognition processing steps, and returns the processing result to the machine engine management end.
4. The document image analysis and recognition method of claim 3,
the document image analysis and recognition model based on the deep neural network comprises an input layer, a feature extraction network, a multitask prediction network and a multitask output layer; the input layer is connected to the feature extraction network, the feature extraction network is connected to the multitask prediction network, and the multitask prediction network is connected to the multitask output layer;
the input layer receives an input page image, wherein the input page image is a page image in a document to be processed currently; the feature extraction network is a stacked multilayer convolutional neural network; the multi-task prediction network is a multi-layer prediction network which is specially used for corresponding tasks and is respectively constructed aiming at different prediction tasks; and the multitask output layer outputs output results of different prediction networks.
5. The document image analysis and recognition method of claim 4,
the feature extraction network is a plurality of layers of superposed convolutional neural networks, each layer of convolutional neural network is a nonlinear mapping output by the previous layer of convolutional neural network, the input page image is represented and described through the nonlinear mapping for a plurality of times, and the representation features are extracted and output; the representation characteristics of the page images acquired through the characteristic extraction network are shared characteristics which are shared by various prediction tasks; the multiple prediction tasks comprise page segmentation, table analysis, text detection and text identification;
the multi-task prediction network comprises a page segmentation prediction network, a table analysis prediction network, a text detection prediction network and a text recognition prediction network, and is respectively used for realizing the prediction tasks of page segmentation, table analysis, text detection and text recognition; the page segmentation prediction network, the table analysis prediction network, the text detection prediction network and the text recognition prediction network share input features, namely different prediction networks share representation features output by a feature extraction network; the multi-task prediction network determines different prediction network structures according to different prediction tasks respectively;
the multitask output layer comprises a page segmentation result output by the page segmentation prediction network, a table analysis result output by the table analysis prediction network, a text detection result output by the text detection prediction network and a text recognition result output by the text recognition prediction network.
6. A document image analysis and recognition system is characterized by comprising a user operation end, an interaction center, a process control end, a machine engine management end, a manual labeling management end, a machine terminal cluster and a manual terminal cluster;
the user operation end, the process control end, the machine engine management end and the manual labeling management end are respectively connected to the interaction center; the machine engine management end is connected with the machine terminal cluster; the manual marking management end is connected with the manual terminal cluster;
the interaction center comprises a data storage end and a message communication end; the data storage terminal is used for storing data uploaded by a user, a result after document image analysis and identification processing and data required by interaction between different modules and terminals in the document image analysis and identification system; the message communication terminal is used for establishing and completing message communication among all modules and terminals in the document image analysis and recognition system;
the user operation end is used for the user to perform system login, data uploading, task initiation, progress checking, result downloading and recharging payment operations;
the process control end is used for controlling a man-machine coupled document image analysis and recognition processing process and storing key data in the document image analysis and recognition processing process;
the machine terminal cluster comprises a plurality of machine engine terminals; the machine engine management end is used for managing and scheduling the machine engine terminals, determining operation steps by receiving information sent by the process control end and issuing operation tasks to corresponding execution terminals according to the running state of each current machine engine terminal; after the operation is finished, returning a corresponding message to the process control end;
the artificial terminal cluster comprises a plurality of manual labeling terminals; the manual labeling management end is used for managing and scheduling the manual labeling terminals, determining operation steps by receiving information sent by the process control end, and issuing operation tasks to corresponding execution terminals according to the running states of the current manual labeling terminals; and when the operation is finished, returning a corresponding message to the process control end.
7. The document image analysis and recognition system of claim 6,
the man-machine coupled document image analysis and recognition processing flow comprises that the flow control end receives a task initiating message of a user operation end through the message communication end, so as to start a document image analysis and recognition processing flow; the process control end acquires the completed step of the current task, determines the next processing step and sends the step to a machine engine management end or a manual labeling management end through a message communication end; and after the processing flow is finished, the flow control end sends a message to the user operation end, so that the user operation end updates the current task finishing state.
8. The document image analysis and recognition system of claim 6,
the machine terminal cluster comprises a plurality of machine engine terminals; all machine engine terminals in the system are numbered uniformly, and are managed and allocated uniformly by a machine engine management end; the machine engine terminal runs a document image analysis and recognition model based on a deep neural network, determines the output of the model to be called according to the current document image analysis and recognition processing step, and returns the processing result to the machine engine management end;
the manual terminal cluster comprises a plurality of manual marking terminals, the manual marking terminals correspond to manual marking personnel, all the manual marking personnel and the corresponding manual marking terminals are numbered uniformly, and the manual marking terminals are managed and allocated uniformly by a manual marking management end; and the manual labeling operator checks and modifies the current labeling result in the manual labeling terminal, determines an operation page required to be called by the manual labeling terminal according to the current processing step, and returns the labeling result to the manual labeling management terminal.
9. The document image analysis and recognition system of claim 8,
the document image analysis and recognition model based on the deep neural network comprises an input layer, a feature extraction network, a multi-task prediction network and a multi-task output layer; the input layer is connected to the feature extraction network, the feature extraction network is connected to the multi-task prediction network, and the multi-task prediction network is connected to the multi-task output layer;
the input layer receives an input page image, wherein the input page image is a page image in the document currently being processed; the feature extraction network is a stacked multi-layer convolutional neural network; the multi-task prediction network comprises task-specific prediction networks, one constructed for each prediction task; and the multi-task output layer outputs the results of the different prediction networks.
10. The document image analysis and recognition system of claim 9,
the feature extraction network is a stack of multiple convolutional layers, where each convolutional layer applies a nonlinear mapping to the output of the previous layer; the input page image is represented and described through these repeated nonlinear mappings, and the resulting representation features are extracted and output; the representation features of the page image obtained through the feature extraction network are shared features common to the multiple prediction tasks; the multiple prediction tasks comprise page segmentation, table analysis, text detection and text recognition;
the multi-task prediction network comprises a page segmentation prediction network, a table analysis prediction network, a text detection prediction network and a text recognition prediction network, which respectively realize the prediction tasks of page segmentation, table analysis, text detection and text recognition; these prediction networks share input features, that is, the different prediction networks share the representation features output by the feature extraction network; and the multi-task prediction network adopts a different prediction network structure for each prediction task;
the multi-task output layer comprises the page segmentation result output by the page segmentation prediction network, the table analysis result output by the table analysis prediction network, the text detection result output by the text detection prediction network and the text recognition result output by the text recognition prediction network.
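The shared-backbone architecture of claims 9 and 10 can be sketched as follows: features are computed once by a stacked nonlinear extractor and reused by four task-specific heads. Dense layers stand in for the stacked convolutional layers, and all layer sizes are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def relu(x):
    return np.maximum(x, 0.0)

class SharedBackboneModel:
    """Toy sketch of the claimed model: input layer -> shared feature
    extraction network -> multi-task prediction network -> multi-task
    output layer. Dense layers replace real convolutions for brevity."""

    def __init__(self, in_dim=64, feat_dim=32):
        # "Feature extraction network": two stacked nonlinear mappings,
        # each applied to the output of the previous layer.
        self.w1 = rng.standard_normal((in_dim, feat_dim)) * 0.1
        self.w2 = rng.standard_normal((feat_dim, feat_dim)) * 0.1
        # Task-specific prediction heads; output sizes are arbitrary here.
        self.heads = {
            "page_segmentation": rng.standard_normal((feat_dim, 4)) * 0.1,
            "table_analysis":    rng.standard_normal((feat_dim, 2)) * 0.1,
            "text_detection":    rng.standard_normal((feat_dim, 5)) * 0.1,
            "text_recognition":  rng.standard_normal((feat_dim, 10)) * 0.1,
        }

    def forward(self, page_image_vec):
        # Shared representation features, computed once...
        feats = relu(relu(page_image_vec @ self.w1) @ self.w2)
        # ...and reused by every prediction network; the returned dict
        # plays the role of the multi-task output layer.
        return {task: feats @ w for task, w in self.heads.items()}

model = SharedBackboneModel()
outputs = model.forward(rng.standard_normal(64))
```

Because the heads consume the same feature vector, one forward pass through the backbone serves all four prediction tasks, which is the efficiency the shared-feature design buys.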
CN201911143272.6A 2019-11-20 2019-11-20 Document Image Analysis and Recognition Method and System Active CN110991279B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911143272.6A CN110991279B (en) 2019-11-20 2019-11-20 Document Image Analysis and Recognition Method and System


Publications (2)

Publication Number Publication Date
CN110991279A true CN110991279A (en) 2020-04-10
CN110991279B CN110991279B (en) 2023-08-22

Family

ID=70085461

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911143272.6A Active CN110991279B (en) 2019-11-20 2019-11-20 Document Image Analysis and Recognition Method and System

Country Status (1)

Country Link
CN (1) CN110991279B (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111898411A (en) * 2020-06-16 2020-11-06 华南理工大学 Text image labeling system, method, computer device and storage medium
CN114881992A (en) * 2022-05-24 2022-08-09 北京安德医智科技有限公司 Skull fracture detection method and device and storage medium

Citations (31)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPH04124785A (en) * 1990-09-17 1992-04-24 Hitachi Ltd Confirmation and correction method for ocr recognition result
GB0031596D0 (en) * 2000-12-22 2001-02-07 Barbara Justin S A system and method for improving accuracy of signal interpretation
JP2002024758A (en) * 2000-07-07 2002-01-25 Hitachi Ltd Input data determining method in document handling device
GB0318214D0 (en) * 2002-08-23 2003-09-03 Hewlett Packard Development Co Systems and methods for processing text-based electronic documents
AU2005201754A1 (en) * 2005-04-27 2006-11-16 Canon Kabushiki Kaisha Method of extracting data from documents
US20060288279A1 (en) * 2005-06-15 2006-12-21 Sherif Yacoub Computer assisted document modification
US20070033118A1 (en) * 2005-08-02 2007-02-08 Taxscan Technologies, Llc Document Scanning and Data Derivation Architecture.
CN101013440A (en) * 2007-01-12 2007-08-08 王宏源 Method for constructing digital library based on book knowledge element
CN101118597A (en) * 2006-07-31 2008-02-06 富士通株式会社 Form processing method, form processing device, and computer product
US20080118110A1 (en) * 2006-11-22 2008-05-22 Rutger Simonsson Apparatus and method for analyzing image identifications generated by an ocr device
CN101441713A (en) * 2007-11-19 2009-05-27 汉王科技股份有限公司 Optical character recognition method and apparatus of PDF document
CN101539929A (en) * 2009-04-17 2009-09-23 无锡天脉聚源传媒科技有限公司 Method for indexing TV news by utilizing computer system
CN101542504A (en) * 2006-09-08 2009-09-23 谷歌公司 Shape clustering in post optical character recognition processing
CN102289667A (en) * 2010-05-17 2011-12-21 微软公司 User correction of errors arising in a textual document undergoing optical character recognition (OCR) process
CN102663138A (en) * 2012-05-03 2012-09-12 北京大学 Method and device for inputting formula query terms
US20130330008A1 (en) * 2011-09-24 2013-12-12 Lotfi A. Zadeh Methods and Systems for Applications for Z-numbers
CN104123550A (en) * 2013-04-25 2014-10-29 魏昊 Cloud computing-based text scanning identification method
CN105159870A (en) * 2015-06-26 2015-12-16 徐信 Processing system for precisely completing continuous natural speech textualization and method for precisely completing continuous natural speech textualization
CN107369440A (en) * 2017-08-02 2017-11-21 北京灵伴未来科技有限公司 The training method and device of a kind of Speaker Identification model for phrase sound
CN107943937A (en) * 2017-11-23 2018-04-20 杭州源诚科技有限公司 A kind of debtors assets monitoring method and system based on trial open information analysis
CN108170658A (en) * 2018-01-12 2018-06-15 山西同方知网数字出版技术有限公司 A kind of flexibly configurable, the Text region flexibly defined adapt critique system
CN108537146A (en) * 2018-03-22 2018-09-14 五邑大学 A kind of block letter mixes line of text extraction system with handwritten form
CN109165293A (en) * 2018-08-08 2019-01-08 上海宝尊电子商务有限公司 A kind of expert data mask method and program towards fashion world
CN109255113A (en) * 2018-09-04 2019-01-22 郑州信大壹密科技有限公司 Intelligent critique system
CN109543614A (en) * 2018-11-22 2019-03-29 厦门商集网络科技有限责任公司 A kind of this difference of full text comparison method and equipment
CN109685052A (en) * 2018-12-06 2019-04-26 泰康保险集团股份有限公司 Method for processing text images, device, electronic equipment and computer-readable medium
CN109840519A (en) * 2019-01-25 2019-06-04 青岛盈智科技有限公司 A kind of adaptive intelligent form recognition input device and its application method
CN109934227A (en) * 2019-03-12 2019-06-25 上海兑观信息科技技术有限公司 System for recognizing characters from image and method
CN110298032A (en) * 2019-05-29 2019-10-01 西南电子技术研究所(中国电子科技集团公司第十研究所) Text classification corpus labeling training system
CN110321875A (en) * 2019-07-19 2019-10-11 东莞理工学院 A kind of resume identification and intelligent classification screening system based on deep learning
CN110378332A (en) * 2019-06-14 2019-10-25 上海咪啰信息科技有限公司 A kind of container terminal case number (CN) and Train number recognition method and system


Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
张文国: "OCR Digitization Processing System Successfully Developed, Providing Advanced Technical Means for Digitizing Books, Archives and Literature", no. 04 *
张文国: "Advanced Technology for Digitizing Books, Archives and Literature: the OCR Digitization Processing System", no. 09 *


Also Published As

Publication number Publication date
CN110991279B (en) 2023-08-22


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant