CN117152768A - Off-line identification method and system for scanning pen - Google Patents



Publication number
CN117152768A
CN117152768A (application CN202311005379.0A)
Authority
CN
China
Prior art keywords
text
recognition
model
image
module
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202311005379.0A
Other languages
Chinese (zh)
Inventor
刘福星
周业明
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guangzhou Simware Telecommunication Technology Co ltd
Original Assignee
Guangzhou Simware Telecommunication Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guangzhou Simware Telecommunication Technology Co ltd filed Critical Guangzhou Simware Telecommunication Technology Co ltd
Priority to CN202311005379.0A priority Critical patent/CN117152768A/en
Publication of CN117152768A publication Critical patent/CN117152768A/en
Pending legal-status Critical Current


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10 Character recognition
    • G06V30/19 Recognition using electronic means
    • G06V30/191 Design or setup of recognition systems or techniques; Extraction of features in feature space; Clustering techniques; Blind source separation
    • G06V30/19147 Obtaining sets of training patterns; Bootstrap methods, e.g. bagging or boosting
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/0464 Convolutional networks [CNN, ConvNet]
    • G06N3/08 Learning methods
    • G06N3/084 Backpropagation, e.g. using gradient descent
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • G06V30/14 Image acquisition
    • G06V30/142 Image acquisition using hand-held instruments; Constructional details of the instruments
    • G06V30/16 Image preprocessing
    • G06V30/1607 Correcting image deformation, e.g. trapezoidal deformation caused by perspective
    • G06V30/164 Noise filtering
    • G06V30/26 Techniques for post-processing, e.g. correcting the recognition result
    • G06V30/262 Techniques for post-processing, e.g. correcting the recognition result using context analysis, e.g. lexical, syntactic or semantic context

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Multimedia (AREA)
  • Evolutionary Computation (AREA)
  • Computing Systems (AREA)
  • Software Systems (AREA)
  • Artificial Intelligence (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Biophysics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Data Mining & Analysis (AREA)
  • Molecular Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Medical Informatics (AREA)
  • Databases & Information Systems (AREA)
  • Character Discrimination (AREA)

Abstract

The invention relates to the technical field of offline recognition for scanning pens, and in particular to an offline recognition method and system for a scanning pen, the method comprising the following steps: a large number of printed and handwritten text samples are collected and preprocessed with image-processing enhancement functions to generate a text dataset. In the invention, diverse sample data are collected through data acquisition and preprocessing, and image-processing techniques eliminate the problems of paper distortion, blurring and shading. A personalized recognition model is constructed using deep learning and neural networks, and customized training can be carried out on a user's own writing style or the terminology of a specific industry. An integrated context-aware recognition module extracts context information such as document structure, paragraphs and titles, enhancing recognition accuracy and semantic understanding. Language and character-set support is continuously expanded to meet the needs of users in different regions. Finally, recognition accuracy is improved by optimizing the recognition algorithm and model, so that reliable recognition results are obtained when processing complex typesetting and deformed text.

Description

Off-line identification method and system for scanning pen
Technical Field
The invention relates to the technical field of offline recognition for scanning pens, and in particular to an offline recognition method and system for a scanning pen.
Background
A scanning pen is a device for offline recognition: it scans text or images and converts them into electronic text or image form. The offline recognition method adopts optical character recognition (OCR) technology, which can recognize printed characters and convert paper text into computer-readable text. The scanning pen is equipped with a local recognition model and algorithm and has offline storage capability. When using the pen for recognition, attention should be paid to the clarity of the text and the scanning angle in order to obtain better recognition results.
Existing offline recognition methods for scanning pens have difficulty processing complex paper affected by distortion, blurring, shading and similar problems, which can reduce recognition accuracy and make the results unreliable or wrong. Existing methods also generally cannot meet the specific requirements of individual users, such as personalized training on a writing style or recognition of industry-specific terminology; this lack of customization limits the breadth and depth of the scanning pen's application across users and industries. Most existing methods focus only on recognizing the text itself and lack a comprehensive understanding of context information, which limits the association of recognition results with the overall semantics of the document and affects recognition accuracy and semantic understanding. Some methods support only a few common languages and character sets, limiting the applicability of the scanning pen in different regions and language environments. Finally, although OCR and handwriting recognition techniques are continually improving, the recognition accuracy of existing methods is still unstable when dealing with complex typesetting and deformed text, which can lead to erroneous or incomplete recognition results and degrade the user experience.
Disclosure of Invention
The invention aims to remedy the defects in the prior art, and provides an offline recognition method and system for a scanning pen.
In order to achieve the above purpose, the present invention adopts the following technical scheme: the off-line identification method for the scanning pen comprises the following steps of:
collecting a large number of printed and handwritten text samples, and preprocessing them with image-processing enhancement functions to generate a text dataset;
based on the text dataset, constructing a recognition model by adopting deep learning and neural network technology, and providing a model customization training tool for the recognition model;
after the recognition of the scanning pen is completed based on the recognition model, extracting and analyzing the context information to obtain a recognition result;
continuously collecting and processing sample data comprising language and character sets, and gradually expanding the application range of the recognition model;
optimizing the recognition model to improve recognition accuracy in special cases including complex typesetting and blurred or deformed text.
As a further aspect of the present invention, the steps of collecting a large number of printed and handwritten text samples, preprocessing them with image-processing enhancement functions, and generating a text dataset are specifically:
collecting a large number of printed and handwritten text image samples covering different fonts, languages, character sets, colors and paper textures as an initial dataset;
based on the initial dataset, applying image-processing techniques including image distortion correction, image denoising and image enhancement to complete the preprocessing work and generate corresponding processed image samples;
and labeling the processed image samples to record the positions of the text regions and the corresponding character label information, and integrating them to generate the text dataset.
As a further scheme of the invention, the image distortion correction specifically searches for key points or edges in the image sample and applies a perspective transformation matrix to correct deformation and distortion of the image, performs perspective transformation using four corner points, and adjusts the positions of the corner points so that text lines become straighter and more parallel;
the image denoising specifically smooths the image by jointly weighing the spatial distance between pixels and the similarity between pixel values, while preserving edge information, so as to reduce noise in the image sample;
the image enhancement specifically adopts histogram equalization, transforming the pixel-value distribution of the image sample so that its histogram becomes uniformly distributed, thereby enhancing the contrast and detail of the image sample.
As a further scheme of the invention, the steps of constructing the recognition model by adopting deep learning and neural network technology are specifically:
dividing the text dataset into a training set, a validation set and a test set in the ratio 14:3:3;
extracting image features with an image feature extractor, and converting the training, validation and test sets into corresponding digital features;
establishing a recognition model by designing a CNN architecture with a plurality of convolution layers and pooling layers for extracting features of text images;
selecting a cross-entropy loss function, and calculating the gradient of the loss function with respect to the recognition model parameters using the backpropagation algorithm;
updating the recognition model parameters to minimize the loss function using an optimization algorithm, in particular stochastic gradient descent, and limiting the complexity of the recognition model using regularization techniques;
and selecting hyperparameters, including the learning rate, batch size and number of hidden units, by cross-validation.
As a further aspect of the present invention, the steps of providing a model customization training tool are specifically:
selecting PyTorch as the deep learning framework on the basis of the recognition model, and creating the basic structure of a training framework;
in the basic structure, providing data loading functions, including reading and preprocessing the text dataset;
in the basic structure, creating a model building module that allows the user to define a custom network structure comprising convolution layers, recurrent layers and fully connected layers, and performing parameter initialization;
adding core code logic for training and evaluation in the basic structure, including the optimization flow of forward propagation, loss calculation, backpropagation and parameter updates;
in the basic structure, providing a user interface that allows the user to specify and adjust the hyperparameters of the model, providing command-line parameters for defining and modifying components of the recognition model, and integrating the whole as a training framework;
and, in the training framework, providing visualization functions for monitoring the changes in metrics and curves during training.
As a further scheme of the invention, after the scanning pen completes recognition based on the recognition model, the steps of extracting and analyzing context information and obtaining the recognition result are specifically:
the scanning pen acquires image data, the image data is input into the recognition model, and a recognized text result is obtained;
the text result is divided into paragraphs or text lines based on layout information comprising text lines and paragraphs; a character segmentation algorithm is applied to handwritten text results to separate the characters; and the above contents are integrated to obtain context information;
the language of the text result is identified based on the context information and a language model;
and context analysis is performed based on the context information to improve the recognition result.
As a further aspect of the present invention, the context analysis includes semantic analysis, named entity recognition and context error correction;
the semantic analysis specifically uses natural language processing techniques, including part-of-speech tagging and dependency parsing, to analyze the context information of a text result and identify its grammatical structure and semantic role information;
the named entity recognition specifically identifies named entities, including person names, place names and institutions, in the text result;
the context error correction specifically corrects possible recognition errors in the text result, based on the context information, through a language model or context similarity detection;
improving the recognition result comprises context correction and word splicing;
the context correction specifically uses context information to correct erroneous or uncertain parts of the text result;
the word splicing specifically recombines segmented characters into words for handwritten text results.
As a further aspect of the present invention, the steps of continuously collecting and processing sample data comprising languages and character sets and gradually expanding the application range of the recognition model are specifically:
determining the required language and character-set range, and preparing corresponding supplementary text sample data;
preprocessing the supplementary text sample data through the image-processing enhancement functions to generate a supplementary text dataset;
based on the supplementary text dataset, carrying out adaptation and expansion of the recognition model to enlarge its application range;
the adaptation and expansion work comprises initializing model weights, transfer learning, model training and data enhancement;
initializing model weights specifically uses pre-trained weights or randomly initialized weights on the basis of the recognition model;
the transfer learning specifically takes an existing pre-trained model as the base model and adapts it to new languages and character sets through fine-tuning;
the model training specifically trains and optimizes the recognition model with a recurrent neural network based on the supplementary text dataset;
the data enhancement adopts image enhancement techniques and text enhancement techniques;
the image enhancement techniques include rotation, translation and scaling, and the text enhancement techniques include replacing characters and inserting noise.
As a further aspect of the present invention, the steps of optimizing the recognition model to improve recognition accuracy in special cases including complex typesetting and blurred or deformed text are specifically:
adding deeper convolution layers in the recognition model and adjusting the size and number of convolution kernels to improve the receptive field and feature-extraction capability of the model;
introducing an attention mechanism to strengthen the recognition model's focus on important text regions, using attention weights to enhance the recognition of complex typesetting or deformed text;
adopting an end-to-end training strategy to improve the recognition model's robustness against special interference;
using input images of different sizes to capture features at different levels of detail, thereby improving the recognition accuracy of the recognition model on complex typesetting and deformed text;
and introducing auxiliary tasks, including character position detection and text line segmentation, into the recognition model to provide context information and auxiliary constraints and improve the overall recognition effect.
The offline recognition system for the scanning pen consists of a data collection module, a model construction module, a text recognition module and a model optimization module;
the data collection module comprises a text collection sub-module, an image processing sub-module and a labeling sub-module;
the model construction module comprises a dataset division sub-module, a feature extraction sub-module, a model design sub-module, a model training sub-module and a hyperparameter selection sub-module;
the text recognition module comprises a text recognition sub-module, a context extraction sub-module and a context analysis sub-module;
the model optimization module comprises a data expansion sub-module, an adaptation-expansion sub-module and a recognition-model optimization sub-module.
Compared with the prior art, the invention has the following advantages and positive effects:
In the invention, diverse sample data are collected through data acquisition and preprocessing, and image-processing techniques eliminate the problems of paper distortion, blurring and shading. A personalized recognition model is constructed using deep learning and neural networks, allowing users to carry out customized training on their own writing style or the terminology of a specific industry. An integrated context-aware recognition module extracts context information such as document structure, paragraphs and titles, enhancing recognition accuracy and semantic understanding. Continuously extended language and character-set support meets the needs of users in different regions. Finally, recognition accuracy is improved by optimizing the recognition algorithm and model, and reliable recognition results are obtained when processing complex typesetting and deformed text.
Drawings
FIG. 1 is a schematic workflow diagram of the offline recognition method and system for a scanning pen according to the present invention;
FIG. 2 is a detailed flowchart of step 1 of the offline recognition method and system for a scanning pen according to the present invention;
FIG. 3 is a detailed flowchart of part of step 2 of the offline recognition method and system for a scanning pen according to the present invention;
FIG. 4 is a detailed flowchart of another part of step 2 of the offline recognition method and system for a scanning pen according to the present invention;
FIG. 5 is a detailed flowchart of step 3 of the offline recognition method and system for a scanning pen according to the present invention;
FIG. 6 is a detailed flowchart of step 4 of the offline recognition method and system for a scanning pen according to the present invention;
FIG. 7 is a detailed flowchart of step 5 of the offline recognition method and system for a scanning pen according to the present invention;
FIG. 8 is a schematic diagram of the system framework of the offline recognition method and system for a scanning pen according to the present invention.
Detailed Description
The present invention will be described in further detail with reference to the drawings and examples, in order to make the objects, technical solutions and advantages of the present invention more apparent. It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the scope of the invention.
In the description of the present invention, it should be understood that the terms "length," "width," "upper," "lower," "front," "rear," "left," "right," "vertical," "horizontal," "top," "bottom," "inner," "outer," and the like indicate orientations or positional relationships based on the orientation or positional relationships shown in the drawings, merely to facilitate describing the present invention and simplify the description, and do not indicate or imply that the devices or elements referred to must have a specific orientation, be configured and operated in a specific orientation, and therefore should not be construed as limiting the present invention. Furthermore, in the description of the present invention, the meaning of "a plurality" is two or more, unless explicitly defined otherwise.
Example 1
Referring to fig. 1, the present invention provides a technical solution: the off-line identification method for the scanning pen comprises the following steps of:
collecting a large number of printed and handwritten text samples, and preprocessing them with image-processing enhancement functions to generate a text dataset;
based on the text dataset, constructing a recognition model by adopting deep learning and neural network technology, and providing a model customization training tool for the recognition model;
after the scanning pen completes recognition based on the recognition model, extracting and analyzing context information to obtain a recognition result;
continuously collecting and processing sample data comprising languages and character sets, and gradually expanding the application range of the recognition model;
optimizing the recognition model to improve recognition accuracy in special cases including complex typesetting and blurred or deformed text.
First, a large number of printed and handwritten text samples are collected and preprocessed by means of image-processing enhancement functions to generate a text dataset. Next, a recognition model is constructed using deep learning and neural network techniques, and a model customization training tool is provided for it. After the scanning pen performs recognition, context information is extracted and analyzed to obtain the recognition result. Then, sample data covering further languages and character sets are continuously collected and processed, gradually expanding the application range of the recognition model. Finally, the recognition model is optimized to improve recognition accuracy in the special cases of complex typesetting and blurred or deformed text. From an implementation perspective, this offline recognition method improves recognition accuracy, enhances context-analysis capability, provides model customization training, and supports continuing data collection, processing and model optimization.
Referring to fig. 2, a large number of printed and handwritten text samples are collected and preprocessed with image-processing enhancement functions, and a text dataset is generated, by the following steps:
collecting a large number of printed and handwritten text image samples covering different fonts, languages, character sets, colors and paper textures as an initial dataset;
based on the initial dataset, applying image-processing techniques including image distortion correction, image denoising and image enhancement to complete the preprocessing work and generate corresponding processed image samples;
labeling the processed image samples to record the positions of the text regions and the corresponding character label information, and integrating them to generate the text dataset.
The image distortion correction specifically searches for key points or edges in the image sample and applies a perspective transformation matrix to correct deformation and distortion of the image; the perspective transformation uses four corner points, whose positions are adjusted so that text lines become straighter and more parallel.
The image denoising specifically smooths the image by jointly weighing the spatial distance between pixels and the similarity between pixel values, while preserving edge information, so as to reduce noise in the image sample.
The image enhancement specifically adopts histogram equalization: the pixel-value distribution of the image sample is transformed so that its histogram becomes uniformly distributed, enhancing the contrast and detail of the image sample.
Collecting a large number of printed and handwritten text samples and preprocessing them with image-processing enhancement functions to generate a text dataset has several benefits. First, collecting diversified sample data covering different fonts, languages, character sets, colors and paper textures establishes a rich and varied text dataset, providing broader training samples and enhancing the robustness and generalization capability of the recognition model. Second, image-processing enhancement techniques such as image distortion correction, image denoising and image enhancement improve the quality of the text image samples: deformation is corrected, noise is reduced, and contrast and detail are enhanced, thereby improving the sharpness and recognizability of the text. Finally, accurately labeling the processed image samples, recording the positions of text regions and character label information, provides a precise reference for model learning and inference and yields an accurate text dataset.
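To make the preprocessing chain concrete, the following fragment is a minimal sketch of the three operations described above, assuming OpenCV is available: a four-corner perspective transform for the distortion correction, a bilateral filter for the described edge-preserving denoising (it jointly weighs the spatial distance between pixels and the similarity of their values), and histogram equalization for the contrast enhancement. The corner coordinates, output size and file names are hypothetical.

```python
# Preprocessing sketch: perspective correction, edge-preserving denoising,
# histogram equalization. Assumes OpenCV; all concrete values are illustrative.
import cv2
import numpy as np

def preprocess(image_bgr, corners):
    """corners: four (x, y) points of the text region, clockwise from top-left."""
    w, h = 960, 320                      # assumed size of the rectified text strip
    src = np.float32(corners)
    dst = np.float32([[0, 0], [w, 0], [w, h], [0, h]])
    M = cv2.getPerspectiveTransform(src, dst)      # distortion correction
    warped = cv2.warpPerspective(image_bgr, M, (w, h))
    # Bilateral filter: smooths by spatial distance and pixel-value similarity,
    # preserving edge information
    denoised = cv2.bilateralFilter(warped, d=9, sigmaColor=75, sigmaSpace=75)
    gray = cv2.cvtColor(denoised, cv2.COLOR_BGR2GRAY)
    return cv2.equalizeHist(gray)                  # histogram equalization

sample = cv2.imread("sample_scan.png")             # hypothetical input path
out = preprocess(sample, [(12, 30), (940, 22), (948, 300), (8, 310)])
cv2.imwrite("sample_processed.png", out)
```

In a real pipeline the four corner points would come from the key-point or edge detection mentioned above rather than being hard-coded.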
Referring to fig. 3, the steps for constructing the recognition model by adopting the deep learning and neural network technology are specifically as follows:
dividing the text dataset into a training set, a validation set and a test set in the ratio 14:3:3;
extracting image features with an image feature extractor, and converting the training, validation and test sets into corresponding digital features;
establishing a recognition model by designing a CNN architecture with a plurality of convolution layers and pooling layers for extracting features of text images;
selecting a cross-entropy loss function, and calculating the gradient of the loss function with respect to the recognition model parameters using the backpropagation algorithm;
updating the recognition model parameters to minimize the loss function using an optimization algorithm, in particular stochastic gradient descent, and limiting the complexity of the recognition model using regularization techniques;
selecting hyperparameters, including the learning rate, batch size and number of hidden units, by cross-validation.
First, the text dataset is divided into a training set, a validation set and a test set in the ratio 14:3:3 for model training, validation and evaluation, and the image data are converted into digital features by an image feature extractor in preparation for model input. Second, a convolutional neural network (CNN) architecture with several convolution and pooling layers is adopted to establish the recognition model, which can effectively extract features of text images and improve the accuracy and robustness of offline recognition. The difference between the model output and the label is measured with a cross-entropy loss function, and the gradients of the model parameters are calculated with the backpropagation algorithm. The model parameters are updated by an optimization algorithm such as stochastic gradient descent to minimize the loss function, and regularization is adopted to limit the complexity of the model and prevent overfitting. Suitable hyperparameters such as the learning rate, batch size and number of hidden units are selected by cross-validation to optimize the performance and stability of the model. In summary, constructing the recognition model with deep learning and neural network technology improves the accuracy, robustness and generalization capability of offline recognition, so that the model adapts better to different text-image recognition tasks.
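As a concrete illustration of this step, the PyTorch sketch below builds a small CNN with two convolution/pooling stages, splits a dataset in the 14:3:3 ratio, and trains it with cross-entropy loss and stochastic gradient descent, using weight decay as the regularizer. The layer sizes, the 64-class output and the randomly generated stand-in dataset are assumptions for illustration, not values fixed by the text.

```python
# CNN recognizer sketch in PyTorch: 14:3:3 split, cross-entropy loss,
# SGD with weight decay. Sizes and the synthetic dataset are illustrative.
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset, random_split

class TextCNN(nn.Module):
    def __init__(self, num_classes=64):
        super().__init__()
        self.features = nn.Sequential(            # several conv + pooling layers
            nn.Conv2d(1, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
        )
        self.classifier = nn.Linear(64 * 8 * 8, num_classes)  # for 32x32 input

    def forward(self, x):
        return self.classifier(self.features(x).flatten(1))

# Stand-in for the labeled character images of the text dataset
data = TensorDataset(torch.randn(2000, 1, 32, 32), torch.randint(0, 64, (2000,)))
train, val, test = random_split(data, [1400, 300, 300])    # 14:3:3

model = TextCNN()
loss_fn = nn.CrossEntropyLoss()                            # cross-entropy loss
opt = torch.optim.SGD(model.parameters(), lr=0.01,         # stochastic gradient descent
                      weight_decay=1e-4)                   # regularization
for epoch in range(5):
    for images, labels in DataLoader(train, batch_size=32, shuffle=True):
        opt.zero_grad()
        loss_fn(model(images), labels).backward()          # backpropagation
        opt.step()
```

The learning rate and batch size shown here would, as the text says, be selected by cross-validation rather than fixed in advance.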
Referring to fig. 4, the steps for providing the model customization training tool are specifically as follows:
selecting PyTorch as the deep learning framework on the basis of the recognition model, and creating the basic structure of a training framework;
in the basic structure, providing data loading functions, including reading and preprocessing the text dataset;
in the basic structure, creating a model building module that allows the user to define a custom network structure comprising convolution layers, recurrent layers and fully connected layers, and performing parameter initialization;
adding core code logic for training and evaluation in the basic structure, including the optimization flow of forward propagation, loss calculation, backpropagation and parameter updates;
in the basic structure, providing a user interface that allows the user to specify and adjust the hyperparameters of the model, providing command-line parameters for defining and modifying components of the recognition model, and integrating the whole as a training framework;
in the training framework, providing visualization functions for monitoring the changes in metrics and curves during training.
First, PyTorch is selected as the deep learning framework and the basic structure of the training framework is created, providing a unified foundation and code structure for the subsequent training tool. Second, data loading and preprocessing functions are provided so that users can conveniently load and prepare text datasets, improving the usability and effectiveness of the training data. In the basic structure, users are allowed to define a custom network structure comprising convolution layers, recurrent layers and fully connected layers, and to perform parameter initialization, meeting users' customization requirements for the network structure. At the same time, core code logic for training and evaluation is provided, including the optimization flow of forward propagation, loss calculation, backpropagation and parameter updating, simplifying the user's training work. An interface is provided that allows adjustment of the model's hyperparameters, and command-line parameters are provided for modifying model components, increasing the flexibility of training. Finally, a visualization function is provided to monitor the changes in metrics and curves during training, making it convenient for users to debug and optimize the model. In summary, the model customization training tool improves training efficiency, flexibility and result quality, so that users can customize and optimize their own recognition models.
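A minimal sketch of the tool's user-facing surface follows, assuming the same PyTorch setting: command-line flags expose the hyperparameters and a layer specification, and a plotting hook covers the training-curve visualization. The flag names, the spec format and matplotlib as the backend are illustrative choices, not mandated by the text.

```python
# Customization-tool interface sketch: CLI hyperparameters and curve plotting.
# Flag names and the layer-spec format are hypothetical.
import argparse
import matplotlib.pyplot as plt

def parse_args():
    p = argparse.ArgumentParser(description="customized recognizer training")
    p.add_argument("--lr", type=float, default=0.01, help="learning rate")
    p.add_argument("--batch-size", type=int, default=32)
    p.add_argument("--hidden-units", type=int, default=128)
    p.add_argument("--layers", default="conv,conv,rnn,fc",
                   help="comma-separated spec of the custom network structure")
    p.add_argument("--data", default="dataset/", help="text dataset to load")
    return p.parse_args()

def plot_history(history, path="training_curves.png"):
    # Visualization: monitor metric and curve changes during training
    plt.plot(history["train_loss"], label="train loss")
    plt.plot(history["val_accuracy"], label="validation accuracy")
    plt.xlabel("epoch")
    plt.legend()
    plt.savefig(path)

if __name__ == "__main__":
    args = parse_args()
    print(f"training with lr={args.lr}, layers={args.layers}")
    # ...build the model from args.layers, run the train/eval loop, then:
    plot_history({"train_loss": [2.1, 1.4, 0.9], "val_accuracy": [0.5, 0.7, 0.8]})
```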
Referring to fig. 5, after the recognition of the scanning pen based on the recognition model is completed, the steps of extracting and analyzing the context information and obtaining the recognition result are specifically as follows:
the scanning pen acquires image data, the image data is input into the recognition model, and a recognized text result is acquired;
dividing the text result into paragraphs or text lines based on layout information comprising text lines and paragraphs, applying a character segmentation algorithm to handwritten text results to separate the characters, and integrating the above contents to obtain context information;
based on the context information and a language model, identifying the language to which the text result belongs;
and performing context analysis based on the context information, and improving the recognition result.
Context analysis includes semantic analysis, named entity recognition, and context error correction;
the semantic analysis specifically comprises the steps of performing semantic analysis on context information of a text result by using a natural language processing technology comprising part-of-speech tagging and dependency syntactic analysis, and identifying grammar structure and semantic role information of the context information;
the named entity recognition specifically identifies named entities, including person names, place names and institutions, in the text result;
the context error correction specifically corrects possible recognition errors in the text result, based on the context information, through a language model or context similarity detection;
improving the recognition result comprises context correction and word splicing;
the context correction specifically uses context information to correct erroneous or uncertain parts of the text result;
the word splicing specifically recombines segmented characters into words for handwritten text results.
First, the image data acquired by the scanning pen are input into the recognition model, and a text result is obtained through image recognition. The text result is then divided into paragraphs or text lines according to the layout information, and a character segmentation algorithm is applied to separate the characters of handwritten text, yielding a text result with richer context information. Next, based on the context information and a language model, the language of the text result is identified. Context analysis, comprising semantic analysis, named entity recognition and context error correction, improves the understanding and accuracy of the text result: semantic analysis identifies grammatical structure and semantic role information; named entity recognition identifies entities such as person names, place names and institutions; and context error correction corrects possible recognition errors. Finally, context correction and word splicing are applied: errors or uncertainties are corrected according to the context, and segmented characters in handwritten results are recombined into words, improving the accuracy and readability of the result. In summary, extracting and analyzing context information improves the quality and consistency of the recognition result and provides more accurate and complete text recognition, thereby enhancing the practicality and efficiency of the scanning pen.
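The post-processing idea can be shown with a deliberately small sketch: the raw recognizer output is split into paragraphs and lines, and each token is checked against a lexicon using string similarity as a stand-in for the "language model or context similarity detection" named above. The lexicon, cutoff and sample strings are hypothetical.

```python
# Context extraction and error-correction sketch. The lexicon and difflib-based
# similarity are illustrative stand-ins for a full language model.
import difflib

LEXICON = {"scanning", "pen", "offline", "recognition", "model"}  # hypothetical

def extract_context(raw_text):
    """Split raw OCR output into paragraphs, each a list of text lines."""
    paragraphs = [p for p in raw_text.split("\n\n") if p.strip()]
    return [[line for line in p.splitlines() if line] for p in paragraphs]

def correct_token(token):
    """Keep known words; otherwise substitute the closest lexicon entry."""
    if token.lower() in LEXICON:
        return token
    close = difflib.get_close_matches(token.lower(), LEXICON, n=1, cutoff=0.8)
    return close[0] if close else token      # fall back to the original reading

structured = extract_context("0ffline recognition\nfor the scanning pen")
corrected = [" ".join(correct_token(t) for t in line.split())
             for para in structured for line in para]
print(corrected)   # ['offline recognition', 'for the scanning pen']
```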
Referring to fig. 6, the steps of continuously collecting and processing sample data including language and character sets and gradually expanding the application range of the recognition model are specifically as follows:
determining the required language and character set range, and preparing corresponding supplementary text sample data;
preprocessing the supplementary text sample data through an image processing enhancement function to generate a supplementary text data set;
based on the supplementary text data set, the adaptation and expansion work of the recognition model are carried out, and the application range is enlarged;
the adaptation and expansion work comprises initializing model weights, transfer learning, model training and data enhancement;
initializing model weights specifically uses pre-trained weights or randomly initialized weights on the basis of the recognition model;
the transfer learning specifically takes an existing pre-trained model as the base model and adapts it to new languages and character sets through fine-tuning;
the model training specifically trains and optimizes the recognition model with a recurrent neural network based on the supplementary text dataset;
the data enhancement adopts image enhancement techniques and text enhancement techniques;
the image enhancement techniques include rotation, translation and scaling, and the text enhancement techniques include replacing characters and inserting noise.
First, the required language and character-set range is determined and corresponding supplementary text sample data are prepared. The supplementary sample data are then preprocessed with the image-processing enhancement functions to generate a supplementary text dataset. On this basis, the adaptation and expansion of the recognition model are carried out, including initializing model weights, transfer learning, model training and data enhancement. Initializing the model weights with pre-trained or randomly initialized weights prepares the model for adapting to new languages and character sets. Transfer learning takes the existing pre-trained model as the base model and adapts it to new languages and character sets through fine-tuning. Model training trains and optimizes the model with a recurrent neural network on the supplementary text dataset. Meanwhile, image enhancement and text enhancement techniques augment the data, improving the robustness and generalization capability of the model. From an implementation perspective, continuously collecting and processing sample data gradually enlarges the application range of the recognition model, strengthens its ability to recognize different languages and character sets, increases the breadth and flexibility of its application, and meets diverse user needs. This provides more accurate and reliable recognition results for scanning-pen applications, improving user experience and work efficiency.
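A compact sketch of the adaptation step, in the same PyTorch setting, is given below: pre-trained weights are loaded, the output layer is replaced for an enlarged character set, and both image-level and label-level augmentations are defined. The checkpoint path, layer sizes, 128-class target and augmentation magnitudes are assumptions, and the recurrent component used for sequence-level training is omitted for brevity.

```python
# Adaptation/expansion sketch: weight initialization, transfer learning by
# fine-tuning, and data enhancement. All concrete values are illustrative.
import random
import torch
import torch.nn as nn
from torchvision import transforms

backbone = nn.Sequential(                      # mirrors the earlier CNN sketch
    nn.Conv2d(1, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
    nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
    nn.Flatten(),
)
model = nn.Sequential(backbone, nn.Linear(64 * 8 * 8, 64))
model.load_state_dict(torch.load("base_model.pt"))  # initialize: pre-trained weights
model[1] = nn.Linear(64 * 8 * 8, 128)               # expand to the new character set
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)  # small lr: fine-tuning

# Image enhancement: rotation, translation and scaling
augment = transforms.RandomAffine(degrees=5, translate=(0.05, 0.05),
                                  scale=(0.9, 1.1))

def text_augment(label, charset="abcdefghijklmnopqrstuvwxyz"):
    """Text enhancement: randomly replace a character or insert a noise character."""
    chars = list(label)
    i = random.randrange(len(chars))
    if random.random() < 0.5:
        chars[i] = random.choice(charset)        # replace a character
    else:
        chars.insert(i, random.choice(charset))  # insert noise
    return "".join(chars)
```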
Referring to fig. 7, the steps for optimizing the recognition model to improve recognition accuracy in special cases including complex typesetting and blurred or deformed text are specifically:
the receptive field and feature-extraction capability of the model are improved by adding deeper convolution layers in the recognition model and adjusting the size and number of convolution kernels;
an attention mechanism is introduced to strengthen the recognition model's focus on important text regions, and attention weights are used to enhance the recognition of complex typesetting or deformed text;
an end-to-end training strategy is adopted, and the capability of the recognition model for resisting special interference is improved;
input images with different sizes are used for capturing features with different levels of detail, so that the recognition accuracy of the recognition model on complex typesetting and deformed text is improved;
in the recognition model, auxiliary tasks including character position detection and text line segmentation are introduced, and context information and auxiliary constraint are provided, so that the overall recognition effect is improved.
First, adding deeper convolution layers and adjusting the size and number of convolution kernels improves the receptive field and feature-extraction capability of the model, helping it capture the key features in complex typesetting and blurred or deformed text. Second, introducing attention mechanisms strengthens the model's focus on important text regions, and attention weights enhance the recognition of complex typesetting or deformed text. Adopting an end-to-end training strategy improves the model's robustness against special interference, since the mapping from image to text is learned directly, making the method suitable for various special situations. In addition, input images of different sizes capture features at different levels of detail, improving the recognition accuracy for complex typesetting and deformed text. Auxiliary tasks such as character position detection and text line segmentation can also be introduced to provide context information and auxiliary constraints that improve the overall recognition effect. Through these steps, the recognition model can recognize complex typesetting and blurred or deformed text more accurately, providing a reliable and efficient service; this enhances the usability and user experience of applications such as scanning pens in the face of various text recognition challenges.
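One common realization of the attention and auxiliary-task ideas is sketched below: a lightweight spatial-attention layer re-weights the CNN feature maps so salient text regions dominate, and a second head predicts character positions as the auxiliary constraint. The patent does not fix an architecture; every size and name here is an illustrative assumption.

```python
# Spatial attention over CNN feature maps plus an auxiliary position head.
# Architecture details are assumptions; only the ideas follow the text.
import torch
import torch.nn as nn

class SpatialAttention(nn.Module):
    def __init__(self, channels):
        super().__init__()
        self.score = nn.Conv2d(channels, 1, kernel_size=1)

    def forward(self, feats):                     # feats: (N, C, H, W)
        n, c, h, w = feats.shape
        weights = torch.softmax(self.score(feats).view(n, 1, h * w), dim=-1)
        return feats * weights.view(n, 1, h, w)   # emphasize salient regions

class RecognizerWithAux(nn.Module):
    def __init__(self, backbone, channels=64, num_classes=128):
        super().__init__()
        self.backbone = backbone                  # e.g. a deeper conv stack
        self.attn = SpatialAttention(channels)
        self.cls_head = nn.Conv2d(channels, num_classes, 1)  # character classes
        self.pos_head = nn.Conv2d(channels, 1, 1)  # auxiliary: character positions

    def forward(self, x):
        f = self.attn(self.backbone(x))
        return self.cls_head(f), self.pos_head(f)

# Usage with a small stand-in backbone; variable input sizes are handled
# because every layer here is convolutional.
backbone = nn.Sequential(nn.Conv2d(1, 64, 3, padding=1), nn.ReLU())
net = RecognizerWithAux(backbone)
cls_map, pos_map = net(torch.randn(2, 1, 48, 160))
```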
Referring to fig. 8, the offline recognition system for the scanning pen is composed of a data collection module, a model construction module, a text recognition module and a model optimization module;
the data collection module comprises a text collection sub-module, an image processing sub-module and a labeling sub-module;
the model construction module comprises a dataset division sub-module, a feature extraction sub-module, a model design sub-module, a model training sub-module and a hyperparameter selection sub-module;
the text recognition module comprises a text recognition sub-module, a context extraction sub-module and a context analysis sub-module;
the model optimization module comprises a data expansion sub-module, an adaptation expansion sub-module and an identification model optimization sub-module.
The offline recognition system for the scanning pen consists of a data collection module, a model construction module, a text recognition module and a model optimization module. The data collection module comprises a text collection sub-module, an image processing sub-module and a labeling sub-module, providing stable text sample data, high-quality image processing and accurate labeling. The model construction module comprises a dataset division sub-module, a feature extraction sub-module, a model design sub-module, a model training sub-module and a hyperparameter selection sub-module, providing reasonable dataset division, effective feature extraction, an optimized model structure and optimal hyperparameter settings for establishing the recognition model. The text recognition module comprises a text recognition sub-module, a context extraction sub-module and a context analysis sub-module, realizing accurate text recognition, rich context-information extraction and high-quality analysis of the recognition results. The model optimization module comprises a data expansion sub-module, an adaptation-expansion sub-module and a recognition-model optimization sub-module, optimizing and improving the recognition model by increasing the diversity of the dataset, adapting to expansion requirements and adjusting model parameters.
Working principle:
Data collection and preprocessing: a large number of printed and handwritten text samples are collected to obtain a rich dataset; these samples should contain different fonts, languages, character sets, colors and paper textures. The collected text image samples are preprocessed using image-processing enhancement functions, including image distortion correction, denoising and image enhancement. The preprocessed image samples are labeled to record the positions of the text regions and the corresponding character label information, generating a text dataset.
Model construction: the text dataset is divided proportionally into a training set, a validation set and a test set, typically in the ratio 14:3:3. The images are converted into digital features with an image feature extractor for model training and recognition. A recognition model is constructed, usually by a deep learning method, and a convolutional neural network (CNN) architecture comprising several convolution and pooling layers is designed to extract features from text images. The cross-entropy loss function is used as the optimization objective, and the gradient of the loss with respect to the model parameters is calculated by backpropagation. The model parameters are updated along the gradients with an optimization algorithm such as stochastic gradient descent to minimize the loss function, and regularization techniques control the complexity of the model. Hyperparameters such as the learning rate, batch size and number of hidden units are selected by cross-validation.
Customized training tool: on the basis of the constructed recognition model, PyTorch is selected as the deep learning framework and the basic structure of a training framework is established. Data loading functions are added, including reading and preprocessing the text dataset. A model building module is created that allows users to define a custom network structure comprising convolution layers, recurrent layers and fully connected layers, and to initialize parameters. Core code logic for training and evaluation is added, including the optimization flow of forward propagation, loss calculation, backpropagation and parameter updating. A user interface allows users to specify and adjust the model's hyperparameters and to use command-line parameters to define and modify the components of the recognition model. A visualization function monitors the changes of the training metrics and curves.
Recognition and context analysis: the scanning pen inputs the acquired image data into the recognition model, performs text recognition, and outputs the recognized text result. For handwritten text results, a character segmentation algorithm separates the characters. Based on the layout information of the text regions, the text result is divided into paragraphs or text lines to extract context information. Context analysis is carried out with the context information and a language model: the language of the text result is identified, errors are corrected, and the recognition result is improved.
Expansion and optimization: sample data covering further languages and character sets are continuously collected and processed, gradually expanding the application range of the recognition model. The language and character-set ranges that need adaptation and expansion are determined, and corresponding supplementary text sample data are prepared, enhanced by image processing, and assembled into a supplementary text dataset. The adaptation and expansion work is executed, including initializing model weights, transfer learning, model training and data enhancement, to enlarge the model's scope of adaptation and improve recognition quality. The receptive field and feature-extraction capability of the model are improved by adding deeper convolution layers and adjusting the size and number of convolution kernels. Attention mechanisms are introduced to strengthen the model's focus on important text regions, addressing the recognition challenges of complex typesetting or deformed text. An end-to-end training strategy improves the model's robustness against special interference. Input images of different sizes capture features at different levels of detail, improving recognition accuracy for complex typesetting and deformed text. Auxiliary tasks such as character position detection and text line segmentation are introduced to provide context information and auxiliary constraints, improving the overall recognition effect.
The present invention is not limited to the above embodiments. Any equivalent embodiment obtained by changing or modifying the technical content disclosed above may also be applied in other fields, and any simple modification, equivalent change or refinement made to the above embodiments according to the technical substance of the present invention still falls within the scope of the disclosed technical solution.

Claims (10)

1. The off-line identification method for the scanning pen is characterized by comprising the following steps of:
collecting a large number of printed and handwritten text samples, and preprocessing them with image-processing enhancement functions to generate a text dataset;
based on the text dataset, constructing a recognition model by adopting deep learning and neural network technology, and providing a model customization training tool for the recognition model;
after the recognition of the scanning pen is completed based on the recognition model, extracting and analyzing the context information to obtain a recognition result;
continuously collecting and processing sample data comprising language and character sets, and gradually expanding the application range of the recognition model;
optimizing the recognition model to improve recognition accuracy in special cases including complex typesetting and blurred or deformed text.
2. The offline recognition method for a scanning pen according to claim 1, wherein the steps of collecting a large number of printed and handwritten text samples, preprocessing them with image-processing enhancement functions, and generating a text dataset are specifically:
collecting a large number of printed and handwritten text image samples covering different fonts, languages, character sets, colors and paper textures as an initial dataset;
based on the initial dataset, applying image-processing techniques including image distortion correction, image denoising and image enhancement to complete the preprocessing work and generate corresponding processed image samples;
and labeling the processed image samples to record the positions of the text regions and the corresponding character label information, and integrating them to generate the text dataset.
3. The offline recognition method for a scanning pen according to claim 2, wherein the image distortion correction specifically searches for key points or edges in the image sample and applies a perspective transformation matrix to correct deformation and distortion of the image, performs perspective transformation using four corner points, and adjusts the positions of the corner points so that text lines become straighter and more parallel;
the image denoising specifically smooths the image by jointly weighing the spatial distance between pixels and the similarity between pixel values, while preserving edge information, so as to reduce noise in the image sample;
the image enhancement specifically adopts histogram equalization, transforming the pixel-value distribution of the image sample so that its histogram becomes uniformly distributed, thereby enhancing the contrast and detail of the image sample.
4. The offline recognition method for the scanning pen according to claim 1, wherein the step of constructing the recognition model by adopting the deep learning and neural network technology specifically comprises the following steps:
dividing the text dataset into a training set, a validation set and a test set in the ratio 14:3:3;
extracting image features with an image feature extractor, and converting the training, validation and test sets into corresponding digital features;
establishing a recognition model by designing a CNN architecture with a plurality of convolution layers and pooling layers for extracting features of text images;
selecting a cross-entropy loss function, and calculating the gradient of the loss function with respect to the recognition model parameters using the backpropagation algorithm;
updating the recognition model parameters to minimize the loss function using an optimization algorithm, in particular stochastic gradient descent, and limiting the complexity of the recognition model using regularization techniques;
and selecting hyperparameters, including the learning rate, batch size and number of hidden units, by cross-validation.
5. The offline recognition method for a scanning pen according to claim 1, wherein the steps of providing a model customization training tool are specifically:
selecting PyTorch as the deep learning framework on the basis of the recognition model, and creating the basic structure of a training framework;
in the basic structure, providing data loading functions, including reading and preprocessing the text dataset;
in the basic structure, creating a model building module that allows the user to define a custom network structure comprising convolution layers, recurrent layers and fully connected layers, and performing parameter initialization;
adding core code logic for training and evaluation in the basic structure, including the optimization flow of forward propagation, loss calculation, backpropagation and parameter updates;
in the basic structure, providing a user interface that allows the user to specify and adjust the hyperparameters of the model, providing command-line parameters for defining and modifying components of the recognition model, and integrating the whole as a training framework;
and, in the training framework, providing visualization functions for monitoring the changes in metrics and curves during training.
6. The offline recognition method for a scanning pen according to claim 1, wherein after the scanning pen completes recognition based on the recognition model, the steps of extracting and analyzing context information and obtaining the recognition result are specifically:
the scanning pen acquires image data, the image data is input into the recognition model, and a recognized text result is acquired;
dividing the text result based on layout information comprising text lines and paragraphs to generate paragraphs or text lines, applying a character division algorithm to the handwritten text result, separating characters from the text result, integrating the above contents, and obtaining context information;
based on the context information and a language model, identifying the language to which the text result belongs;
and performing context analysis based on the context information and improving the recognition result.
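As a toy illustration of the context-extraction step, the sketch below segments a recognized text result into paragraphs and lines from textual layout cues (blank lines and newlines) and bundles the pieces into a context structure. A real implementation would use geometric layout information from the image; that simplification, and the dictionary layout, are assumptions.

```python
# Toy sketch of the claim-6 context extraction from a recognized text result.
def extract_context(text_result: str) -> dict:
    # Paragraphs separated by blank lines; lines separated by newlines.
    paragraphs = [p for p in text_result.split("\n\n") if p.strip()]
    lines = [ln for p in paragraphs for ln in p.splitlines() if ln.strip()]
    return {
        "paragraphs": paragraphs,
        "lines": lines,
        "chars": list(text_result.replace("\n", "")),  # character-segmentation stand-in
    }

ctx = extract_context("First line\nsecond line\n\nNew paragraph")
print(len(ctx["paragraphs"]), len(ctx["lines"]))  # prints: 2 3
```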
7. The offline recognition method for a scanning pen according to claim 6, wherein the context analysis comprises semantic analysis, named entity recognition and context error correction;
the semantic analysis specifically comprises analyzing the context information of the text result using natural language processing techniques, including part-of-speech tagging and dependency parsing, to identify the grammatical structure and semantic role information of the context;
the named entity recognition specifically comprises identifying named entities, including person names, place names and organizations, in the text result;
the context error correction specifically comprises correcting possible recognition errors in the text result, based on the context information, through a language model or context similarity detection;
the improving of the recognition result comprises context correction and word stitching;
the context correction specifically comprises correcting erroneous or uncertain parts of the text result using the context information;
the word stitching specifically comprises recombining the segmented characters of a handwritten text result into words.
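A hedged sketch of the error-correction idea follows: each token of the recognized text is checked against a vocabulary, and an out-of-vocabulary token is replaced by its closest in-vocabulary word. The tiny vocabulary and the string-similarity criterion are illustrative assumptions; the claim allows a language model or context similarity detection instead.

```python
# Sketch of claim-7 context error correction via vocabulary lookup plus
# string similarity; vocabulary and cutoff are assumptions.
import difflib

VOCAB = {"scanning", "pen", "offline", "recognition", "model"}

def correct(tokens):
    fixed = []
    for tok in tokens:
        if tok.lower() in VOCAB:
            fixed.append(tok)                       # already a known word
        else:
            # Replace an unknown token with its closest vocabulary match.
            match = difflib.get_close_matches(tok.lower(), VOCAB, n=1, cutoff=0.8)
            fixed.append(match[0] if match else tok)
    return fixed

print(correct(["scaning", "pen"]))  # prints: ['scanning', 'pen']
```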
8. The offline recognition method for a scanning pen according to claim 1, wherein the step of continuously collecting and processing sample data covering additional languages and character sets, and gradually expanding the application range of the recognition model, specifically comprises:
determining the required range of languages and character sets, and preparing corresponding supplementary text sample data;
preprocessing the supplementary text sample data with the image processing and enhancement functions to generate a supplementary text dataset;
based on the supplementary text dataset, performing adaptation and expansion of the recognition model to widen its application range;
the adaptation and expansion comprises model weight initialization, transfer learning, model training and data augmentation;
the model weight initialization specifically comprises using pre-trained weights or randomly initialized weights on the basis of the recognition model;
the transfer learning specifically comprises using an existing pre-trained model as the base model and adapting it to the new languages and character sets through fine-tuning;
the model training specifically comprises training and optimizing the recognition model with a recurrent neural network on the supplementary text dataset;
the data augmentation employs image augmentation techniques and text augmentation techniques;
the image augmentation techniques include rotation, translation and scaling, and the text augmentation techniques include character replacement and noise insertion.
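The transfer-learning and image-augmentation steps can be sketched in PyTorch as below. This reuses the TextCNN class from the claim-4 sketch; the checkpoint path "pretrained.pt", the new character-set size, the choice to freeze the feature extractor, and the augmentation parameters are all assumptions for illustration.

```python
# Sketch of claim-8 adaptation: reuse pre-trained weights, fine-tune only
# the output head for a new character set, and augment the image samples.
import torch
import torchvision.transforms as T

model = TextCNN(num_classes=1000)                      # TextCNN from claim-4 sketch
model.load_state_dict(torch.load("pretrained.pt"))     # assumed checkpoint path
for p in model.features.parameters():
    p.requires_grad = False                            # freeze features, tune head only
model.classifier = torch.nn.Linear(64 * 8 * 8, 2500)   # assumed new charset size

# Image augmentation: rotation, translation and scaling of the samples.
augment = T.Compose([
    T.RandomRotation(degrees=5),
    T.RandomAffine(degrees=0, translate=(0.1, 0.1), scale=(0.9, 1.1)),
])
```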
9. The offline recognition method for a scanning pen according to claim 1, wherein the step of optimizing the recognition model to improve recognition accuracy in special cases, including complex typesetting and blurred or deformed text, specifically comprises:
enlarging the receptive field and improving the feature extraction capability of the model by adding deeper convolutional layers to the recognition model and adjusting the size and number of convolution kernels;
introducing an attention mechanism to strengthen the recognition model's focus on important text regions, using attention weights to improve recognition of complex typesetting or deformed text;
adopting an end-to-end training strategy to improve the robustness of the recognition model against special interference;
using input images of different sizes to capture features at different levels of detail, thereby improving the recognition accuracy of the recognition model on complex typesetting and deformed text;
and introducing auxiliary tasks, including character position detection and text line segmentation, into the recognition model to provide context information and auxiliary constraints, improving the overall recognition effect.
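To illustrate the attention idea, the sketch below shows a lightweight spatial-attention block that learns a per-location weight map and rescales the CNN feature map, strengthening focus on important text regions. The block design and channel count are assumptions, not the patented layout.

```python
# Sketch of a spatial-attention block in PyTorch (claim-9 attention idea).
import torch
import torch.nn as nn

class SpatialAttention(nn.Module):
    def __init__(self, channels):
        super().__init__()
        # 1x1 convolution yields one attention logit per spatial position.
        self.score = nn.Conv2d(channels, 1, kernel_size=1)

    def forward(self, feat):                          # feat: (N, C, H, W)
        weights = torch.sigmoid(self.score(feat))     # (N, 1, H, W) in [0, 1]
        return feat * weights                         # re-weight features spatially

att = SpatialAttention(channels=64)
out = att(torch.randn(2, 64, 16, 16))                 # same shape, attention-weighted
```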
10. An offline recognition system for a scanning pen, characterized by comprising a data collection module, a model construction module, a text recognition module and a model optimization module;
the data collection module comprises a text collection sub-module, an image processing sub-module and a labeling sub-module;
the model construction module comprises a dataset division sub-module, a feature extraction sub-module, a model design sub-module, a model training sub-module and a hyperparameter selection sub-module;
the text recognition module comprises a text recognition sub-module, a context extraction sub-module and a context analysis sub-module;
the model optimization module comprises a data expansion sub-module, an adaptation and expansion sub-module and a recognition model optimization sub-module.
CN202311005379.0A 2023-08-10 2023-08-10 Off-line identification method and system for scanning pen Pending CN117152768A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311005379.0A CN117152768A (en) 2023-08-10 2023-08-10 Off-line identification method and system for scanning pen

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202311005379.0A CN117152768A (en) 2023-08-10 2023-08-10 Off-line identification method and system for scanning pen

Publications (1)

Publication Number Publication Date
CN117152768A true CN117152768A (en) 2023-12-01

Family

ID=88883349

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311005379.0A Pending CN117152768A (en) 2023-08-10 2023-08-10 Off-line identification method and system for scanning pen

Country Status (1)

Country Link
CN (1) CN117152768A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117472257A (en) * 2023-12-28 2024-01-30 广东德远科技股份有限公司 Automatic regular script turning method and system based on AI algorithm
CN117472257B (en) * 2023-12-28 2024-04-26 广东德远科技股份有限公司 Automatic regular script turning method and system based on AI algorithm

Similar Documents

Publication Publication Date Title
CN111723585B (en) Style-controllable image text real-time translation and conversion method
CN111325203B (en) American license plate recognition method and system based on image correction
RU2661750C1 (en) Symbols recognition with the use of artificial intelligence
CN112052852B (en) Character recognition method of handwriting meteorological archive data based on deep learning
EP1854051A2 (en) Intelligent importation of information from foreign application user interface using artificial intelligence
CN113762269B (en) Chinese character OCR recognition method, system and medium based on neural network
CN114596566B (en) Text recognition method and related device
CN111523622B (en) Method for simulating handwriting by mechanical arm based on characteristic image self-learning
CN112819686A (en) Image style processing method and device based on artificial intelligence and electronic equipment
Jain et al. Unconstrained OCR for Urdu using deep CNN-RNN hybrid networks
CN113449787B (en) Chinese character stroke structure-based font library completion method and system
CN117152768A (en) Off-line identification method and system for scanning pen
CN114821610B (en) Method for generating webpage code from image based on tree-shaped neural network
CN114092938A (en) Image recognition processing method and device, electronic equipment and storage medium
CN115311666A (en) Image-text recognition method and device, computer equipment and storage medium
Ueki et al. Survey on deep learning-based Kuzushiji recognition
WO2021137942A1 (en) Pattern generation
CN112131834A (en) West wave font generation and identification method
Gurmu Offline handwritten text recognition of historical Ge’ez manuscripts using deep learning techniques
CN116311275B (en) Text recognition method and system based on seq2seq language model
Reul An Intelligent Semi-Automatic Workflow for Optical Character Recognition of Historical Printings
Xie et al. Enhancing multimodal deep representation learning by fixed model reuse
CN117593755B (en) Method and system for recognizing gold text image based on skeleton model pre-training
CN117472257B (en) Automatic regular script turning method and system based on AI algorithm
Dharsini et al. Devanagri character image recognition and conversion into text using long short term memory

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination