CN116704519A - Character recognition method, character recognition device, electronic equipment and storage medium - Google Patents

Character recognition method, character recognition device, electronic equipment and storage medium

Info

Publication number
CN116704519A
Authority
CN
China
Prior art keywords
layer
character recognition
recognition model
character
probability matrix
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310673162.0A
Other languages
Chinese (zh)
Inventor
崔雪峰
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen Haoxueduo Intelligent Technology Co ltd
Original Assignee
Shenzhen Rubu Technology Co ltd
Application filed by Shenzhen Rubu Technology Co ltd
Priority to CN202310673162.0A
Publication of CN116704519A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10 Character recognition
    • G06V30/19 Recognition using electronic means
    • G06V30/191 Design or setup of recognition systems or techniques; Extraction of features in feature space; Clustering techniques; Blind source separation
    • G06V30/19173 Classification techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/044 Recurrent networks, e.g. Hopfield networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/0464 Convolutional networks [CNN, ConvNet]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G06N3/084 Backpropagation, e.g. using gradient descent
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/40 Extraction of image or video features
    • G06V10/44 Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/764 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10 Character recognition
    • G06V30/18 Extraction of features or characteristics of the image
    • G06V30/1801 Detecting partial patterns, e.g. edges or contours, or configurations, e.g. loops, corners, strokes or intersections
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Computing Systems (AREA)
  • Multimedia (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • General Health & Medical Sciences (AREA)
  • Computational Linguistics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Data Mining & Analysis (AREA)
  • Molecular Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Medical Informatics (AREA)
  • Databases & Information Systems (AREA)
  • Character Discrimination (AREA)

Abstract

The invention discloses a character recognition method, a character recognition device, electronic equipment and a storage medium. The character recognition method comprises: obtaining a character recognition model, wherein the character recognition model comprises a convolution layer, a recurrent layer and a transcription layer; the convolution layer is formed by stacking depth-separable convolution blocks and is used for extracting a feature sequence of an input image; the recurrent layer is used for determining a prediction result of the input image according to the feature sequence and calculating the deviation between the prediction result and the ground truth to obtain a probability matrix; the transcription layer is used for outputting the recognition result of the input image according to the probability matrix; and inputting an image containing the characters to be recognized into the character recognition model to obtain a character recognition result. According to this technical scheme, the probability matrix is calculated with a lightweight character recognition model and the recognition result is output according to the probability matrix, which improves the real-time performance of inference and the efficiency of character recognition.

Description

Character recognition method, character recognition device, electronic equipment and storage medium
Technical Field
The embodiment of the invention relates to the field of image processing, in particular to a character recognition method, a character recognition device, electronic equipment and a storage medium.
Background
Conventional optical character recognition (OCR) techniques often require several processing passes over the picture, such as denoising, binarization and character segmentation. These passes are time-consuming and sensitive to factors such as illumination and noise, which affects recognition accuracy. As the computing power of mobile chips keeps growing, it has become possible to deploy relatively complex deep learning models with higher computing requirements on mobile terminals, and the demand for deep learning inference on mobile devices keeps rising.
The models used for real-time character recognition usually have a large number of parameters and a heavy computational load: they often contain hundreds of millions of parameters and range in size from tens to hundreds of megabytes, so inference occupies considerable memory and computing resources. Such models suit cloud inference services equipped with a graphics processing unit (GPU), which provide strong computing support. On a mobile terminal with limited computing resources they risk long response times or system crashes; inference is slow and recognition efficiency is low, so the real-time response required of a mobile deployment is difficult to achieve.
Disclosure of Invention
The invention provides a character recognition method, a character recognition device, electronic equipment and a storage medium, which improve the real-time performance of inference and the efficiency of character recognition.
In a first aspect, an embodiment of the present invention provides a text recognition method, including:
obtaining a character recognition model, wherein the character recognition model comprises a convolution layer, a recurrent layer and a transcription layer; the convolution layer is formed by stacking depth-separable convolution blocks and is used for extracting a feature sequence of an input image; the recurrent layer is used for determining a prediction result of the input image according to the feature sequence and calculating the deviation between the prediction result and the ground truth to obtain a probability matrix; the transcription layer is used for outputting the recognition result of the input image according to the probability matrix;
and inputting an image containing the characters to be recognized into the character recognition model to obtain a character recognition result.
Optionally, each of the depth-separable convolution blocks includes a shortcut connection;
if the depth-separable convolution block is located in a downsampling layer, the shortcut connection is a 1×1 convolution;
otherwise, the shortcut connection is the feature map input from the previous layer.
Optionally, based on a greedy search algorithm, the character recognition model takes, for each slice, the character with the maximum probability in the probability matrix of that slice as the recognition result of that slice.
Optionally, inputting an image containing the characters to be recognized into the character recognition model includes:
detecting the image containing the characters to be recognized segment by segment, and, each time an image of a set length has been detected, inputting the currently detected image of the set length into the character recognition model.
Optionally, the character recognition model is further used for: if the outputs of the character recognition model for images of different set lengths include an overlapping region, taking a weighted average of the corresponding positions in the probability matrices of the overlapping region to obtain the probability matrix of the overlapping region.
Optionally, obtaining the character recognition model includes:
constructing a character recognition model and training the character recognition model;
converting the character recognition model into an open standard file format and merging some of its layers with a simplification tool;
converting the character recognition model into an inference framework format using a conversion tool.
Optionally, the method further comprises: scaling the image containing the characters to be recognized to a preset size while preserving its aspect ratio.
In a second aspect, an embodiment of the present invention further provides a text recognition device, including:
an acquisition module, used for obtaining a character recognition model, wherein the character recognition model comprises a convolution layer, a recurrent layer and a transcription layer; the convolution layer is formed by stacking depth-separable convolution blocks and is used for extracting a feature sequence of an input image; the recurrent layer is used for determining a prediction result of the input image according to the feature sequence and calculating the deviation between the prediction result and the ground truth to obtain a probability matrix; the transcription layer is used for outputting the recognition result of the input image according to the probability matrix;
and a recognition module, used for inputting an image containing the characters to be recognized into the character recognition model to obtain a character recognition result.
In a third aspect, an embodiment of the present invention provides an electronic device, including:
one or more processors;
a storage means for storing one or more programs;
the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the word recognition method as described in the first aspect.
In a fourth aspect, an embodiment of the present invention provides a computer readable storage medium having stored thereon a computer program which, when executed by a processor, implements the word recognition method according to the first aspect.
The embodiment of the invention provides a character recognition method, a character recognition device, electronic equipment and a storage medium. The character recognition method comprises: obtaining a character recognition model, wherein the character recognition model comprises a convolution layer, a recurrent layer and a transcription layer; the convolution layer is formed by stacking depth-separable convolution blocks and is used for extracting a feature sequence of an input image; the recurrent layer is used for determining a prediction result of the input image according to the feature sequence and calculating the deviation between the prediction result and the ground truth to obtain a probability matrix; the transcription layer is used for outputting the recognition result of the input image according to the probability matrix; and inputting an image containing the characters to be recognized into the character recognition model to obtain a character recognition result. According to this technical scheme, the probability matrix is calculated with a lightweight character recognition model and the recognition result is output according to the probability matrix, which improves the real-time performance of inference and the efficiency of character recognition.
Drawings
The above and other features, advantages, and aspects of embodiments of the present disclosure will become more apparent by reference to the following detailed description when taken in conjunction with the accompanying drawings. The same or similar reference numbers will be used throughout the drawings to refer to the same or like elements. It should be understood that the figures are schematic and that elements and components are not necessarily drawn to scale.
FIG. 1 is a flowchart of a text recognition method according to a first embodiment of the present invention;
FIG. 2 is a schematic diagram of a shortcut connection according to an embodiment;
FIG. 3 is a flowchart of a text recognition method according to a second embodiment of the present invention;
FIG. 4 is a schematic diagram of outputs that include an overlapping region according to an embodiment;
FIG. 5 is a schematic diagram of determining the probability matrix of an overlapping region according to an embodiment;
FIG. 6 is a schematic structural diagram of a text recognition device according to a third embodiment of the present invention;
FIG. 7 is a schematic structural diagram of an electronic device according to a fourth embodiment of the present invention.
Detailed Description
The invention is described in further detail below with reference to the drawings and examples. It is to be understood that the specific embodiments described herein are merely illustrative of the invention and are not limiting thereof. Furthermore, embodiments of the invention and features of the embodiments may be combined with each other without conflict. It should be further noted that, for convenience of description, only some, but not all of the structures related to the present invention are shown in the drawings.
Before discussing exemplary embodiments in more detail, it should be mentioned that some exemplary embodiments are described as processes or methods depicted as flowcharts. Although a flowchart depicts steps as a sequential process, many of the steps may be implemented in parallel, concurrently, or with other steps. Furthermore, the order of the steps may be rearranged. The process may be terminated when its operations are completed, but may have additional steps not included in the figures. The processes may correspond to methods, functions, procedures, subroutines, and the like.
Example 1
Fig. 1 is a flowchart of a character recognition method according to a first embodiment of the present invention. This embodiment is applicable to character recognition, for example on a mobile device that can collect data automatically, such as a scanning pen. Specifically, the method may be performed by a character recognition device, which may be implemented in software and/or hardware and integrated in an electronic device. The electronic device may be a desktop computer, a server, a notebook computer, a scanning device, or the like.
As shown in fig. 1, the method specifically includes the following steps:
S110, acquiring a character recognition model.
The character recognition model is a pre-trained deep learning model comprising a convolution layer, a recurrent layer and a transcription layer. The convolution layer is formed by stacking depth-separable convolution blocks and is used for extracting a feature sequence of an input image; the recurrent layer is used for determining a prediction result of the input image according to the feature sequence and calculating the deviation between the prediction result and the ground truth to obtain a probability matrix; and the transcription layer is used for outputting the recognition result of the input image according to the probability matrix.
In this embodiment, a lightweight character recognition model is constructed using depth-separable convolution. Depth-separable convolution is a lightweight convolution operation that reduces the number of model parameters and the amount of computation. It consists mainly of a channel-by-channel convolution (Depthwise Convolution) followed by a point-by-point convolution (Pointwise Convolution). It significantly reduces the use of computing and storage resources while maintaining model performance, and is therefore suitable for mobile devices where both are limited.
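The saving can be made concrete by counting weights. The following is a minimal sketch, not taken from the patent: the function names and the example channel counts (256 in, 512 out, 3×3 kernel) are assumptions chosen for illustration, and bias terms are ignored.

```python
def standard_conv_params(c_in: int, c_out: int, k: int) -> int:
    """Weights in a standard k x k convolution (bias omitted)."""
    return k * k * c_in * c_out


def depthwise_separable_params(c_in: int, c_out: int, k: int) -> int:
    """Weights in a depthwise k x k convolution followed by a pointwise 1 x 1 convolution."""
    depthwise = k * k * c_in   # one k x k filter per input channel
    pointwise = c_in * c_out   # 1 x 1 convolution that mixes channels
    return depthwise + pointwise


# Hypothetical layer sizes, chosen only to illustrate the ratio.
std = standard_conv_params(256, 512, 3)
sep = depthwise_separable_params(256, 512, 3)
print(std, sep, round(std / sep, 2))
```

For these assumed sizes the separable form needs 133,376 weights against 1,179,648 for the standard convolution, roughly a ninefold reduction.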
The backbone network (Backbone) of the character recognition model may be formed by stacking depth-separable convolution blocks (Blocks) for feature extraction. For an input image whose (width, height, channels) dimensions are (100, 32, 3), downsampling in the convolution layer reduces the width and height to 1/8 and 1/32 of their original values, respectively. The resulting feature map is serialized into a feature sequence of shape (w/8, batch, 512), where w is the width of the original image, batch is the batch size, and 512 is the feature dimension. The feature sequence is fed to the recurrent layer, which finally outputs a result of shape (w/4, batch, class_num). A loss function (for example, Connectionist Temporal Classification Loss, CTC Loss for short) relates this result to the ground truth of the image: the difference between the prediction and the ground truth is computed with the forward-backward algorithm, and the resulting loss (Loss) can then be back-propagated to train the model parameters.
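The serialization step can be sketched as follows. This is an illustrative reading, assuming the feature map has already been reduced to height 1 and ignoring the batch dimension; the function name `feature_map_to_sequence` and the toy 3-channel input are hypothetical stand-ins for the 512-channel map described above.

```python
def feature_map_to_sequence(feature_map):
    """Turn a feature map shaped [channels][1][width] into `width` vectors of length `channels`."""
    for plane in feature_map:
        if len(plane) != 1:
            raise ValueError("expected the height to be reduced to 1")
    channels = len(feature_map)
    width = len(feature_map[0][0])
    # One feature vector per horizontal position (one "slice" of the image).
    return [[feature_map[c][0][x] for c in range(channels)] for x in range(width)]


# Toy feature map: 3 channels, height 1, width 4 (hypothetical sizes).
fmap = [[[c * 10 + x for x in range(4)]] for c in range(3)]
seq = feature_map_to_sequence(fmap)
print(len(seq), seq[0])
```

Each element of `seq` is the feature vector of one slice, which is the unit the recurrent layer consumes step by step.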
In this embodiment, the character recognition model may be deployed on the TNN inference framework. TNN is a lightweight, cross-platform deep learning inference framework. It uses a highly optimized computation engine for efficient model inference, supports multiple hardware platforms such as the central processing unit (CPU), the graphics processing unit (GPU) and artificial intelligence (AI) accelerators, and has low memory and computing resource consumption. In addition, TNN supports several deep learning frameworks and model formats, so models can be deployed and services developed and managed quickly. Its code size and binary size are small, which suits resource-limited environments such as mobile devices and embedded systems. It also supports multiple operating systems and programming languages, so model inference can run on different hardware platforms and system environments.
S120, inputting an image containing the characters to be recognized into the character recognition model to obtain a character recognition result.
Specifically, after training and testing, the character recognition model can recognize characters reliably. On this basis, an image containing the characters to be recognized is input into the character recognition model to obtain a character recognition result.
Optionally, the method further comprises: scaling the image containing the characters to be recognized to a preset size while preserving its aspect ratio. Illustratively, the detected image may be resized with its width-to-height ratio unchanged, to a size of (32, w, 3).
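The target size for such a resize can be computed as below. This is a minimal sketch under the assumption that the preset size fixes the height at 32 pixels and lets the width follow; the function name `scaled_size` and the rounding rule are illustrative choices, not details from the patent.

```python
from typing import Tuple


def scaled_size(width: int, height: int, target_height: int = 32) -> Tuple[int, int]:
    """Return (new_width, new_height) that preserves the width-to-height ratio."""
    if width <= 0 or height <= 0 or target_height <= 0:
        raise ValueError("dimensions must be positive")
    scale = target_height / height
    new_width = max(1, round(width * scale))  # never collapse to zero width
    return new_width, target_height


print(scaled_size(400, 64))
print(scaled_size(100, 32))
```

The actual pixel resampling would then be done by whatever image library the device uses, with `new_width` and `target_height` as the output size.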
In practical applications, the character recognition model may be deployed in hardware with a scanning function. The complete recognition function may be carried out by several models, including a detection model. The detection model can be built on a similar Backbone and, combined with three lightweight output heads, effectively speeds up model inference.
The character recognition method of this embodiment raises the speed of real-time inference through model optimization and further optimizes inference speed with an inference framework suited to mobile terminals, so that real-time inference can be achieved. Conventional OCR technology often requires several processing passes over the picture, such as denoising, binarization and character segmentation, which are time-consuming and sensitive to factors such as illumination and noise, affecting recognition accuracy. The character recognition model of this embodiment learns image features automatically through deep learning and outputs the recognition result directly. It thereby avoids many steps of conventional OCR processing, achieves higher recognition accuracy and speed, and can meet the real-time response required of a mobile deployment. In addition, real-time character recognition based on deep vision can support functions such as real-time recognition and multi-language recognition, providing a better user experience.
Optionally, each depth-separable convolution block includes a shortcut connection; if the depth-separable convolution block is located in a downsampling layer, the shortcut connection is a 1×1 convolution; otherwise, the shortcut connection is the feature map input from the previous layer.
Specifically, each Block includes not only a depth-separable convolution module but also a shortcut connection, whose form depends on whether the Block is in a downsampling layer. Fig. 2 is a schematic diagram of a shortcut connection according to an embodiment. As shown on the right side of Fig. 2, when the Block is in a downsampling layer, the shortcut connection is a 1×1 (also written 1*1) convolution; as shown on the left side of Fig. 2, otherwise the shortcut connection is the feature map input from the previous layer. Adding a shortcut branch to each Block mitigates the vanishing-gradient and exploding-gradient problems that a deep network may cause, and compensates for the limited feature-extraction capacity of depth-separable convolution.
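The two shortcut forms can be sketched in plain Python on nested lists shaped [channels][height][width]. This is an illustration only: `main_branch` is a hypothetical callable standing in for the depth-separable convolution module, the uniform `stride` of 2 is an assumption, and all weights are made-up values.

```python
def conv1x1(x, weights, stride=1):
    """1 x 1 convolution: per-pixel channel mixing, optionally strided.

    x is [c_in][h][w]; weights is [c_out][c_in].
    """
    rows = range(0, len(x[0]), stride)
    cols = range(0, len(x[0][0]), stride)
    return [
        [[sum(w * x[c][i][j] for c, w in enumerate(w_out)) for j in cols] for i in rows]
        for w_out in weights
    ]


def add_maps(a, b):
    """Element-wise sum of two feature maps with identical shapes."""
    return [[[p + q for p, q in zip(ra, rb)] for ra, rb in zip(ca, cb)] for ca, cb in zip(a, b)]


def block_forward(x, main_branch, downsample, shortcut_weights=None, stride=2):
    """Output of one block: main_branch(x) plus the shortcut branch."""
    if downsample:
        # Downsampling block: the shortcut is a strided 1 x 1 convolution.
        shortcut = conv1x1(x, shortcut_weights, stride)
    else:
        # Otherwise the shortcut is the input feature map itself.
        shortcut = x
    return add_maps(main_branch(x), shortcut)


x = [[[1, 2], [3, 4]]]  # one channel, 2 x 2
same_out = block_forward(x, lambda t: t, downsample=False)
down_out = block_forward(
    x,
    lambda t: conv1x1(t, [[1], [1]], stride=2),  # placeholder main branch
    downsample=True,
    shortcut_weights=[[1], [2]],
)
print(same_out, down_out)
```

The 1×1 convolution is needed in the downsampling case because the main branch changes the spatial size (and here the channel count), so the raw input can no longer be added to it directly.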
Optionally, based on a greedy search algorithm, the character recognition model takes, for each slice, the character with the maximum probability in the probability matrix of that slice as the recognition result of that slice. In this embodiment, the prediction output during recognition is a probability matrix. Unlike in training, the recognition process can use greedy search to obtain, for each slice (Slice), the prediction with the highest probability.
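A greedy decoder of this kind might look like the sketch below. The per-slice argmax follows the description above; collapsing repeated indices and dropping a blank symbol is the usual CTC post-processing and is an assumption here, as are the function name, the toy charset and the convention that index 0 is the blank.

```python
def greedy_decode(prob_matrix, charset, blank=0):
    """Pick the most probable class per slice, then collapse repeats and drop blanks.

    prob_matrix is a list of rows, one row of class probabilities per slice.
    """
    best = [max(range(len(row)), key=row.__getitem__) for row in prob_matrix]
    chars = []
    previous = blank
    for index in best:
        # Assumed CTC convention: skip blanks and immediate repeats.
        if index != blank and index != previous:
            chars.append(charset[index])
        previous = index
    return "".join(chars)


charset = ["-", "a", "b"]  # hypothetical charset; index 0 is the blank
probs = [
    [0.1, 0.8, 0.1],
    [0.2, 0.7, 0.1],
    [0.9, 0.05, 0.05],
    [0.1, 0.1, 0.8],
    [0.1, 0.6, 0.3],
]
decoded = greedy_decode(probs, charset)
print(decoded)
```

Because only one maximum is taken per slice, the cost is linear in the number of slices, which is the efficiency argument made for greedy search over a path search.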
Example 2
Fig. 3 is a flowchart of a text recognition method according to a second embodiment of the present invention, where the text recognition process is embodied based on the above embodiment. It should be noted that technical details not described in detail in this embodiment may be found in any of the above embodiments.
Specifically, as shown in fig. 3, the method specifically includes the following steps:
S210, constructing a character recognition model and training the character recognition model.
S220, converting the character recognition model into an open standard file format and merging partial layers through a simplifying tool.
In this embodiment, a Python machine learning library (e.g., PyTorch) may be used to export the trained character recognition model to an open standard file format (Open Neural Network Exchange, ONNX). In addition, a simplification tool can merge some layers, which can be understood as removing redundant layers. Specifically, the character recognition model can be built with a CRNN-like structure, and the redundant layers are, for example, intermediate 3×3 convolution layers, which increase the parameter count of the model to some extent and slow down inference. Testing showed that removing these layers does not significantly reduce the accuracy of the model.
S230, converting the character recognition model into an inference framework format using a conversion tool.
Specifically, the character recognition model in ONNX format can be converted into the inference framework (TNN) format with the TNN conversion tool, so that the inference framework can load the model itself and perform further quantization.
S240, detecting images containing the characters to be identified segment by segment.
S250, determining whether an image of the set length has been detected; if yes, performing S260; otherwise, returning to S240.
S260, inputting the currently detected image with the set length into the character recognition model to obtain a corresponding character recognition result.
S270, determining whether the outputs of the character recognition model for images of different set lengths include an overlapping region; if yes, performing S280; otherwise, returning to S240.
S280, weighting and averaging the corresponding positions in the corresponding probability matrix of the overlapping region to obtain the probability matrix of the overlapping region.
In this embodiment, so that the recognition result can be displayed in real time, the whole process may be carried out by real-time stitching. The image to be input to the character recognition model is detected segment by segment, and each time the stitched image reaches a certain length it is sent to the character recognition model for recognition. The segments sent to the model may have the same length or different lengths. On this basis, the detection and recognition processes may run synchronously. Moreover, the outputs that the character recognition model produces for successive input images may share an overlapping region. For the overlapping region, a weighted average of the corresponding positions in the respective probability matrices gives the final probability matrix of the overlapping region.
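The segment-by-segment feeding can be sketched as a small generator. This is an assumption-laden illustration: the image is modeled as a stream of columns, `set_length` and `overlap` are hypothetical parameter names, and flushing a shorter final segment at the end of the stream is a design choice that the text above does not specify.

```python
def emit_segments(columns, set_length, overlap=0):
    """Yield a segment each time `set_length` columns are buffered.

    The last `overlap` columns of a segment are carried into the next one.
    """
    if not 0 <= overlap < set_length:
        raise ValueError("overlap must satisfy 0 <= overlap < set_length")
    buffer = []
    fresh = 0  # columns in the buffer that have not been sent yet
    for column in columns:
        buffer.append(column)
        fresh += 1
        if len(buffer) == set_length:
            yield list(buffer)
            buffer = buffer[set_length - overlap:]
            fresh = 0
    if fresh:
        # Assumed behavior: send whatever remains when detection ends.
        yield list(buffer)


segments = list(emit_segments(range(10), set_length=4, overlap=2))
print(segments)
```

Each yielded segment would be passed to the recognition model as soon as it is available, so detection and recognition can proceed side by side.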
S290, obtaining the recognition result of the overlapping region according to the probability matrix.
Specifically, according to the final probability matrix, the character corresponding to the maximum probability can be taken as the recognition result of the slice.
Fig. 4 is a schematic diagram of outputs that include an overlapping region according to an embodiment. Because a character may be split into two parts, the image sent to the character recognition model may include the second half of the image sent previously, so after recognition some characters may be repeated between two adjacent results. In this embodiment, the redundant part may be removed by merging the strings. As shown in Fig. 4, each small box represents the predicted probability matrix of one slice after an image has been recognized, and each slice corresponds to a region of the original input image 32 pixels high and 8 pixels wide. The upper sequence of boxes is the prediction for the previous image and the lower sequence is the prediction for the next image; the overlap between the two adjacent images is visible. The repetition is resolved by taking a weighted average of the corresponding positions in the probability matrices of the overlapping region.
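The weighted average over the overlapping slices can be sketched as follows. The sketch assumes the two outputs have already been aligned so that row k of each list refers to the same slice; the function name `merge_overlap` and the default weight of 0.5 are illustrative, since the patent does not state the weights.

```python
def merge_overlap(prev_rows, next_rows, weight_prev=0.5):
    """Weighted average of two aligned lists of per-slice probability rows."""
    if len(prev_rows) != len(next_rows):
        raise ValueError("overlapping regions must contain the same number of slices")
    if not 0.0 <= weight_prev <= 1.0:
        raise ValueError("weight_prev must lie in [0, 1]")
    weight_next = 1.0 - weight_prev
    return [
        [weight_prev * p + weight_next * q for p, q in zip(row_prev, row_next)]
        for row_prev, row_next in zip(prev_rows, next_rows)
    ]


# One overlapping slice with two classes (made-up probabilities).
merged = merge_overlap([[0.6, 0.4]], [[0.2, 0.8]])
best_class = max(range(len(merged[0])), key=merged[0].__getitem__)
print(merged, best_class)
```

After merging, each overlapping slice has a single probability row, so the per-slice maximum can be taken once and the duplicated characters do not appear in the stitched result.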
Fig. 5 is a schematic diagram of determining the probability matrix of an overlapping region according to an embodiment. As shown in Fig. 5, the last slice of img1 corresponds to a position near the center of img2, and the foremost slice of img2 corresponds to a position near the center of img1. Slices at the edge of an image are the most error-prone, while the same positions carry a much lower risk of error where they fall at the center of the other image. Therefore, for the foremost slice of img2 the prediction of the corresponding centered slice of img1 is selected, and for the last slice of img1 the prediction of the corresponding centered slice of img2 (the lower sequence) is selected. In actual stitching, the choice can be made according to the stitching effect.
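One possible reading of this selection rule is sketched below: for each overlapping slice, keep the prediction from the image in which that slice lies farther from the edge. The function name, the tie-breaking toward the previous image and the use of slice counts as distances are all assumptions for illustration.

```python
def pick_by_edge_distance(prev_rows, next_rows):
    """For each overlapping slice, keep the row from the image where it is farther from the edge.

    prev_rows is the tail of the previous image's output and next_rows is the
    head of the next image's output, aligned slice by slice.
    """
    n = len(prev_rows)
    if n != len(next_rows):
        raise ValueError("overlapping regions must contain the same number of slices")
    merged = []
    for k in range(n):
        dist_prev = n - 1 - k  # slices between position k and the trailing edge of the previous image
        dist_next = k          # slices between position k and the leading edge of the next image
        merged.append(prev_rows[k] if dist_prev >= dist_next else next_rows[k])
    return merged


picked = pick_by_edge_distance(["p0", "p1", "p2", "p3"], ["n0", "n1", "n2", "n3"])
print(picked)
```

With four overlapping slices, the first two are taken from the previous image and the last two from the next image, which matches the idea that the foremost slice of img2 and the last slice of img1 should not be trusted.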
According to the character recognition method provided by the second embodiment of the invention, a lightweight framework is adopted to build the character recognition model, so that up to 10000 characters (including Chinese and English) can be recognized. Through model conversion and deployment on the mobile inference framework TNN, the character recognition task can be completed accurately with the model precision unchanged. The character recognition model can be deployed on resource-limited mobile devices, does not need the assistance of a cloud server, can be used offline, and can realize real-time recognition. Combined with the memory and operator optimizations of the mobile inference framework, it can be adapted to various low-computing-power development boards; even on a resource-limited CPU it can output results in real time, and real-time display can be achieved even without quantization. In order to improve user experience, the mobile device can perform recognition inference by segmenting the image or dividing it into blocks, feed back recognition results in time, and merge the recognition results efficiently and accurately, which effectively improves the accuracy of the spliced recognition result. In addition, the character recognition model also supports functions such as real-time recognition and multi-language recognition.
In the text recognition method, a text recognition model designed on the Torch framework is adopted. To turn the prediction into a recognition result, the method does not search for the maximum-probability path after transcription as in the CTC Loss decoding process; instead, a greedy search is adopted in which, for the probability matrix of each Slice, only the character with the maximum probability is taken as the recognition result of that Slice, which effectively improves search efficiency and inference speed. In addition, the method can recognize images of indefinite length in time, complete the result splicing, and feed the result back to the display screen promptly, which reduces the model inference error rate and improves the user experience.
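A greedy decoding step of this kind might look like the sketch below. The per-slice argmax follows the description above; collapsing consecutive repeats and dropping a blank class at index 0 follow the usual CTC convention and are assumptions of this example, as are the function name `greedy_decode` and the toy character set.

```python
def greedy_decode(prob_matrix, charset, blank=0):
    """Take the most probable class for each slice, then collapse repeats
    and drop blanks (CTC-style convention, assumed here)."""
    # Index of the maximum probability in each slice row.
    best = [max(range(len(row)), key=row.__getitem__) for row in prob_matrix]

    chars = []
    previous = None
    for idx in best:
        # Keep a character only when it is not blank and not a repeat
        # of the class chosen for the preceding slice.
        if idx != blank and idx != previous:
            chars.append(charset[idx])
        previous = idx
    return "".join(chars)
```

Because each slice is handled with a single argmax, the cost is linear in the number of slices and the number of character classes, with no path search.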
Example III
Fig. 6 is a schematic structural diagram of a text recognition device according to a third embodiment of the present invention. The text recognition device provided in this embodiment includes:
an obtaining module 310, configured to obtain a text recognition model, where the text recognition model includes a convolution layer, a circulation layer, and a transcription layer, where the convolution layer is formed by stacking depth-separable convolution blocks, and the convolution layer is configured to extract a feature sequence of an input image; the circulating layer is used for determining a predicted result of the input image according to the characteristic sequence and calculating the deviation between the predicted result and the true value to obtain a probability matrix; the transcription layer is used for outputting the recognition result of the input image according to the probability matrix;
the recognition module 320 is configured to input an image containing the text to be recognized into the text recognition model, so as to obtain a text recognition result.
The third embodiment of the invention provides a character recognition device, wherein a character recognition model is obtained through an obtaining module, the character recognition model comprises a convolution layer, a circulation layer and a transcription layer, the convolution layer is formed by stacking depth-separable convolution blocks, and the convolution layer is used for extracting a characteristic sequence of an input image; the circulating layer is used for determining a predicted result of the input image according to the characteristic sequence and calculating the deviation between the predicted result and the true value to obtain a probability matrix; the transcription layer is used for outputting the recognition result of the input image according to the probability matrix; and inputting an image containing the characters to be identified into the character identification model through the identification module to obtain a character identification result. On the basis, a light character recognition model is utilized to calculate a probability matrix and a recognition result is output according to the probability matrix, so that the real-time performance of reasoning is improved, and the character recognition efficiency is improved.
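To illustrate why stacking depth-separable convolution blocks yields a lightweight model, the following sketch compares weight counts for a standard convolution and a depth-separable one. The formulas are the standard ones with bias terms ignored; the function names and the channel sizes used in the usage note are hypothetical and are not taken from the embodiment.

```python
def standard_conv_params(kernel, c_in, c_out):
    """Weights of a standard kernel x kernel convolution (bias ignored)."""
    return kernel * kernel * c_in * c_out


def depthwise_separable_params(kernel, c_in, c_out):
    """Weights of a depthwise convolution followed by a 1x1 pointwise
    convolution (bias ignored)."""
    depthwise = kernel * kernel * c_in   # one kernel x kernel filter per input channel
    pointwise = c_in * c_out             # 1x1 convolution that mixes channels
    return depthwise + pointwise
```

For an assumed 3×3 layer mapping 64 channels to 128 channels, the standard convolution has 73728 weights and the depth-separable version has 8768, roughly an 8.4-fold reduction.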
On the basis of the above embodiment, each of the depth separable convolution blocks includes a short circuit connection therein;
if the depth separable convolution block is located in the downsampling layer, the short circuit connection is a convolution of 1×1;
otherwise, the short circuit connection is a characteristic diagram input by the upper layer.
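The choice of short circuit connection can be sketched on a toy feature map. This is a simplified illustration, not the model of the embodiment: the feature map is stored as `[channels][width]` nested lists, and the names `conv1x1`, `shortcut`, and `add_shortcut`, the weight layout, and the default stride of 2 are assumptions made for the example.

```python
def conv1x1(feature_map, weights, stride=1):
    """1x1 convolution: each output channel is a weighted sum of the input
    channels at the same position; `stride` subsamples positions.
    feature_map is [C_in][W], weights is [C_out][C_in]."""
    width = len(feature_map[0])
    return [
        [sum(w * channel[pos] for w, channel in zip(w_row, feature_map))
         for pos in range(0, width, stride)]
        for w_row in weights
    ]


def shortcut(feature_map, downsample, weights=None, stride=2):
    """Return the short circuit branch for one block."""
    if downsample:
        if weights is None:
            raise ValueError("a downsampling block needs 1x1 convolution weights")
        # Downsampling block: a 1x1 convolution matches the new shape.
        return conv1x1(feature_map, weights, stride)
    # Otherwise the input feature map is passed through unchanged.
    return feature_map


def add_shortcut(main_output, feature_map, downsample, weights=None, stride=2):
    """Element-wise sum of the main branch output and the short circuit branch."""
    skip = shortcut(feature_map, downsample, weights, stride)
    return [[m + s for m, s in zip(m_row, s_row)]
            for m_row, s_row in zip(main_output, skip)]
```

The 1×1 convolution is needed only where the main branch changes the resolution or channel count; elsewhere the identity path adds no parameters.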
On the basis of the above embodiment, the word recognition model is configured to take, for each slice, a character corresponding to a maximum probability according to a probability matrix corresponding to the slice as a recognition result corresponding to the slice based on a greedy search algorithm.
On the basis of the above embodiment, the identification module includes:
an input unit, configured to detect images containing the characters to be recognized segment by segment and, each time an image of the set length is detected, to input the currently detected image of the set length into the character recognition model.
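The segment-by-segment input can be sketched as a generator that buffers incoming image columns and emits a window as soon as the set length is reached. The name `iter_windows`, the `overlap` parameter (how many columns of the previous window are resent), and the emission of a shorter final window are assumptions of this sketch rather than details stated in the embodiment.

```python
def iter_windows(columns, set_length, overlap=0):
    """Yield lists of `set_length` columns as soon as they are available.

    `overlap` columns from the end of each window are kept and sent again
    at the start of the next window.
    """
    if set_length <= 0 or not 0 <= overlap < set_length:
        raise ValueError("need set_length > 0 and 0 <= overlap < set_length")

    buffer = []
    emitted = False
    for column in columns:
        buffer.append(column)
        if len(buffer) == set_length:
            yield list(buffer)          # a full window is ready for the model
            emitted = True
            buffer = buffer[set_length - overlap:]   # keep only the overlapping tail

    # Flush leftover columns that have not yet been sent in any window.
    if buffer and (not emitted or len(buffer) > overlap):
        yield list(buffer)
```

For example, ten columns with a set length of 4 and an overlap of 2 produce four windows, each sharing its first two columns with the end of the previous one.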
On the basis of the above embodiment, the text recognition model is further configured to: if the outputs of the text recognition model for different images of the set length include an overlapping region, take a weighted average of the corresponding positions in the probability matrices of the overlapping region to obtain the probability matrix of the overlapping region.
Based on the above embodiment, the acquisition module 310 includes:
the construction unit is used for constructing a character recognition model and training the character recognition model;
the simplifying unit is used for converting the character recognition model into an open standard file format and merging partial layers through a simplifying tool;
and the format conversion unit is used for converting the character recognition model into an inference framework format by using a conversion tool.
On the basis of the above embodiment, the device further includes:
a scaling module, configured to scale the image containing the characters to be recognized to a preset size in equal proportion, that is, while preserving its aspect ratio.
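Equal-proportion scaling can be sketched as below. The helper name `scaled_size` is hypothetical, and fixing the target height at 32 pixels (the slice height mentioned for FIG. 4) while deriving the width is an assumption about what the preset size means.

```python
def scaled_size(width, height, target_height=32):
    """Return (new_width, new_height) that keeps the aspect ratio while
    bringing the height to `target_height`."""
    if width <= 0 or height <= 0 or target_height <= 0:
        raise ValueError("all dimensions must be positive")
    scale = target_height / height
    # Round to whole pixels and never let the width collapse to zero.
    new_width = max(1, round(width * scale))
    return new_width, target_height
```

A 640×64 input maps to 320×32, so characters keep their shape instead of being stretched to a fixed width.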
The character recognition device provided by the third embodiment of the invention can be used for executing the character recognition method provided by any embodiment, and has corresponding functions and beneficial effects.
Example IV
Fig. 7 shows a schematic diagram of an electronic device that may be used to implement an embodiment of the invention.
As shown in fig. 7, an electronic device provided in this embodiment includes: a processor 410 and a storage 420. The processor in the electronic device may be one or more, for example, a processor 410 in fig. 7, and the processor 410 and the storage 420 in the electronic device may be connected by a bus or other means, for example, by a bus connection in fig. 7.
The one or more programs are executed by the one or more processors 410 to cause the one or more processors to implement the text recognition method as described in any of the above embodiments.
The storage 420 in the electronic device is used as a computer readable storage medium, and may be used to store one or more programs, which may be software programs, computer executable programs, and modules, such as program instructions/modules corresponding to the method of recognizing characters in the embodiment of the present invention (for example, the modules in the character recognition device shown in fig. 6, including the acquisition module 310 and the recognition module 320). The processor 410 executes various functional applications of the electronic device and data processing by running software programs, instructions and modules stored in the storage 420, i.e., implements the word recognition method in the above-described method embodiments.
The storage device 420 mainly includes a storage program area and a storage data area, wherein the storage program area can store an operating system and at least one application program required by functions; the storage data area may store data created according to the use of the electronic device, etc. (e.g., feature sequences, probability matrices, word recognition results, etc. in the above-described embodiments). In addition, the storage 420 may include high-speed random access memory, and may also include non-volatile memory, such as at least one magnetic disk storage device, flash memory device, or other non-volatile solid-state storage device. In some examples, the storage 420 may further include memory remotely located with respect to the processor 410, which may be connected to a server via a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.
And, when one or more programs included in the above-described electronic device are executed by the one or more processors 410, the programs perform the following operations: the method comprises the steps of obtaining a character recognition model, wherein the character recognition model comprises a convolution layer, a circulation layer and a transcription layer, the convolution layer is formed by stacking depth-separable convolution blocks, and the convolution layer is used for extracting a characteristic sequence of an input image; the circulating layer is used for determining a predicted result of the input image according to the characteristic sequence and calculating the deviation between the predicted result and the true value to obtain a probability matrix; the transcription layer is used for outputting the recognition result of the input image according to the probability matrix; and inputting the image containing the characters to be identified into the character identification model to obtain a character identification result.
The electronic device further includes: a communication device 430, an input device 440, and an output device 450.
The processor 410, storage 420, communication device 430, input device 440 and output device 450 in the electronic device may be connected by a bus or other means, for example by a bus connection in fig. 7.
The input device 440 may be used to receive input numeric or character information and to generate key signal inputs related to user settings and function control of the electronic device. The output device 450 may include a display device such as a display screen.
The communication device 430 may include a receiver and a transmitter. The communication means 430 is arranged to perform information transceiving communication according to control of the processor 410.
The electronic device according to the present embodiment and the text recognition method according to the foregoing embodiments belong to the same inventive concept, and technical details not described in detail in the present embodiment can be found in any of the foregoing embodiments, and the present embodiment has the same advantages as those of executing the text recognition method.
On the basis of the above embodiment, this embodiment further provides a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements the text recognition method in any of the above embodiments of the present invention, the method comprising: obtaining a character recognition model, wherein the character recognition model comprises a convolution layer, a circulation layer and a transcription layer, the convolution layer is formed by stacking depth-separable convolution blocks, and the convolution layer is used for extracting a characteristic sequence of an input image; the circulating layer is used for determining a predicted result of the input image according to the characteristic sequence and calculating the deviation between the predicted result and the true value to obtain a probability matrix; the transcription layer is used for outputting the recognition result of the input image according to the probability matrix; and inputting the image containing the characters to be identified into the character identification model to obtain a character identification result.
Of course, the storage medium containing the computer executable instructions provided by the embodiments of the present invention is not limited to the operations of the text recognition method described above, but may also perform the related operations in the text recognition method provided by any embodiment of the present invention, and has corresponding functions and beneficial effects.
From the above description of embodiments, it will be clear to a person skilled in the art that the present invention may be implemented by means of software and necessary general purpose hardware, but of course also by means of hardware, although in many cases the former is a preferred embodiment. Based on such understanding, the technical solution of the present invention, in essence or in the part contributing to the prior art, may be embodied in the form of a software product, which may be stored in a computer readable storage medium, such as a floppy disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a flash memory (FLASH), a hard disk or an optical disk of a computer, and which includes several instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) to execute the text recognition method according to the embodiments of the present invention.
Note that the above is only a preferred embodiment of the present invention and the technical principle applied. It will be understood by those skilled in the art that the present invention is not limited to the particular embodiments described herein, but is capable of various obvious changes, rearrangements and substitutions as will now become apparent to those skilled in the art without departing from the scope of the invention. Therefore, while the invention has been described in connection with the above embodiments, the invention is not limited to the embodiments, but may be embodied in many other equivalent forms without departing from the spirit or scope of the invention, which is set forth in the following claims.

Claims (10)

1. A method of text recognition, comprising:
the method comprises the steps of obtaining a character recognition model, wherein the character recognition model comprises a convolution layer, a circulation layer and a transcription layer, the convolution layer is formed by stacking depth-separable convolution blocks, and the convolution layer is used for extracting a characteristic sequence of an input image; the circulating layer is used for determining a predicted result of the input image according to the characteristic sequence and calculating the deviation between the predicted result and the true value to obtain a probability matrix; the transcription layer is used for outputting the recognition result of the input image according to the probability matrix;
and inputting the image containing the characters to be identified into the character identification model to obtain a character identification result.
2. The method of claim 1, wherein each of the depth separable convolution blocks includes a short circuit connection therein;
if the depth separable convolution block is located in the downsampling layer, the short circuit connection is a convolution of 1×1;
otherwise, the short circuit connection is a characteristic diagram input by the upper layer.
3. The method of claim 1, wherein the word recognition model is configured to take, for each slice, a character corresponding to a maximum probability according to a probability matrix corresponding to the slice as a recognition result corresponding to the slice based on a greedy search algorithm.
4. The method of claim 1, wherein inputting an image containing text to be recognized into the text recognition model comprises:
detecting images containing characters to be recognized segment by segment, and inputting the currently detected images with the set length into the character recognition model every time the images with the set length are detected.
5. The method of claim 4, wherein the word recognition model is further configured to:
if the outputs of the text recognition model for different images of the set length include an overlapping region, taking a weighted average of the corresponding positions in the probability matrices of the overlapping region to obtain the probability matrix of the overlapping region.
6. The method of claim 1, wherein obtaining a word recognition model comprises:
constructing a character recognition model and training the character recognition model;
converting the character recognition model into an open standard file format and merging partial layers through a simplifying tool;
the recognition model is converted to an inference framework format using a conversion tool.
7. The method as recited in claim 1, further comprising:
and scaling the image containing the text to be recognized to a preset size according to the principle of equal proportion.
8. A character recognition device, comprising:
an acquisition module, configured to acquire a character recognition model, wherein the character recognition model comprises a convolution layer, a circulation layer and a transcription layer, the convolution layer is formed by stacking depth separable convolution blocks, and the convolution layer is used for extracting a characteristic sequence of an input image; the circulating layer is used for determining a predicted result of the input image according to the characteristic sequence and calculating the deviation between the predicted result and the true value to obtain a probability matrix; the transcription layer is used for outputting the recognition result of the input image according to the probability matrix;
and the recognition module is used for inputting the image containing the characters to be recognized into the character recognition model to obtain a character recognition result.
9. An electronic device, comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein,
the memory stores a computer program executable by the at least one processor to enable the at least one processor to perform the word recognition method of any one of claims 1-7.
10. A computer-readable storage medium, on which a computer program is stored, characterized in that the program, when being executed by a processor, implements a text recognition method as claimed in any one of claims 1-7.
CN202310673162.0A 2023-06-07 2023-06-07 Character recognition method, character recognition device, electronic equipment and storage medium Pending CN116704519A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310673162.0A CN116704519A (en) 2023-06-07 2023-06-07 Character recognition method, character recognition device, electronic equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310673162.0A CN116704519A (en) 2023-06-07 2023-06-07 Character recognition method, character recognition device, electronic equipment and storage medium

Publications (1)

Publication Number Publication Date
CN116704519A true CN116704519A (en) 2023-09-05

Family

ID=87830723

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310673162.0A Pending CN116704519A (en) 2023-06-07 2023-06-07 Character recognition method, character recognition device, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN116704519A (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116912856A (en) * 2023-09-14 2023-10-20 深圳市贝铂智能科技有限公司 Image identification method and device of intelligent scanning pen and intelligent scanning pen
CN117058689A (en) * 2023-10-09 2023-11-14 巴斯夫一体化基地(广东)有限公司 Offline detection data processing method for chemical production
CN117058689B (en) * 2023-10-09 2024-02-20 巴斯夫一体化基地(广东)有限公司 Offline detection data processing method for chemical production

Similar Documents

Publication Publication Date Title
CN116704519A (en) Character recognition method, character recognition device, electronic equipment and storage medium
CN109993102B (en) Similar face retrieval method, device and storage medium
CN109740668B (en) Deep model training method and device, electronic equipment and storage medium
RU2693916C1 (en) Character recognition using a hierarchical classification
CN110689012A (en) End-to-end natural scene text recognition method and system
CN114187311A (en) Image semantic segmentation method, device, equipment and storage medium
CN111915555B (en) 3D network model pre-training method, system, terminal and storage medium
JP6107531B2 (en) Feature extraction program and information processing apparatus
CN114596566A (en) Text recognition method and related device
CN112990331A (en) Image processing method, electronic device, and storage medium
CN114419570A (en) Point cloud data identification method and device, electronic equipment and storage medium
CN110852295A (en) Video behavior identification method based on multitask supervised learning
CN111582459A (en) Method, electronic device, apparatus and storage medium for executing operation
CN112801103A (en) Text direction recognition and text direction recognition model training method and device
CN114492601A (en) Resource classification model training method and device, electronic equipment and storage medium
CN117851826A (en) Model construction method, model construction device, apparatus, and storage medium
CN115294397A (en) Classification task post-processing method, device, equipment and storage medium
CN113361567B (en) Image processing method, device, electronic equipment and storage medium
CN114495101A (en) Text detection method, and training method and device of text detection network
CN112308149B (en) Optimization method and device for image information identification based on machine learning
CN113569581B (en) Intention recognition method, device, equipment and storage medium
CN115080745A (en) Multi-scene text classification method, device, equipment and medium based on artificial intelligence
CN117612231B (en) Face detection method, device, electronic equipment and storage medium
CN117274777A (en) Difficult sample mining method, apparatus, device, and computer-readable storage medium
CN114943958A (en) Character recognition method, character recognition device, computer equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
TA01 Transfer of patent application right

Effective date of registration: 20240518

Address after: 518000 303, block B, Fu'an science and technology building, No. 013, Gaoxin South 1st Road, high tech Zone community, Yuehai street, Nanshan District, Shenzhen, Guangdong

Applicant after: Shenzhen haoxueduo Intelligent Technology Co.,Ltd.

Country or region after: China

Address before: 518000 Guangdong 4 Baoan District City, Shenzhen Province, the third floor of the community of Taihang Wutong Industrial Park, 9A

Applicant before: Shenzhen Rubu Technology Co.,Ltd.

Country or region before: China