CN111291629A - Method and device for recognizing text in image, computer equipment and computer storage medium - Google Patents

Method and device for recognizing text in image, computer equipment and computer storage medium

Info

Publication number
CN111291629A
Authority
CN
China
Prior art keywords
text
image
sample image
printing
character
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010051888.7A
Other languages
Chinese (zh)
Inventor
杨紫崴
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen Ping An Medical Health Technology Service Co Ltd
Original Assignee
Ping An Medical and Healthcare Management Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Ping An Medical and Healthcare Management Co Ltd filed Critical Ping An Medical and Healthcare Management Co Ltd
Priority to CN202010051888.7A priority Critical patent/CN111291629A/en
Publication of CN111291629A publication Critical patent/CN111291629A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/40 Document-oriented image-based pattern recognition
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/044 Recurrent networks, e.g. Hopfield networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10 Character recognition
    • G06V30/14 Image acquisition
    • G06V30/148 Segmentation of character regions
    • G06V30/153 Segmentation of character regions using recognition of characters or words

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Artificial Intelligence (AREA)
  • Data Mining & Analysis (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Computing Systems (AREA)
  • Biomedical Technology (AREA)
  • General Health & Medical Sciences (AREA)
  • Computational Linguistics (AREA)
  • Biophysics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Molecular Biology (AREA)
  • Multimedia (AREA)
  • Health & Medical Sciences (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Image Analysis (AREA)
  • Character Discrimination (AREA)

Abstract

The application discloses a method and a device for recognizing text in an image, a computer device, and a computer storage medium, which relate to the technical field of text recognition. The method comprises the following steps: acquiring a text sample image of a needle-like printing font after scenarization processing; inputting the text sample images of the needle-like printing font as training data into network models of different architectures for training, to obtain a text region detection model and a text recognition model; when an image text detection request is received, inputting the image requested to be detected into the text region detection model, and determining the position information of the text region corresponding to the image; and inputting the position information of the text region corresponding to the image together with the image requested to be detected into the text recognition model, to obtain the text information in the image.

Description

Method and device for recognizing text in image, computer equipment and computer storage medium
Technical Field
The present invention relates to the field of text recognition technologies, and in particular, to a method and an apparatus for recognizing text in an image, a computer device, and a computer storage medium.
Background
At present, OCR recognition technology can recognize characters in pictures well and is applied in various fields such as certificate recognition and bill recognition, replacing manual entry to a great extent and saving much of the effort it requires. However, a large amount of labeled data is an essential part of the model training process in OCR (optical character recognition) technology, and collecting it requires considerable manpower, material resources, and time.
One existing approach uses an algorithm to generate character sample data that simulates real scenes in order to augment the labeled data, which can to a certain extent reach the scale of labeled data required for model training. However, the character sample data generated by such scene-simulating algorithms usually has continuous strokes, and it is difficult to cover character samples from certain specific scenes, such as characters printed by a pin (dot-matrix) printer, whose strokes are composed of individual dots. As a result, the character sample data used in the model training process lacks diversity, the trained model cannot fit the actual scene well, and the accuracy of text recognition is affected.
Disclosure of Invention
In view of the above, the present invention provides a method, an apparatus, a computer device and a computer storage medium for recognizing a text in an image, and mainly aims to solve the problem that the accuracy of text recognition is low because a trained model cannot well fit an actual scene due to the lack of diversity of text samples in the training process of the existing text recognition model.
According to an aspect of the present invention, there is provided a method for recognizing text in an image, the method comprising:
acquiring a text sample image of a needle-like printing font after scene processing;
respectively inputting the character sample images of the needle-like printing fonts serving as training data into network models of different architectures for training to obtain a text region detection model and a text recognition model;
when an image text detection request is received, inputting the image requested to be detected into the text region detection model, and determining the position information of the text region corresponding to the image;
and inputting the position information of the text region corresponding to the image and the image requested to be detected into the text recognition model together to obtain the text information in the image.
Further, the acquiring of the text sample image of the needle-like printing font after the scenarization processing specifically includes:
acquiring a printing sample image generated by using a printing mode, and setting an attribute value corresponding to the printing sample image;
and performing scene processing on the printing sample image by changing the attribute value corresponding to the pixel in the printing sample image to obtain a character sample image of the needle-like printing font.
Further, the obtaining a text sample image of a needle-like printed character by changing the attribute value corresponding to the pixel in the print sample image and performing a scenization process on the print sample image specifically includes:
determining an optimal threshold value for dividing corresponding color attribute values of pixels in the printing sample image by using a maximum inter-class variance method;
performing binarization processing on the printing sample image by taking the optimal threshold value as a dividing basis to obtain background pixels and foreground pixels of the printing sample image after binarization processing;
dividing background pixels of the print sample image after the binarization processing into a plurality of background parts according to a preset proportion;
and performing scene processing on the printing sample image according to the pixel value of the corresponding parameter of each background part to obtain a character sample image of the needle-like printing character body.
Further, the determining an optimal threshold value for dividing the color attribute values corresponding to the pixels in the print sample image by using the maximum inter-class variance method specifically includes:
dividing the color attribute values corresponding to pixels in the printing sample image into two groups by using an assumed gray value, and calculating the inter-class variance, wherein one group of color attribute values is greater than the assumed gray value, and the other group of color attribute values is not greater than the assumed gray value;
and determining the assumed gray value at the maximum value of the inter-class variance as the optimal threshold value of the color attribute value by changing the assumed gray value.
Further, the performing a scenization process on the print sample image according to the pixel value of the parameter corresponding to each background portion to obtain a text sample image of the needle-like print body specifically includes:
obtaining a character sample image of a pin-like printing font after a scene with increased contrast by adjusting the pixel value of the corresponding contrast of each background part, so that the character sample image covers scenes with different contrasts;
the fuzzy processing is carried out on the pixel values of the corresponding parameters of each background part, so that the character sample image of the pin-like printing font with the increased fuzzy effect is obtained, and the scene with the fuzzy effect is covered by the character sample image.
Further, the step of inputting the character sample image with the needle-like printing font as training data into network models of different architectures for training respectively to obtain a text region detection model and a text recognition model specifically includes:
marking the position information of the text area in the character sample image of the needle-like printing font, and inputting the marked position information into a first network model for training to obtain a text area detection model;
and labeling the text information in the text area in the character sample image of the needle-like printing font, and inputting the labeled text information into a second network model for training to obtain a text recognition model.
Further, the first network model includes a multilayer structure, and the method includes the steps of labeling the position information of the text region in the text sample image of the needle-like printing font, inputting the labeled position information into the first network model for training to obtain a text region detection model, and specifically includes:
extracting image area characteristics corresponding to the character sample image of the needle-like printing font through the convolution layer of the first network model;
generating horizontal text sequence characteristics according to image region characteristics corresponding to the character sample images through a decoding layer of the first network model;
and determining a text region in the text sample image according to the horizontal text sequence characteristics through a prediction layer of the first network model, and processing the text region to obtain a candidate text line.
According to another aspect of the present invention, there is provided an apparatus for recognizing text in an image, the apparatus comprising:
the acquiring unit is used for acquiring a character sample image of the pin-like printing font after the scene processing;
the training unit is used for inputting the character sample images of the needle-like printing fonts serving as training data into network models of different architectures respectively for training to obtain a text region detection model and a text recognition model;
the determining unit is used for inputting the image requested to be detected into the text region detection model when an image text detection request is received, and determining the position information of the text region corresponding to the image;
and the identification unit is used for inputting the position information of the text region corresponding to the image and the image requested to be detected into the text identification model together to obtain the text information in the image.
Further, the acquisition unit includes:
the device comprises a setting module, a processing module and a display module, wherein the setting module is used for acquiring a printing sample image generated by a printing mode and setting an attribute value corresponding to the printing sample image;
and the processing module is used for performing scene processing on the printing sample image by changing the attribute value corresponding to the pixel in the printing sample image to obtain the character sample image of the needle-like printing character body.
Further, the processing module comprises:
the determining submodule is used for determining an optimal threshold value for dividing the corresponding color attribute values of the pixels in the printing sample image by using a maximum inter-class variance method;
the first processing submodule is used for carrying out binarization processing on the printing sample image by taking the optimal threshold value as a dividing basis to obtain background pixels and foreground pixels of the printing sample image after the binarization processing;
the dividing submodule is used for dividing the background pixels of the print sample image after the binarization processing into a plurality of background parts according to a preset proportion;
and the second processing submodule is used for performing scene processing on the printing sample image according to the pixel value of the corresponding parameter of each background part to obtain a character sample image of the needle-like printing character body.
Further, the determining sub-module is specifically configured to divide the color attribute values corresponding to the pixels in the print sample image into two groups by using an assumed gray value and calculate the inter-class variance, wherein one group of color attribute values is greater than the assumed gray value, and the other group of color attribute values is not greater than the assumed gray value;
the determining sub-module is specifically further configured to determine, as the optimal threshold of the color attribute value, the assumed gray value at the time of the maximum value of the inter-class variance by changing the assumed gray value.
Further, the second processing sub-module is specifically configured to obtain a text sample image of a pin-like printing font after a scene with increased contrast by adjusting pixel values of the corresponding contrasts of the background portions, so that the text sample image covers scenes with different contrasts;
the second processing sub-module is specifically configured to perform blurring processing on the pixel values of the parameters corresponding to the background portions to obtain a text sample image with increased blurring effect and similar to a pin print font, so that the text sample image covers a scene with blurring effect.
Further, the training unit comprises:
the first training module is used for marking the position information of the text area in the character sample image of the needle-like printing font and inputting the marked position information into a first network model for training to obtain a text area detection model;
and the second training module is used for labeling the text information in the text area in the character sample image of the needle-like printing font and inputting the labeled text information into a second network model for training to obtain a text recognition model.
Further, the first network model includes a multi-layer structure,
the first training module is specifically configured to extract image area features corresponding to the text sample image of the pin-like printing font through the convolution layer of the first network model;
the first training module is specifically further configured to generate a horizontal text sequence feature according to an image region feature corresponding to a text sample image through a decoding layer of the first network model;
the first training module is specifically further configured to determine a text region in the text sample image according to the horizontal text sequence feature through a prediction layer of the first network model, and process the text region to obtain a candidate text line.
According to yet another aspect of the invention, a computer device is provided, comprising a memory storing a computer program and a processor implementing the steps of the method for recognition of text in an image when executing the computer program.
According to a further aspect of the invention, a computer storage medium is provided, on which a computer program is stored which, when being executed by a processor, carries out the steps of the method for recognition of text in an image.
By means of the above technical scheme, the present invention provides a method and an apparatus for recognizing text in an image. A text sample image of a needle-like printing font after scenarization processing is acquired, and because the scenarized text sample image covers richer picture features, the trained text region detection model and text recognition model have stronger scene recognition capability, so that text information in images of different scenes can be recognized in the process of recognizing text in an image. Compared with prior-art methods for recognizing text in an image, the sample data collected from the actual scene is augmented without consuming a large amount of labor to collect samples, which simplifies the sample collection process and saves the time needed to label sample data; moreover, the model trained with the augmented sample data fits the actual scene well, which improves the accuracy of recognizing text in the image.
Drawings
Various other advantages and benefits will become apparent to those of ordinary skill in the art upon reading the following detailed description of the preferred embodiments. The drawings are only for purposes of illustrating the preferred embodiments and are not to be construed as limiting the invention. Also, like reference numerals are used to refer to like parts throughout the drawings. In the drawings:
fig. 1 is a schematic flow chart illustrating a method for recognizing text in an image according to an embodiment of the present invention;
FIG. 2 is a flow chart illustrating another method for recognizing text in an image according to an embodiment of the present invention;
fig. 3 is a schematic structural diagram illustrating an apparatus for recognizing text in an image according to an embodiment of the present invention;
fig. 4 is a schematic structural diagram illustrating another apparatus for recognizing text in an image according to an embodiment of the present invention.
Detailed Description
Exemplary embodiments of the present disclosure will be described in more detail below with reference to the accompanying drawings. While exemplary embodiments of the present disclosure are shown in the drawings, it should be understood that the present disclosure may be embodied in various forms and should not be limited to the embodiments set forth herein. Rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the scope of the disclosure to those skilled in the art.
The embodiment of the invention provides a method for recognizing a text in an image, which can enable a trained model to well fit an actual scene and improve the accuracy of text recognition in the image, and as shown in fig. 1, the method comprises the following steps:
101. and acquiring a text sample image of the needle-like printing font after the scene processing.
The text sample image of the print-like font may be a text sample image generated by a printing manner, such as an invoice image, a document image, and the like.
In the process of acquiring print-like text sample images, a conventional bill image is usually selected as the print-like text sample image. In order to enrich the diversity of the text sample images, different shooting devices can be used to shoot the conventional bill images under different shooting backgrounds, lighting, brightness, shooting angles, and so on, thereby generating text sample images that combine different variations of background, lighting, brightness, etc., so that the text sample images match actual application scenarios in the subsequent training process.
It can be understood that a certain degree of scenarization can be added to the text sample images by adjusting the shooting scene when shooting the bill images. Text sample images with different background colors can also be selected, and gray-scale processing can be applied to the background color of a text sample image to adjust the contrast between its foreground and background, so that the text sample images cover scenes with different contrasts; the gray-scaled image background can further be blurred so that the text sample images cover scenes with a blurring effect. Of course, processing such as adding noise or reducing scale may also be performed, and the manner of scenarization processing is not limited here.
102. And respectively inputting the character sample images of the needle-like printing fonts serving as training data into network models of different architectures for training to obtain a text region detection model and a text recognition model.
The network model for training the text region detection model may use the open-source Connectionist Text Proposal Network (CTPN) framework from "Detecting Text in Natural Image with Connectionist Text Proposal Network". The specific process of training the text region detection model may be as follows: first, the training data is prepared, namely the text sample images of the needle-like printing font and the annotation data corresponding to the text sample images, in which the coordinate information of the text regions in the images is recorded. Before the training data is input into the CTPN network, the coordinate information of each text region in the annotation data is converted into small anchors with a width of 8; by splitting a text region into a set of small text regions and predicting the information in each small region, the accuracy of text region detection can be greatly improved. The CTPN network structure adopts the form CNN + BLSTM + RPN: the CNN is used to extract the spatial features of receptive fields (a receptive field is the region of the input image to which a given node responds after convolution with the convolution kernels); the BLSTM generates horizontal text sequence features based on the spatial features of the receptive fields; and the RPN comprises two parts, anchor classification and bounding-box regression, where the anchor classification determines whether each region is a text region, and a group of vertical strip-shaped candidate text lines is obtained after the bounding-box regression processing.
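As an illustrative, non-limiting example, the following Python sketch shows how an annotated text region might be converted into width-8 anchors before being fed to a CTPN-style detector, as described above. The [x_min, y_min, x_max, y_max] box format and the function name are assumptions for illustration only.

def split_into_anchors(box, anchor_width=8):
    # box: [x_min, y_min, x_max, y_max] of one annotated text region (format assumed)
    x_min, y_min, x_max, y_max = box
    anchors = []
    x = x_min
    while x < x_max:
        anchors.append([x, y_min, min(x + anchor_width, x_max), y_max])
        x += anchor_width
    return anchors

# Example: a 100-pixel-wide ground-truth text line becomes 13 small width-8 anchors.
print(split_into_anchors([40, 120, 140, 152]))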
It should be noted that, the text region output by the pre-trained text region detection model is not directly the text region in the target recognition image, but is a set of candidate text lines in vertical stripes forming the text region in the target recognition image, and the text region in the target recognition image and the position information of the text region may be determined by connecting the set of candidate text lines in vertical stripes to form the text region by using a text line construction algorithm.
The network model for training the text recognition model may adopt the CRNN algorithm from "An End-to-End Trainable Neural Network for Image-based Sequence Recognition and Its Application to Scene Text Recognition". After the text sample image of the needle-like printing font and the position information of the text regions annotated in the text sample image pass through the text recognition model, a text recognition result corresponding to each text region in the text sample image is output. The specific process of training the CRNN model may be as follows: first, the training data is stored in label form, using the text sample images of the needle-like printing font and the text information of the text regions in the text sample images. The CRNN network structure adopts the form CNN + RNN + CTC, where the CNN is used to extract the spatial features of receptive fields in the image, the RNN predicts the label distribution of each frame in the image based on the spatial features of the receptive fields, and the CTC integrates the label distributions of all frames into a final label sequence. For example, the input picture is resized to W × 32, and the predicted value output by the text recognition model represents the text information corresponding to a text region in the target recognition image.
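For illustration only, a minimal PyTorch sketch of such a CRNN-style recognizer (CNN + bidirectional LSTM, trained with a CTC loss) is given below; the layer sizes and channel counts are common choices and are assumptions here, not values fixed by the present application.

import torch
import torch.nn as nn

class CRNN(nn.Module):
    def __init__(self, num_classes, img_h=32):
        super().__init__()
        # Convolutional feature extractor: input is a grayscale W x 32 crop.
        self.cnn = nn.Sequential(
            nn.Conv2d(1, 64, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2, 2),    # H/2
            nn.Conv2d(64, 128, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2, 2),  # H/4
            nn.Conv2d(128, 256, 3, padding=1), nn.ReLU(),
            nn.MaxPool2d((2, 1), (2, 1)),                                     # H/8, keep width
            nn.Conv2d(256, 256, 3, padding=1), nn.ReLU(),
            nn.MaxPool2d((2, 1), (2, 1)),                                     # H/16
        )
        # Recurrent layer predicts a label distribution for every horizontal step.
        self.rnn = nn.LSTM(256 * (img_h // 16), 256, bidirectional=True, batch_first=True)
        self.fc = nn.Linear(512, num_classes)  # num_classes includes the CTC blank

    def forward(self, x):            # x: (N, 1, 32, W)
        f = self.cnn(x)              # (N, C, H', W')
        n, c, h, w = f.shape
        f = f.permute(0, 3, 1, 2).reshape(n, w, c * h)  # one feature vector per column
        seq, _ = self.rnn(f)
        return self.fc(seq)          # (N, W', num_classes)

During training, the (N, W', num_classes) output would be log-softmaxed, transposed to (T, N, C), and passed to torch.nn.CTCLoss together with the labeled text of each text region.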
It should be noted that the training data used for training the text region detection model and the text recognition model has rich features of the printed image, so that the trained text region detection model and the trained text recognition model can more fully cover the application scene of the image with printed fonts, and the detection effect of the text region in the image and the recognition effect of the text information in the text region are improved.
103. When an image text detection request is received, inputting the image requested to be detected into the text region detection model, and determining the position information of the text region corresponding to the image.
It can be understood that, through the text region detection model, each image has a corresponding output file, and the output file stores the position information of all candidate text boxes in the image together with a label indicating whether each candidate text box belongs to a text region, where a candidate text box is equivalent to one of the vertical strip-shaped boxes split from a text region.
Specifically, in the process of determining the position information of the text region corresponding to the image, the series of candidate text boxes output by the text region detection model may be recorded in a text document. When generating, based on the text line construction algorithm, the position information of the text region corresponding to the image from the candidate text lines of the image, the label of whether each candidate text box belongs to a text region is taken into account; according to these labels, the series of candidate text boxes is connected into a large text region, thereby forming the text region corresponding to the image and determining the position information of the text region corresponding to the image.
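A rough sketch of this text-line construction idea, chaining neighbouring candidate text boxes that were classified as text into one text region, is shown below; the gap and overlap thresholds are illustrative assumptions rather than values from the present application.

def build_text_lines(boxes, max_gap=16, min_v_overlap=0.7):
    """boxes: list of [x_min, y_min, x_max, y_max] predicted as text."""
    boxes = sorted(boxes, key=lambda b: b[0])          # left-to-right
    lines, current = [], []
    for box in boxes:
        if not current:
            current = [box]
            continue
        prev = current[-1]
        close = box[0] - prev[2] <= max_gap            # horizontal proximity
        overlap = min(prev[3], box[3]) - max(prev[1], box[1])
        ratio = overlap / max(1, min(prev[3] - prev[1], box[3] - box[1]))
        if close and ratio >= min_v_overlap:           # enough vertical overlap -> same line
            current.append(box)
        else:
            lines.append(current)
            current = [box]
    if current:
        lines.append(current)
    # Merge each chain into one bounding box = the text region's position information.
    return [[min(b[0] for b in l), min(b[1] for b in l),
             max(b[2] for b in l), max(b[3] for b in l)] for l in lines]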
104. And inputting the position information of the text region corresponding to the image and the image requested to be detected into the text recognition model together to obtain the text information in the image.
It can be understood that the trained text recognition model has the capability of recognizing text information in a text region, and in the process of training the text recognition model, the text sample image of the needle-like printing font and the label of the text information in the text region in the text sample image are used, and parameters of the text recognition model are continuously adjusted through forward propagation and reverse deviation correction, so that the text information in the text region in the printing font image can be accurately recognized through the image of the text recognition model.
According to the method for recognizing the text in the image, the text sample image of the needle-like printing font after the scene processing is obtained, and the text sample image after the scene processing is covered with richer picture characteristics, so that the text region detection model and the text recognition model obtained through training have higher scene recognition capability, and therefore text information in different scene images can be recognized in the process of recognizing the text in the image. Compared with the method for recognizing the text in the image in the prior art, the method has the advantages that the sample data collected in the actual scene is expanded, a large amount of labor cost is not needed to be consumed to collect the sample, the sample collection process is simplified, the labeling time of the sample data is saved, the model trained by the expanded sample data can be used for well fitting the actual scene, and the accuracy of recognizing the text in the image is improved.
The embodiment of the invention provides another method for recognizing texts in images, so that a trained model can well fit an actual scene, and the accuracy of text recognition in the images is improved, as shown in fig. 2, the method comprises the following steps:
201. acquiring a printing sample image generated by using a printing mode, and setting an attribute value corresponding to the printing sample image.
Generally, when the text information in a pin-printed text image is recognized by a text recognition model, the pin-printed text cannot be recognized well, because the text in the pin-printed image consists of strokes composed of individual dots, whereas the text images generated by ordinary text generation algorithms have continuous strokes.
In order to better identify the text information in the printed text image, the printed sample image generated by using the printing mode can be acquired as the training data of the text identification model, and the diversity of the training data can be enriched by setting the attribute value corresponding to the printed sample image.
Specifically, when setting the attribute values corresponding to the print sample image, the background image of the print sample image can be selected at random and its color mean recorded as Bcolor; the color of the text information in the print sample image can be selected at random and recorded as Tcolor; and the size and spacing of the text font in the print sample image can be selected at random. By setting the attribute values corresponding to the print sample image in this way, the print sample image better matches pin-printed text images in actual application scenarios, so that the generated print sample images have a better training effect in the subsequent model training process.
For example, for the font selection in the print sample image, fonts such as SimSun (Song), FangSong, and SimHei may be set, and one of these fonts is selected at random each time a print sample image is generated; the same applies to the font color and the rotation angle of the print sample image, where a random value within a certain interval is used to set the corresponding attribute value of the print sample image.
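Purely as an illustration of this attribute randomisation, a possible Pillow-based sketch is given below; the font file names, value ranges, and rendering details are assumptions and not part of the described scheme.

import random
import numpy as np
from PIL import Image, ImageDraw, ImageFont

FONTS = ["SimSun.ttf", "FangSong.ttf", "SimHei.ttf"]   # Song, FangSong, HeiTi (assumed file names)

def make_print_sample(text, backgrounds):
    bg = Image.open(random.choice(backgrounds)).convert("RGB")      # random background image
    bcolor = tuple(int(v) for v in np.array(bg).reshape(-1, 3).mean(axis=0))  # background colour mean
    tcolor = tuple(random.randint(0, 255) for _ in range(3))         # random text colour Tcolor
    font = ImageFont.truetype(random.choice(FONTS), size=random.randint(18, 36))
    spacing = random.randint(0, 6)                                    # random line spacing
    angle = random.uniform(-3, 3)                                     # small random rotation
    draw = ImageDraw.Draw(bg)
    draw.text((10, 10), text, fill=tcolor, font=font, spacing=spacing)
    return bg.rotate(angle, expand=True, fillcolor=bcolor), bcolor, tcolor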
202. And performing scene processing on the print sample image by changing the attribute values corresponding to the pixels in the print sample image to obtain a text sample image of the needle-like printing font.
For the embodiment of the present invention, specifically, attribute values corresponding to pixels in a print sample image are changed, an optimal threshold value for dividing color attribute values corresponding to pixels in the print sample image is determined by using a maximum inter-class variance method, and binarization processing is performed on the print sample image by using the optimal threshold value as a dividing basis to obtain background pixels and foreground pixels of the print sample image after binarization processing.
Specifically, an assumed gray value can be used to divide the color attribute values corresponding to the pixels in the print sample image into two groups, and the inter-class variance is calculated, where one group of color attribute values is greater than the assumed gray value and the other group is not greater than the assumed gray value; by changing the assumed gray value, the assumed gray value at which the inter-class variance reaches its maximum is determined as the optimal threshold of the color attribute values.
It should be noted that, in the above, the optimal threshold value of the color attribute value corresponding to the pixel in the divided print sample image is determined by using the maximum inter-class variance method, and black and white may also be directly selected as the color attribute value corresponding to the pixel in the divided print sample image, for example, the print sample image is subjected to binarization processing to obtain the binarization mask images of the image foreground (white) and the image background (black), respectively.
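The following sketch illustrates the maximum inter-class variance (Otsu) search and the subsequent binarisation; the file name is a placeholder, and the use of THRESH_BINARY_INV (making dark text the white foreground) is an assumption about the image polarity.

import cv2
import numpy as np

gray = cv2.cvtColor(cv2.imread("print_sample.png"), cv2.COLOR_BGR2GRAY)  # placeholder file name

def otsu_threshold(gray):
    hist = np.bincount(gray.ravel(), minlength=256).astype(float)
    total = hist.sum()
    best_t, best_var = 0, -1.0
    for t in range(256):                                    # the "assumed gray value"
        w0, w1 = hist[:t + 1].sum(), hist[t + 1:].sum()     # pixels <= t vs > t
        if w0 == 0 or w1 == 0:
            continue
        m0 = (hist[:t + 1] * np.arange(t + 1)).sum() / w0
        m1 = (hist[t + 1:] * np.arange(t + 1, 256)).sum() / w1
        var_between = (w0 / total) * (w1 / total) * (m0 - m1) ** 2
        if var_between > best_var:
            best_var, best_t = var_between, t
    return best_t

t = otsu_threshold(gray)
# Binarise: here dark text becomes the white foreground and light paper the black background.
_, mask = cv2.threshold(gray, t, 255, cv2.THRESH_BINARY_INV)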
In order to change the background color of the print image and obtain broken, pin-printed strokes, after the print sample image has been binarized using the optimal threshold of the color attribute, the pixels on the mask are traversed row by row from the upper-left corner and the total number N of pixels in each run of continuous white pixels is recorded; the segment length at which a breakpoint is needed is denoted M and the currently traversed pixel is denoted P, and the region on the print sample image corresponding to pixels with P/(2M) > M is set to the background color Bcolor.
For the embodiment of the present invention, the print sample image is specifically subjected to a scenarization process to obtain a text sample image of the needle-like printed character, the background pixels of the print sample image after the binarization process are divided into a plurality of background portions according to a preset proportion, and the print sample image is subjected to a scenarization process for the pixel values of the parameters corresponding to the background portions to obtain a text sample image of the needle-like printed character.
For example, M pixels that become the background color on the original image may be divided into three parts according to a ratio of 1:4:1, and the set colors of the three parts are (Bcolor + Tcolor)/2, Bcolor, (Bcolor + Tcolor)/2, respectively.
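A heuristic sketch of this stroke-breaking step is given below. Because the breakpoint condition is only loosely specified above, the "break every other M-pixel segment" interpretation and the way the 1:4:1 colouring is applied are assumptions made for illustration.

import numpy as np

def break_strokes(img, mask, bcolor, tcolor, m=4):
    """img: HxWx3 print sample; mask: HxW, 255 where the stroke (foreground) is."""
    out = img.copy()
    mid = ((np.array(bcolor) + np.array(tcolor)) // 2).astype(np.uint8)   # (Bcolor + Tcolor) / 2
    h, w = mask.shape
    for y in range(h):
        x = 0
        while x < w:
            if mask[y, x] == 255:
                run_start = x
                while x < w and mask[y, x] == 255:
                    x += 1
                n = x - run_start                       # length N of this white run
                for p in range(n):
                    if (p % (2 * m)) >= m:              # every other M-pixel segment becomes a break
                        k = p % (2 * m) - m             # position inside the M-pixel break
                        third = max(1, m // 6)          # 1:4:1 split of the break
                        colour = mid if (k < third or k >= m - third) else np.array(bcolor)
                        out[y, run_start + p] = colour
            else:
                x += 1
    return out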
In order to change the contrast of the printing sample image, the character sample image of the similar needle printing font after the scene with increased contrast can be obtained by adjusting the pixel value of the corresponding contrast of each background part, so that the character sample image covers the scenes with different contrasts;
the formula for specifically adjusting the contrast of the image of the print sample is as follows:
g(x)=alpha*f(x)+beta
wherein, alpha: random value of 0.2-0.8, beta: random values of 0.2-0.8;
by multiplying the print sample image by alpha and adding beta, a print sample image with different contrast can be obtained, so that the generated print sample image can cover more contrast scenes.
In order to change the blurring effect of the printing sample image, the blurring processing is carried out on the pixel values of the corresponding parameters of each background part to obtain the character sample image of the pin-like printing font after the blurring effect is increased, so that the character sample image covers the scene of the blurring effect.
Specifically, the method for adjusting the blurring degree of the print sample image may include motion blurring and gaussian blurring, and for increasing the motion blurring effect of the print sample image, the print sample image may be subjected to the following operations:
firstly, determining a transformation matrix:
M=cv2.getRotationMatrix2D(center,angle,scale)
M = [  α    β    (1 − α) · center.x − β · center.y
      −β    α    β · center.x + (1 − α) · center.y ]
where
α = scale · cos(angle)
β = scale · sin(angle)
then, obtaining a convolution kernel of the motion blur through affine transformation:
cv2.warpAffine(src,dst,M,dsize)
kernel(x,y)=src(M11*x+M12*y+M13,M21*x+M22*y+M23)
then performing convolution operation;
cv2.filter2D(src,dst,ddepth,kernel,anchor=(-1,-1))
dst(x,y)=Σkernel(x′,y′)*src(x+x′-anchor.x,y+y′-anchor.y)
Finally, after the print sample image is convolved with this motion-blur kernel, the motion blur effect is added, where the blur angle and blur degree of the motion blur can be generated randomly (via random) to ensure the diversity of the print sample images.
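Putting the above calls together, a possible motion-blur sketch is shown below; the kernel-building strategy (rotating a horizontal line kernel) and the degree/angle ranges are illustrative assumptions.

import random
import cv2
import numpy as np

def motion_blur(img, degree=None, angle=None):
    degree = degree or random.randint(5, 15)                   # blur length
    angle = angle if angle is not None else random.uniform(0, 360)
    kernel = np.zeros((degree, degree), dtype=np.float32)
    kernel[degree // 2, :] = 1.0                               # horizontal line kernel
    M = cv2.getRotationMatrix2D((degree / 2, degree / 2), angle, 1.0)
    kernel = cv2.warpAffine(kernel, M, (degree, degree))       # rotate the line to the blur angle
    kernel /= max(kernel.sum(), 1e-6)                          # normalise so brightness is preserved
    return cv2.filter2D(img, -1, kernel)

blurred = motion_blur(cv2.imread("print_sample.png"))          # placeholder file name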
For increasing the gaussian blur effect of a print sample image, the print image sample may be subjected to the following operations:
The Gaussian kernel is first determined by:
cv2.getGaussianKernel(ksize, sigma)
G_i = α · exp( −(i − (ksize − 1)/2)² / (2 · sigma²) )
where i = 0, …, ksize − 1, and α is a scale factor chosen so that Σ_i G_i = 1.
Finally, after the convolution operation is carried out on the printing sample image and the Gaussian kernel, the effect of Gaussian blur can be increased.
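An equivalent sketch for the Gaussian-blur branch is given below; the ksize and sigma ranges are assumptions.

import random
import cv2

img = cv2.imread("print_sample.png")                  # placeholder file name
ksize = random.choice([3, 5, 7])                      # odd kernel size
sigma = random.uniform(0.5, 2.0)

k1d = cv2.getGaussianKernel(ksize, sigma)             # 1-D kernel, sums to 1
kernel = k1d @ k1d.T                                  # separable 2-D Gaussian kernel
gauss_a = cv2.filter2D(img, -1, kernel)               # explicit convolution

gauss_b = cv2.GaussianBlur(img, (ksize, ksize), sigma)  # equivalent built-in call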
203. And marking the position information of the text area in the character sample image of the needle-like printing font, and inputting the marked position information into a first network model for training to obtain a text area detection model.
In order to facilitate the definition of the boundary of the text region, different regions may exist in the text sample image of the pin-like printing font, for example, the text region, the picture region, the blank region, and the like, and the non-text region is not a target region for text region detection, so that the text region needs to be labeled.
The first network model can adopt a CTPN network frame and comprises a 3-layer structure, the first layer is a convolution structure, namely a CNN structure, and spatial information of a receptive field can be learned by extracting image region characteristics corresponding to a text sample image through a convolution layer; the second layer is a decoding layer, namely a BLSTM structure, and generates horizontal text sequence characteristics according to image area characteristics corresponding to character sample images through the decoding layer, so that the sequence characteristics of horizontal texts can be well dealt with; and the third layer is a prediction layer, namely an RPN structure, determines a text region in the text sample image according to the horizontal text sequence characteristics through the prediction layer, and processes the text region to obtain candidate text lines.
Specifically, the prediction layer of the first network model comprises a classification part and a regression part, and in the process of determining the text region in the text sample image according to the horizontal text sequence characteristics through the prediction layer of the network model and processing the text region to obtain candidate text lines, the classification part of the prediction layer of the network model can classify each region in the text sample image according to the horizontal text sequence characteristics to determine the text region in the text sample image; and performing frame regression processing on the text region in the character sample image through a regression part of a prediction layer of the network model to obtain candidate text lines.
In the specific implementation process, in the convolutional layer part, CTPN may select the feature maps of conv5 in the VGG model as the final image features, where the size of the feature maps is H × W × C. Then, because of the sequential relationship among texts, a 3 × 3 sliding window can be used at the decoding layer to extract the 3 × 3 area around each point on the feature maps as the feature vector representation of that point, at which time the size of the feature becomes H × W × 9C; each row is then used as a sequence (of length W) with the height used as the batch_size, the sequences are fed into a 128-dimensional Bi-LSTM, and the output of the decoding layer is W × H × 256. Finally, the output of the decoding layer is fed into the prediction layer, which comprises two parts, anchor classification and bounding-box regression: the anchor classification determines whether each region in the image is a text region, and a group of vertical strip-shaped candidate text lines, each carrying a label of whether it is a text region, is obtained after the bounding-box regression processing.
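For illustration, a simplified PyTorch sketch of this data flow (VGG conv5 features, a 3 × 3 sliding window, a 128-unit bidirectional LSTM, and the classification/regression heads) is given below; the number of anchors per position (k = 10) and other sizes are common CTPN choices assumed here, not values prescribed by the present application.

import torch
import torch.nn as nn
import torchvision

class CTPNSketch(nn.Module):
    def __init__(self, k_anchors=10):
        super().__init__()
        vgg = torchvision.models.vgg16(weights=None)
        self.backbone = vgg.features[:-1]                 # conv5_3 features, C = 512
        self.rnn = nn.LSTM(512 * 9, 128, bidirectional=True, batch_first=True)
        self.fc = nn.Linear(256, 512)
        self.cls = nn.Linear(512, 2 * k_anchors)          # text / non-text score per anchor
        self.reg = nn.Linear(512, 2 * k_anchors)          # vertical offsets per anchor

    def forward(self, x):                                 # x: (N, 3, H, W)
        f = self.backbone(x)                              # (N, 512, H', W')
        n, c, h, w = f.shape
        f = nn.functional.unfold(f, 3, padding=1)         # 3x3 window -> (N, 512*9, H'*W')
        f = f.transpose(1, 2).reshape(n * h, w, c * 9)    # each row becomes a sequence of length W'
        seq, _ = self.rnn(f)                              # (N*H', W', 256)
        seq = torch.relu(self.fc(seq))
        return self.cls(seq), self.reg(seq)               # per-position anchor scores / offsets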
Further, in order to ensure the accuracy of the prediction of the trained text region detection model, the preset loss function can perform parameter adjustment on the multilayer structure in the text region detection model based on the deviation between the result output by the text region detection model and the data labeled by the real text region. For the embodiment of the invention, the pre-trained loss function mainly comprises 3 parts, wherein the first part is a loss function for detecting whether the Anchor is a text region; the second part is a loss function for detecting regression of the anchor's y-coordinate offset; the third part is the loss function of the x-coordinate offset regression used to detect Anchor.
204. And labeling the text information in the text area in the character sample image of the needle-like printing font, and inputting the labeled text information into a second network model for training to obtain a text recognition model.
The second network model can adopt a CRNN network architecture and comprises 3 layers of structures, the first layer is a convolution structure, namely a CNN structure, and the spatial information of the receptive field can be learned by extracting image region characteristics corresponding to the text sample image through the convolution layer; the second layer structure is a circulation layer, namely an RNN structure, and the label distribution of each frame in the image is predicted through the circulation layer according to the image area characteristics corresponding to the character sample image; the third layer structure is a transcription layer, namely a CTC structure, the label distribution of each frame in the image is integrated and the like through the transcription layer to form a final label sequence, and a text recognition result corresponding to each text region in the character sample image is output.
In the specific implementation process, in the convolutional layer part, the feature sequence of the input text sample image can be extracted automatically; the vectors in the extracted feature sequence are generated from left to right on the feature map, and each feature vector represents the features over a certain width of the image. The recurrent layer part can be built with an RNN, which predicts the label distribution (a probability list over the real results) of each feature vector in the feature sequence; the error of the recurrent layer is back-propagated and finally converted into a feature sequence that is fed back to the convolutional layer, and a custom network layer can be defined to act as the bridge connecting the convolutional layer and the recurrent layer. In the transcription layer part, a CTC model, usually connected at the last layer of the RNN for sequence learning and training, can be used to integrate all possible results of the predicted label sequence into a final result. For an input sequence of length T, the last layer of the RNN outputs a softmax vector at each sample point, representing the prediction probability at that point (T is generally far larger than the length of the final label sequence); after the probabilities of all sample points are passed to the CTC model, the most probable labels are output, and the final sequence label is obtained by removing blanks and de-duplicating.
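The blank-removal and de-duplication step can be illustrated with a minimal greedy CTC decoder such as the sketch below; the alphabet and the blank index are assumptions.

import numpy as np

ALPHABET = "0123456789abcdefghijklmnopqrstuvwxyz"   # index 0 is reserved for the CTC blank

def ctc_greedy_decode(log_probs, blank=0):
    # log_probs: (T, num_classes) output of the recognition model for one image
    best = log_probs.argmax(axis=1)                  # most probable label at each time step
    out, prev = [], blank
    for idx in best:
        if idx != blank and idx != prev:             # drop blanks, collapse repeated labels
            out.append(ALPHABET[idx - 1])
        prev = idx
    return "".join(out)

# e.g. per-step argmax [blank, 5, 5, blank, 12, 12] decodes to "4b"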
205. When an image text detection request is received, inputting the image requested to be detected into the text region detection model, and determining the position information of the text region corresponding to the image.
It can be understood that each printed sample image has a corresponding output file through the text region detection model, the output file stores the position information of all candidate text lines in the image and whether the candidate text lines are labels of the text regions, the candidate text lines are equivalent to vertical strip lines split from the text regions, the candidate text lines are connected to form the text regions in the image based on a text line construction algorithm, and the position information of the text regions corresponding to the image is determined by combining the position information of each candidate text line.
206. And inputting the position information of the text region corresponding to the image and the image requested to be detected into the text recognition model together to obtain the text information in the image.
Further, as a specific implementation of the method shown in fig. 1, an embodiment of the present invention provides an apparatus for recognizing a text in an image, where as shown in fig. 3, the apparatus includes: an acquisition unit 31, a training unit 32, a determination unit 33, and a recognition unit 34.
An acquiring unit 31, which may be configured to acquire a text sample image of the pin-like print font after the scenarization processing;
the training unit 32 may be configured to input the text sample images with the print-like font as training data into network models with different architectures, respectively, for training, so as to obtain a text region detection model and a text recognition model;
a determining unit 33, configured to, when receiving an image text detection request, input the image requested to be detected into the text region detection model, and determine position information of a text region corresponding to the image;
the identifying unit 34 may be configured to input the position information of the text region corresponding to the image and the image requested to be detected to the text recognition model together, so as to obtain text information in the image.
According to the device for recognizing text in an image provided by the embodiment of the invention, a text sample image of a needle-like printing font after scenarization processing is acquired, and because the scenarized text sample image covers richer picture features, the text region detection model and the text recognition model obtained by training have stronger scene recognition capability, so that text information in images of different scenes can be recognized in the process of recognizing text in an image. Compared with prior-art methods for recognizing text in an image, the sample data collected from the actual scene is augmented without consuming a large amount of labor to collect samples, which simplifies the sample collection process and saves the time needed to label sample data; moreover, the model trained with the augmented sample data fits the actual scene well, which improves the accuracy of recognizing text in the image.
As a further description of the device for recognizing a text in an image shown in fig. 3, fig. 4 is a schematic structural diagram of another device for recognizing a text in an image according to an embodiment of the present invention, and as shown in fig. 4, the obtaining unit 31 includes:
a setting module 311, configured to obtain a print sample image generated by using a printing method, and set an attribute value corresponding to the print sample image;
the processing module 312 may be configured to perform a scenization process on the print sample image by changing the attribute value corresponding to the pixel in the print sample image, so as to obtain a text sample image of the needle-like print body.
Further, the processing module 312 includes:
a determining sub-module 3121 configured to determine an optimal threshold for dividing the color attribute values corresponding to the pixels in the print sample image by using a maximum inter-class variance method;
the first processing sub-module 3122 is configured to perform binarization processing on the print sample image by using the optimal threshold as a division basis, so as to obtain background pixels and foreground pixels of the print sample image after binarization processing;
the dividing submodule 3123 may be configured to divide the background pixels of the print sample image after the binarization processing into a plurality of background portions according to a preset ratio;
the second processing sub-module 3124 may be configured to perform scene processing on the print sample image according to the pixel values of the parameters corresponding to each background portion, so as to obtain a text sample image of the needle-like print body.
Further, the determining sub-module 3121 may be specifically configured to divide the color attribute values corresponding to the pixels in the print sample image into two groups by using an assumed gray value and calculate the inter-class variance, where one group of color attribute values is greater than the assumed gray value and the other group is not greater than the assumed gray value;
the determining sub-module 3121 may be further configured to determine, as the optimal threshold of the color attribute value, the assumed gray value when the inter-class variance is maximum by changing the assumed gray value.
Further, the second processing sub-module 3124 may be specifically configured to obtain a text sample image of a quasi-pin print font after a scene with increased contrast by adjusting pixel values of corresponding contrasts of the background portions, so that the text sample image covers scenes with different contrasts;
the second processing sub-module 3124 may be further configured to perform blurring processing on the pixel values of the parameters corresponding to the background portions to obtain a text sample image with increased blurring effect and similar to the pin print font, so that the text sample image covers a scene with blurring effect.
Further, the training unit 32 includes:
the first training module 321 may be configured to label the position information of the text region in the text sample image of the print-like font and input the labeled position information into a first network model for training to obtain a text region detection model;
the second training module 322 may be configured to label text information in a text region in the text sample image of the print-like font and input the labeled text information to the second network model for training, so as to obtain a text recognition model.
Further, the first network model includes a multi-layer structure,
the first training module 321 may be specifically configured to extract, through the convolution layer of the first network model, image region features corresponding to the text sample image of the pin-printing-like font;
the first training module 321 may be further specifically configured to generate, by using a decoding layer of the first network model, a horizontal text sequence feature according to an image region feature corresponding to a text sample image;
the first training module 321 may be further configured to determine a text region in the text sample image according to the horizontal text sequence feature through a prediction layer of the first network model, and process the text region to obtain a candidate text line.
It should be noted that other corresponding descriptions of the functional units related to the device for recognizing a text in an image provided in this embodiment may refer to the corresponding descriptions in fig. 1 and fig. 2, and are not described herein again.
Based on the above methods shown in fig. 1 and fig. 2, correspondingly, the present embodiment further provides a storage medium, on which a computer program is stored, and the computer program, when executed by a processor, implements the method for recognizing text in the image shown in fig. 1 and fig. 2.
Based on such understanding, the technical solution of the present application may be embodied in the form of a software product, which may be stored in a non-volatile storage medium (which may be a CD-ROM, a usb disk, a removable hard disk, etc.), and includes several instructions for enabling a computer device (which may be a personal computer, a server, or a network device, etc.) to execute the method according to the implementation scenarios of the present application.
Based on the method shown in fig. 1 and fig. 2 and the virtual device embodiment shown in fig. 3 and fig. 4, in order to achieve the above object, an embodiment of the present application further provides a computer device, which may specifically be a personal computer, a server, a network device, and the like, where the entity device includes a storage medium and a processor; a storage medium for storing a computer program; a processor for executing a computer program to implement the method for recognizing text in images as shown in fig. 1 and fig. 2.
Optionally, the computer device may also include a user interface, a network interface, a camera, Radio Frequency (RF) circuitry, sensors, audio circuitry, a WI-FI module, and so forth. The user interface may include a Display screen (Display), an input unit such as a keypad (Keyboard), etc., and the optional user interface may also include a USB interface, a card reader interface, etc. The network interface may optionally include a standard wired interface, a wireless interface (e.g., a bluetooth interface, WI-FI interface), etc.
Those skilled in the art will appreciate that the physical device structure of the text recognition device in the image provided in the present embodiment does not constitute a limitation to the physical device, and may include more or less components, or combine some components, or arrange different components.
The storage medium may further include an operating system and a network communication module. The operating system is a program that manages the hardware and software resources of the computer device described above, supporting the operation of information handling programs and other software and/or programs. The network communication module is used for realizing communication among components in the storage medium and other hardware and software in the entity device.
Through the above description of the embodiments, those skilled in the art will clearly understand that the present application can be implemented by software plus a necessary general-purpose hardware platform, or by hardware. Compared with the prior art, the method and apparatus expand the sample data collected from the actual scene, so that a large amount of labor is not required to collect samples; the sample collection process is simplified, the time for labeling sample data is saved, the model trained on the expanded sample data can fit the actual scene well, and the accuracy of text recognition in images is improved.
Those skilled in the art will appreciate that the drawings are merely schematic diagrams of a preferred implementation scenario, and that the modules or processes in the drawings are not necessarily required to practice the present application. Those skilled in the art will also appreciate that the modules in the devices of the implementation scenario may be distributed among the devices as described, or, with corresponding changes, may be located in one or more devices different from those of the present implementation scenario. The modules of the above implementation scenario may be combined into one module, or further split into multiple sub-modules.
The above serial numbers are for description only and do not represent the relative merits of the implementation scenarios. The above disclosure describes only a few specific implementation scenarios of the present application; the present application is not limited thereto, however, and any variation conceivable to those skilled in the art shall fall within the protection scope of the present application.

Claims (10)

1. A method for recognizing text in an image, the method comprising:
acquiring a text sample image in a pin-printing-style font that has undergone scenario processing;
respectively inputting the text sample images in the pin-printing-style font, as training data, into network models of different architectures for training, to obtain a text region detection model and a text recognition model;
when an image text detection request is received, inputting the image requested to be detected into the text region detection model, and determining position information of the text region corresponding to the image;
and inputting the position information of the text region corresponding to the image, together with the image requested to be detected, into the text recognition model, to obtain the text information in the image.
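For illustration only, the detection-then-recognition flow of claim 1 can be sketched as follows; the helper functions and call signatures are assumptions introduced for the example and are not APIs defined by the present application:

```python
# detection_model and recognition_model stand in for the trained text region
# detection model and text recognition model of claim 1.

def recognize_text_in_image(image, detection_model, recognition_model):
    # Step 1: the detection model returns position information (boxes) of the text regions.
    text_boxes = detection_model(image)
    # Step 2: each box, together with the corresponding image content, is fed
    # to the recognition model to obtain the text information.
    results = []
    for box in text_boxes:
        x1, y1, x2, y2 = box
        crop = image[y1:y2, x1:x2]   # region indicated by the position information (NumPy-style indexing assumed)
        results.append((box, recognition_model(crop)))
    return results
```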
2. The method according to claim 1, wherein the acquiring a text sample image in a pin-printing-style font that has undergone scenario processing specifically comprises:
acquiring a print sample image generated in a printing manner, and setting attribute values corresponding to the print sample image;
and performing scenario processing on the print sample image by changing the attribute values corresponding to the pixels in the print sample image, to obtain the text sample image in the pin-printing-style font.
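For illustration only, generating a print sample image "in a printing manner" can be sketched by rendering text onto a clean background, as below; the font, colors, and sizes are illustrative assumptions rather than values required by claim 2:

```python
from PIL import Image, ImageDraw, ImageFont

def make_print_sample(text, size=(320, 48)):
    """Render a print sample image and set its attribute values
    (background and foreground colors); values are illustrative."""
    img = Image.new("RGB", size, color=(255, 255, 255))   # attribute: background color
    draw = ImageDraw.Draw(img)
    font = ImageFont.load_default()                        # stand-in for a pin-printing-style font
    draw.text((4, 12), text, fill=(0, 0, 0), font=font)    # attribute: foreground color
    return img

sample = make_print_sample("INVOICE NO 2020-0117")
```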
3. The method according to claim 2, wherein the performing scenario processing on the print sample image by changing the attribute values corresponding to the pixels in the print sample image, to obtain the text sample image in the pin-printing-style font, specifically comprises:
determining, by using the maximum inter-class variance method, an optimal threshold for dividing the color attribute values corresponding to the pixels in the print sample image;
performing binarization processing on the print sample image with the optimal threshold as the dividing basis, to obtain the background pixels and foreground pixels of the binarized print sample image;
dividing the background pixels of the binarized print sample image into a plurality of background parts according to a preset proportion;
and performing scenario processing on the print sample image according to the pixel values of the parameters corresponding to each background part, to obtain the text sample image in the pin-printing-style font.
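For illustration only, the binarization and background-partition steps of claim 3 might be sketched as follows, assuming a light background (so that pixels above the threshold count as background) and an illustrative column-wise split and preset proportion:

```python
import numpy as np

def split_background(gray, threshold, ratios=(0.5, 0.3, 0.2)):
    """Binarize with the optimal threshold and divide the background pixels
    into several parts by a preset proportion; split rule and ratios are illustrative."""
    binary = gray > threshold                        # True = background, False = foreground (light background assumed)
    bg_rows, bg_cols = np.nonzero(binary)
    order = np.argsort(bg_cols)                      # walk background pixels left to right
    bounds = np.cumsum(np.array(ratios) * len(order)).astype(int)
    coords = np.stack([bg_rows[order], bg_cols[order]], axis=1)
    parts = np.split(coords, bounds[:-1])            # each part: (row, col) indices of one background portion
    return binary, parts
```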
4. The method according to claim 3, wherein the determining, by using the maximum inter-class variance method, an optimal threshold for dividing the color attribute values corresponding to the pixels in the print sample image specifically comprises:
dividing the color attribute values corresponding to the pixels in the print sample image into two groups by using an assumed gray value and calculating the inter-class variance, wherein one group contains the color attribute values greater than the assumed gray value and the other group contains the color attribute values not greater than the assumed gray value;
and, by varying the assumed gray value, determining the assumed gray value at which the inter-class variance reaches its maximum as the optimal threshold for the color attribute values.
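For illustration only, the threshold search described in claim 4 (the maximum inter-class variance, i.e. Otsu, criterion) can be sketched in NumPy as follows; variable names are illustrative:

```python
import numpy as np

def otsu_threshold(gray):
    """For every assumed gray value, split the gray levels into a group greater
    than the value and a group not greater than it, compute the inter-class
    variance, and keep the value that maximizes it."""
    hist = np.bincount(gray.ravel(), minlength=256).astype(np.float64)
    prob = hist / hist.sum()
    best_t, best_var = 0, 0.0
    for t in range(256):                               # the assumed gray value
        w0, w1 = prob[:t + 1].sum(), prob[t + 1:].sum()
        if w0 == 0 or w1 == 0:
            continue
        mu0 = (np.arange(0, t + 1) * prob[:t + 1]).sum() / w0
        mu1 = (np.arange(t + 1, 256) * prob[t + 1:]).sum() / w1
        var_between = w0 * w1 * (mu0 - mu1) ** 2       # inter-class variance
        if var_between > best_var:
            best_var, best_t = var_between, t
    return best_t
```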
5. The method according to claim 3, wherein the performing scenario processing on the print sample image according to the pixel values of the parameters corresponding to each background part, to obtain the text sample image in the pin-printing-style font, specifically comprises:
adjusting the contrast-related pixel values of each background part to obtain a text sample image in the pin-printing-style font under a scenario with increased contrast, so that the text sample images cover scenarios with different contrasts;
and performing blur processing on the pixel values of the parameters corresponding to each background part to obtain a text sample image in the pin-printing-style font with an added blur effect, so that the text sample images cover scenarios with a blur effect.
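For illustration only, the contrast and blur scenario processing of claim 5 might be sketched as below; the use of OpenCV, the contrast factors, and the kernel size are illustrative assumptions:

```python
import cv2
import numpy as np

def scenario_augment(print_img, background_mask):
    """Vary the contrast of the background parts and add a blur effect so the
    generated samples cover scenarios with different contrasts and with blurring."""
    samples = []
    for alpha in (0.6, 1.0, 1.4):                          # contrast-related pixel adjustment
        adjusted = print_img.copy()
        rescaled = cv2.convertScaleAbs(print_img, alpha=alpha, beta=0)
        adjusted[background_mask] = rescaled[background_mask]   # apply only to the background part
        samples.append(adjusted)
    samples.append(cv2.GaussianBlur(print_img, (5, 5), 0))  # sample with an added blur effect
    return samples
```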
6. The method according to any one of claims 1 to 5, wherein the respectively inputting the text sample images in the pin-printing-style font, as training data, into network models of different architectures for training, to obtain a text region detection model and a text recognition model, specifically comprises:
labeling the position information of the text regions in the text sample image in the pin-printing-style font, and inputting the labeled position information into a first network model for training, to obtain the text region detection model;
and labeling the text information in the text regions of the text sample image in the pin-printing-style font, and inputting the labeled text information into a second network model for training, to obtain the text recognition model.
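For illustration only, the second network model of claim 6 (the text recognition model) could be sketched as a CRNN-style network trained with CTC; this architecture choice and all dimensions are assumptions, since the claim does not fix a particular second network model:

```python
import torch
import torch.nn as nn

class TextRecognizer(nn.Module):
    """Illustrative second network model: convolutional features followed by a
    recurrent layer and a per-time-step character classifier (CRNN-style)."""

    def __init__(self, num_classes):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(1, 64, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(64, 128, 3, padding=1), nn.ReLU(), nn.MaxPool2d((2, 1)),
        )
        self.rnn = nn.LSTM(128 * 8, 256, bidirectional=True, batch_first=True)
        self.fc = nn.Linear(512, num_classes)

    def forward(self, x):                      # x: (N, 1, 32, W) grayscale text-line image
        feat = self.conv(x)                    # (N, 128, 8, W/2)
        n, c, h, w = feat.shape
        seq = feat.permute(0, 3, 1, 2).reshape(n, w, c * h)
        seq, _ = self.rnn(seq)
        return self.fc(seq)                    # per-column character logits, e.g. for a CTC loss
```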
7. The method according to claim 6, wherein the first network model comprises a multi-layer structure, and the inputting the labeled position information of the text regions in the text sample image in the pin-printing-style font into the first network model for training, to obtain the text region detection model, specifically comprises:
extracting, through the convolution layer of the first network model, the image region features corresponding to the text sample image in the pin-printing-style font;
generating, through a decoding layer of the first network model, horizontal text sequence features from the image region features corresponding to the text sample image;
and determining, through a prediction layer of the first network model, the text regions in the text sample image according to the horizontal text sequence features, and processing the text regions to obtain candidate text lines.
8. An apparatus for recognizing text in an image, the apparatus comprising:
an acquiring unit, configured to acquire a text sample image in a pin-printing-style font that has undergone scenario processing;
a training unit, configured to respectively input the text sample images in the pin-printing-style font, as training data, into network models of different architectures for training, to obtain a text region detection model and a text recognition model;
a determining unit, configured to, when an image text detection request is received, input the image requested to be detected into the text region detection model, and determine position information of the text region corresponding to the image;
and a recognition unit, configured to input the position information of the text region corresponding to the image, together with the image requested to be detected, into the text recognition model, to obtain the text information in the image.
9. A computer device comprising a memory and a processor, the memory storing a computer program, wherein the processor implements the steps of the method of any one of claims 1 to 7 when executing the computer program.
10. A computer storage medium on which a computer program is stored, characterized in that the computer program, when being executed by a processor, carries out the steps of the method of any one of claims 1 to 7.
CN202010051888.7A 2020-01-17 2020-01-17 Method and device for recognizing text in image, computer equipment and computer storage medium Pending CN111291629A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010051888.7A CN111291629A (en) 2020-01-17 2020-01-17 Method and device for recognizing text in image, computer equipment and computer storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010051888.7A CN111291629A (en) 2020-01-17 2020-01-17 Method and device for recognizing text in image, computer equipment and computer storage medium

Publications (1)

Publication Number Publication Date
CN111291629A true CN111291629A (en) 2020-06-16

Family

ID=71023142

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010051888.7A Pending CN111291629A (en) 2020-01-17 2020-01-17 Method and device for recognizing text in image, computer equipment and computer storage medium

Country Status (1)

Country Link
CN (1) CN111291629A (en)

Cited By (15)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111798542A (en) * 2020-09-10 2020-10-20 北京易真学思教育科技有限公司 Model training method, data processing device, model training apparatus, and storage medium
CN111950356A (en) * 2020-06-30 2020-11-17 深圳市雄帝科技股份有限公司 Seal text positioning method and device and electronic equipment
CN112163577A (en) * 2020-09-22 2021-01-01 广州博冠信息科技有限公司 Character recognition method and device in game picture, electronic equipment and storage medium
CN112232340A (en) * 2020-10-15 2021-01-15 马婧 Method and device for identifying printed information on surface of object
CN112287969A (en) * 2020-09-25 2021-01-29 浪潮金融信息技术有限公司 Character sample collecting and processing method, self-service terminal equipment and independent module
CN112418206A (en) * 2020-11-20 2021-02-26 平安普惠企业管理有限公司 Picture classification method based on position detection model and related equipment thereof
CN112541443A (en) * 2020-12-16 2021-03-23 平安科技(深圳)有限公司 Invoice information extraction method and device, computer equipment and storage medium
CN112580495A (en) * 2020-12-16 2021-03-30 上海眼控科技股份有限公司 Text recognition method and device, computer equipment and storage medium
CN112966841A (en) * 2021-03-18 2021-06-15 深圳闪回科技有限公司 Offline automatic order examining system
CN112989786A (en) * 2021-01-18 2021-06-18 平安国际智慧城市科技股份有限公司 Document analysis method, system, device and storage medium based on image recognition
CN112990212A (en) * 2021-02-05 2021-06-18 开放智能机器(上海)有限公司 Reading method and device of thermal imaging temperature map, electronic equipment and storage medium
CN113298001A (en) * 2021-06-02 2021-08-24 上海大学 System and method for identifying and recommending shops along street based on vehicle-mounted camera shooting
CN113743438A (en) * 2020-08-20 2021-12-03 北京沃东天骏信息技术有限公司 Method, device and system for generating data set for text detection
CN114612915A (en) * 2022-05-12 2022-06-10 青岛美迪康数字工程有限公司 Method and device for extracting patient information of film image
CN115205164A (en) * 2022-09-15 2022-10-18 腾讯科技(深圳)有限公司 Training method of image processing model, video processing method, device and equipment

Cited By (21)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111950356A (en) * 2020-06-30 2020-11-17 深圳市雄帝科技股份有限公司 Seal text positioning method and device and electronic equipment
CN111950356B (en) * 2020-06-30 2024-04-19 深圳市雄帝科技股份有限公司 Seal text positioning method and device and electronic equipment
CN113743438A (en) * 2020-08-20 2021-12-03 北京沃东天骏信息技术有限公司 Method, device and system for generating data set for text detection
CN111798542B (en) * 2020-09-10 2020-12-22 北京易真学思教育科技有限公司 Model training method, data processing device, model training apparatus, and storage medium
CN111798542A (en) * 2020-09-10 2020-10-20 北京易真学思教育科技有限公司 Model training method, data processing device, model training apparatus, and storage medium
CN112163577A (en) * 2020-09-22 2021-01-01 广州博冠信息科技有限公司 Character recognition method and device in game picture, electronic equipment and storage medium
CN112163577B (en) * 2020-09-22 2022-10-11 广州博冠信息科技有限公司 Character recognition method and device in game picture, electronic equipment and storage medium
CN112287969A (en) * 2020-09-25 2021-01-29 浪潮金融信息技术有限公司 Character sample collecting and processing method, self-service terminal equipment and independent module
CN112232340A (en) * 2020-10-15 2021-01-15 马婧 Method and device for identifying printed information on surface of object
CN112418206B (en) * 2020-11-20 2024-02-27 上海昇晔网络科技有限公司 Picture classification method based on position detection model and related equipment thereof
CN112418206A (en) * 2020-11-20 2021-02-26 平安普惠企业管理有限公司 Picture classification method based on position detection model and related equipment thereof
CN112541443B (en) * 2020-12-16 2024-05-10 平安科技(深圳)有限公司 Invoice information extraction method, invoice information extraction device, computer equipment and storage medium
CN112580495A (en) * 2020-12-16 2021-03-30 上海眼控科技股份有限公司 Text recognition method and device, computer equipment and storage medium
CN112541443A (en) * 2020-12-16 2021-03-23 平安科技(深圳)有限公司 Invoice information extraction method and device, computer equipment and storage medium
CN112989786B (en) * 2021-01-18 2023-08-18 平安国际智慧城市科技股份有限公司 Document analysis method, system, device and storage medium based on image recognition
CN112989786A (en) * 2021-01-18 2021-06-18 平安国际智慧城市科技股份有限公司 Document analysis method, system, device and storage medium based on image recognition
CN112990212A (en) * 2021-02-05 2021-06-18 开放智能机器(上海)有限公司 Reading method and device of thermal imaging temperature map, electronic equipment and storage medium
CN112966841A (en) * 2021-03-18 2021-06-15 深圳闪回科技有限公司 Offline automatic order examining system
CN113298001A (en) * 2021-06-02 2021-08-24 上海大学 System and method for identifying and recommending shops along street based on vehicle-mounted camera shooting
CN114612915A (en) * 2022-05-12 2022-06-10 青岛美迪康数字工程有限公司 Method and device for extracting patient information of film image
CN115205164A (en) * 2022-09-15 2022-10-18 腾讯科技(深圳)有限公司 Training method of image processing model, video processing method, device and equipment

Similar Documents

Publication Publication Date Title
CN111291629A (en) Method and device for recognizing text in image, computer equipment and computer storage medium
CN110232311B (en) Method and device for segmenting hand image and computer equipment
US20190180154A1 (en) Text recognition using artificial intelligence
Zhang et al. Ensnet: Ensconce text in the wild
CN110647829A (en) Bill text recognition method and system
CN107403130A (en) A kind of character identifying method and character recognition device
RU2721187C1 (en) Teaching language models using text corpuses containing realistic errors of optical character recognition (ocr)
CN111414906A (en) Data synthesis and text recognition method for paper bill picture
US20230206487A1 (en) Detection and identification of objects in images
He et al. Historical manuscript dating based on temporal pattern codebook
CN107368827A (en) Character identifying method and device, user equipment, server
CN110443235B (en) Intelligent paper test paper total score identification method and system
CN113158977B (en) Image character editing method for improving FANnet generation network
CN109598185A (en) Image recognition interpretation method, device, equipment and readable storage medium storing program for executing
CN112348028A (en) Scene text detection method, correction method, device, electronic equipment and medium
CN113436222A (en) Image processing method, image processing apparatus, electronic device, and storage medium
CN113537189A (en) Handwritten character recognition method, device, equipment and storage medium
CN116030453A (en) Digital ammeter identification method, device and equipment
CN104915641B (en) The method that facial image light source orientation is obtained based on Android platform
CN114882204A (en) Automatic ship name recognition method
CN110796145A (en) Multi-certificate segmentation association method based on intelligent decision and related equipment
CN111311602A (en) Lip image segmentation device and method for traditional Chinese medicine facial diagnosis
CN113780116A (en) Invoice classification method and device, computer equipment and storage medium
De Nardin et al. Few-shot pixel-precise document layout segmentation via dynamic instance generation and local thresholding
CN110633666A (en) Gesture track recognition method based on finger color patches

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
TA01 Transfer of patent application right
Effective date of registration: 20220531
Address after: 518000 China Aviation Center 2901, No. 1018, Huafu Road, Huahang community, Huaqiang North Street, Futian District, Shenzhen, Guangdong Province
Applicant after: Shenzhen Ping An medical and Health Technology Service Co.,Ltd.
Address before: Room 12G, Area H, 666 Beijing East Road, Huangpu District, Shanghai 200001
Applicant before: PING AN MEDICAL AND HEALTHCARE MANAGEMENT Co.,Ltd.