CN114495132A - Character recognition method, device, equipment and storage medium - Google Patents

Character recognition method, device, equipment and storage medium

Info

Publication number
CN114495132A
Authority
CN
China
Prior art keywords
image
text
processed
different scales
processing
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111535285.5A
Other languages
Chinese (zh)
Inventor
文玉茹
卢道和
杨军
程志峰
李勋棋
罗海湾
何勇彬
陈鉴镔
胡仲臣
陈刚
周佳振
朱嘉伟
郭英亚
李兴龙
周琪
熊思清
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
WeBank Co Ltd
Original Assignee
WeBank Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by WeBank Co Ltd filed Critical WeBank Co Ltd
Priority to CN202111535285.5A
Publication of CN114495132A
Priority to PCT/CN2022/102163 (published as WO2023109086A1)
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F 18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/08 Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Computing Systems (AREA)
  • Software Systems (AREA)
  • Molecular Biology (AREA)
  • Computational Linguistics (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Mathematical Physics (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Character Discrimination (AREA)
  • Character Input (AREA)

Abstract

The method comprises: acquiring an image to be processed, where the image to be processed carries one or more characters; performing feature extraction on the image to be processed to obtain image features; obtaining, according to the image features, a plurality of text boxes of different scales in the image to be processed, and performing text-box regression processing on the text boxes of different scales, which solves the problem of image deformation or angular movement; determining the positions of the characters in the image to be processed according to the plurality of regressed text boxes of different scales; and performing character recognition on the image to be processed based on those positions. The character recognition rate is thereby improved, and a better character recognition effect is achieved.

Description

Character recognition method, device, equipment and storage medium
Technical Field
The present application relates to image recognition technology in the field of financial technology (Fintech), and in particular, to a character recognition method, apparatus, device, and storage medium.
Background
With the development of information technology, more and more technologies are being applied in the financial field, and the traditional financial industry is gradually shifting toward financial technology. Image recognition technology is no exception; at the same time, the financial industry's requirements for security and real-time performance place higher demands on it.
In the related art, image recognition technology mainly refers to using a computer to process pictures captured at the front end of a system according to a set target. In the field of artificial intelligence, neural networks are the most widely applied approach to image recognition. Neural network models can implement functions such as face recognition, image detection, image classification, target tracking, and character recognition. Among these, face recognition, image classification, and character recognition have been developed over a long period and achieve fairly good recognition results.
Character recognition generally refers to a technique for automatically recognizing characters using various devices, including computers, and has important applications in many fields of today's society. However, once an image is deformed or angularly shifted, conventional image recognition techniques lack equivariance to such transformations, so the character recognition rate decreases and an ideal recognition effect cannot be achieved.
Disclosure of Invention
In order to solve the above problems in the prior art, the present application provides a character recognition method, apparatus, device, and storage medium.
In a first aspect, an embodiment of the present application provides a text recognition method, where the text recognition method includes:
acquiring an image to be processed, wherein the image to be processed carries one or more characters;
extracting the features of the image to be processed to obtain the image features corresponding to the image to be processed;
obtaining a plurality of text frames with different scales in the image to be processed according to the image characteristics, and performing text frame regression processing on the text frames with different scales;
and determining the positions of the one or more characters in the image to be processed according to a plurality of character frames with different scales after the character frame regression processing, and performing character recognition on the image to be processed based on the positions of the one or more characters.
In a possible implementation manner, the performing feature extraction on the image to be processed to obtain an image feature corresponding to the image to be processed includes:
performing feature extraction on the image to be processed based on a dense connection network to obtain the image features corresponding to the image to be processed, wherein the dense connection network comprises one or more dense blocks, any two dense blocks in the dense connection network are directly connected, and the input of each dense block is the union of the outputs of all previous dense blocks.
In one possible implementation, the dense connection network further comprises one or more transition connection layers, each transition connection layer comprises a 1 × 1 convolutional layer, and the input of each transition connection layer is the union of the outputs of all preceding dense blocks and transition connection layers;
the performing feature extraction on the image to be processed based on the dense connection network to obtain the image features corresponding to the image to be processed includes:
and performing feature extraction on the image to be processed based on the one or more dense blocks and the one or more transitional connection layers to obtain the image features corresponding to the image to be processed.
In a possible implementation manner, the obtaining, according to the image feature, a plurality of text frames with different scales in the image to be processed, and performing text frame regression processing on the text frames with different scales includes:
obtaining a plurality of text frames with different scales in the image to be processed according to the image characteristics, and determining offset data of the text frames with different scales;
performing text box regression processing on the text boxes with different scales based on the offset data.
In a possible implementation manner, the obtaining, according to the image feature, a plurality of text boxes of different scales in the image to be processed and determining offset data of the text boxes of the different scales includes:
carrying out down-sampling processing on the image features, and carrying out down-sampling and convolution processing on the image features after the down-sampling processing;
and taking the image features after the downsampling and convolution processing as new downsampled image features, and re-executing the downsampling-and-convolution step on them until the plurality of text boxes of different scales in the image to be processed are obtained and the offset data of the text boxes of different scales are determined.
In a possible implementation manner, the determining, according to a plurality of text boxes with different scales after the text box regression processing, the position of the one or more texts in the image to be processed includes:
obtaining scores of the plurality of regressed text boxes of different scales according to the regressed text boxes of different scales and a preset score model, wherein the preset score model is used for determining the scores of the text boxes of different scales according to the ratio of the intersection to the union of the highest-scoring text box among the text boxes of different scales and each of the text boxes of different scales;
and calculating the positions of the text frames with different scales after the text frame regression processing according to the scores of the text frames with different scales after the text frame regression processing, and determining the positions of the one or more characters in the image to be processed based on the positions of the text frames with different scales after the text frame regression processing.
In a possible implementation manner, the calculating, according to the scores of the text boxes of the plurality of different scales after the text box regression processing, the positions of the text boxes of the plurality of different scales after the text box regression processing includes:
calculating the ratio of the intersection to the union of the highest-scoring text box among the plurality of regressed text boxes of different scales and a regressed text box i, wherein the regressed text box i is any one of the plurality of regressed text boxes of different scales, i = 1, …, n, and n is an integer determined according to the number of the regressed text boxes of different scales;
and if the calculated ratio is smaller than a preset threshold value, calculating the position of the text box i subjected to the text box regression processing according to the score of the text box i subjected to the text box regression processing.
In a possible implementation manner, before the performing the feature extraction on the image to be processed to obtain the image feature corresponding to the image to be processed, the method further includes:
performing parameter reduction processing on the image to be processed;
the feature extraction of the image to be processed to obtain the image features corresponding to the image to be processed includes:
and performing feature extraction on the image to be processed after parameter reduction processing to obtain image features corresponding to the image to be processed.
In a possible implementation manner, the performing parameter reduction processing on the image to be processed includes:
performing parameter reduction processing on the image to be processed by using three 3 × 3 convolutional layers and one 2 × 2 pooling layer, wherein the three 3 × 3 convolutional layers are connected in sequence and then connected to the 2 × 2 pooling layer.
In a possible implementation manner, the performing text recognition on the image to be processed based on the position of the one or more texts includes:
and identifying the characters in the image to be processed based on the positions of the one or more characters and a preset identification model, wherein the preset identification model is used for identifying the characters in the image according to the positions of the characters in the image.
In a second aspect, an embodiment of the present application provides a text recognition apparatus, where the apparatus includes:
the image acquisition module is used for acquiring an image to be processed, and the image to be processed carries one or more characters;
the characteristic extraction module is used for extracting the characteristics of the image to be processed to obtain the image characteristics corresponding to the image to be processed;
the text frame processing module is used for obtaining a plurality of text frames with different scales in the image to be processed according to the image characteristics and performing text frame regression processing on the text frames with different scales;
and the character recognition module is used for determining the positions of the one or more characters in the image to be processed according to a plurality of character frames with different scales after the character frame regression processing, and performing character recognition on the image to be processed based on the positions of the one or more characters.
In a possible implementation manner, the feature extraction module is specifically configured to:
performing feature extraction on the image to be processed based on a dense connection network to obtain the image features corresponding to the image to be processed, wherein the dense connection network comprises one or more dense blocks, any two dense blocks in the dense connection network are directly connected, and the input of each dense block is the union of the outputs of all previous dense blocks.
In one possible implementation, the dense connection network further comprises one or more transition connection layers, each transition connection layer comprises a 1 × 1 convolutional layer, and the input of each transition connection layer is the union of the outputs of all preceding dense blocks and transition connection layers.
The feature extraction module is specifically configured to:
and performing feature extraction on the image to be processed based on the one or more dense blocks and the one or more transitional connection layers to obtain the image features corresponding to the image to be processed.
In a possible implementation manner, the text box processing module is specifically configured to:
obtaining a plurality of text frames with different scales in the image to be processed according to the image characteristics, and determining offset data of the text frames with different scales;
performing text box regression processing on the text boxes with different scales based on the offset data.
In a possible implementation manner, the text box processing module is specifically configured to:
carrying out down-sampling processing on the image features, and carrying out down-sampling and convolution processing on the image features after the down-sampling processing;
and taking the image features subjected to downsampling and convolution processing as new image features subjected to downsampling processing, and re-executing the steps of downsampling and convolution processing on the image features subjected to downsampling processing until the character frames with different scales in the image to be processed are obtained, and determining offset data of the character frames with different scales.
In a possible implementation manner, the text recognition module is specifically configured to:
obtaining scores of the plurality of regressed text boxes of different scales according to the regressed text boxes of different scales and a preset score model, wherein the preset score model is used for determining the scores of the text boxes of different scales according to the ratio of the intersection to the union of the highest-scoring text box among the text boxes of different scales and each of the text boxes of different scales;
and calculating the positions of the text frames with different scales after the text frame regression processing according to the scores of the text frames with different scales after the text frame regression processing, and determining the positions of the one or more characters in the image to be processed based on the positions of the text frames with different scales after the text frame regression processing.
In a possible implementation manner, the text recognition module is specifically configured to:
calculating the ratio of the intersection to the union of the highest-scoring text box among the plurality of regressed text boxes of different scales and a regressed text box i, wherein the regressed text box i is any one of the plurality of regressed text boxes of different scales, i = 1, …, n, and n is an integer determined according to the number of the regressed text boxes of different scales;
and if the calculated ratio is smaller than a preset threshold value, calculating the position of the text box i subjected to the text box regression processing according to the score of the text box i subjected to the text box regression processing.
In a possible implementation manner, the feature extraction module is specifically configured to:
performing parameter reduction processing on the image to be processed;
and performing feature extraction on the image to be processed after parameter reduction processing to obtain image features corresponding to the image to be processed.
In a possible implementation manner, the feature extraction module is specifically configured to:
performing parameter reduction processing on the image to be processed by using three 3 × 3 convolutional layers and one 2 × 2 pooling layer, wherein the three 3 × 3 convolutional layers are connected in sequence and then connected to the 2 × 2 pooling layer.
In a possible implementation manner, the text recognition module is specifically configured to:
and identifying the characters in the image to be processed based on the positions of the one or more characters and a preset identification model, wherein the preset identification model is used for identifying the characters in the image according to the positions of the characters in the image.
In a third aspect, an embodiment of the present application provides a text recognition apparatus, including:
a processor;
a memory; and
a computer program;
wherein the computer program is stored in the memory and configured to be executed by the processor, the computer program comprising instructions for performing the method of the first aspect.
In a fourth aspect, an embodiment of the present application provides a computer-readable storage medium, where a computer program is stored, and the computer program causes a server to execute the method in the first aspect.
In a fifth aspect, the present application provides a computer program product, which includes computer instructions for executing the method of the first aspect by a processor.
According to the character recognition method, apparatus, device, and storage medium provided by the application, an image to be processed carrying one or more characters is acquired, and feature extraction is performed on the image to be processed to obtain image features; a plurality of text boxes of different scales in the image to be processed are then obtained according to the image features, and text-box regression processing is performed on the text boxes of different scales, which solves the problem of image deformation or angular movement; the positions of the characters in the image to be processed are then determined according to the plurality of regressed text boxes of different scales, and character recognition is performed on the image to be processed based on those positions. The character recognition rate is thereby improved, and a better character recognition effect is achieved.
Drawings
In order to more clearly illustrate the embodiments of the present application or the technical solutions in the prior art, the drawings needed to be used in the description of the embodiments or the prior art will be briefly introduced below, and it is obvious that the drawings in the following description are only some embodiments of the present application, and it is obvious for those skilled in the art that other drawings can be obtained according to the drawings without inventive exercise.
Fig. 1 is a schematic diagram of a text recognition system according to an embodiment of the present application;
fig. 2 is a schematic flow chart of a text recognition method according to an embodiment of the present application;
fig. 3 is a schematic flow chart of another text recognition method according to an embodiment of the present application;
FIG. 4 is a schematic diagram of a downsampling and convolution process provided by an embodiment of the present application;
FIG. 5 is a schematic diagram illustrating an offset of a text box according to an embodiment of the present application;
fig. 6 is a schematic flowchart of another character recognition method according to an embodiment of the present application;
fig. 7 is a schematic structural diagram of a character recognition device according to an embodiment of the present application;
fig. 8 shows a schematic diagram of a possible structure of the text recognition device of the present application.
Detailed Description
The technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only a part of the embodiments of the present application, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
The terms "first," "second," "third," and "fourth," if any, in the description and claims of this application and the above-described figures are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used is interchangeable under appropriate circumstances such that the embodiments of the application described herein are capable of operation in sequences other than those illustrated or described herein. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
Character recognition is currently studied relatively extensively in computer image and vision research; it is in extremely high demand in scenarios such as license plate recognition, bill recognition, and book text recognition, and the relevant technologies are mature and perform fairly well. However, if an image is deformed or angularly shifted, conventional image recognition techniques lack equivariance to such transformations, so the character recognition rate decreases and an ideal recognition effect cannot be achieved.
Therefore, an embodiment of the present application provides a character recognition method in which, after an image to be processed carrying one or more characters is acquired, feature extraction is performed on the image to be processed to obtain image features; a plurality of text boxes of different scales in the image to be processed are then obtained according to the image features, and text-box regression processing is performed on the text boxes of different scales. This solves the problem of image deformation or angular movement, improves the recognition rate of the subsequent character recognition performed on the image to be processed based on the plurality of regressed text boxes of different scales, and achieves a better character recognition effect.
Optionally, the text recognition method provided by the present application may be applied to the schematic architecture of the text recognition system shown in fig. 1, and as shown in fig. 1, the system may include a receiving device 101, a processing device 102, and a display device 103.
In a specific implementation process, the receiving device 101 may be an input/output interface, and may also be a communication interface, and may be configured to receive an image to be processed that carries one or more characters.
The processing device 102 may obtain the image to be processed through the receiving device 101 and perform feature extraction on it to obtain image features. It may then obtain a plurality of text boxes of different scales in the image to be processed according to the image features and perform text-box regression processing on the text boxes of different scales, solving the problem of image deformation or angular movement, and finally perform character recognition on the image to be processed according to the regressed text boxes of different scales, thereby improving the character recognition rate and achieving a better recognition effect.
In addition, the display device 103 may be used to display the image to be processed and a plurality of text boxes and the like with different scales.
The display device may also be a touch display screen for receiving user instructions while displaying the above-mentioned content to enable interaction with a user.
The processing device 102 may also send the result of character recognition on the image to be processed to a decoder, and the decoder decodes the result and outputs corresponding characters.
It should be understood that the processing device may be implemented by a processor reading instructions in a memory and executing the instructions, or may be implemented by a chip circuit.
The system is only an exemplary system, and when the system is implemented, the system can be set according to application requirements.
In addition, the system architecture described in the embodiments of the present application is intended to illustrate the technical solutions of the embodiments more clearly and does not constitute a limitation on them. A person skilled in the art will appreciate that, as the system architecture evolves and new service scenarios emerge, the technical solutions provided in the embodiments of the present application remain applicable to similar technical problems.
The technical solutions of the present application are described below with several embodiments as examples, and the same or similar concepts or processes may not be described in detail in some embodiments.
Fig. 2 is a schematic flow chart of a character recognition method according to an embodiment of the present application, where an execution subject of the embodiment may be a processing device in the embodiment shown in fig. 1, and may be determined specifically according to an actual situation. As shown in fig. 2, the character recognition method provided in the embodiment of the present application includes the following steps:
s201: acquiring an image to be processed, wherein the image to be processed carries one or more characters.
The image to be processed can be set according to actual conditions, such as images obtained in scenes of license plate recognition, bill recognition, book text recognition and the like.
S202: and performing feature extraction on the image to be processed to obtain image features corresponding to the image to be processed.
Before the processing device extracts the features of the image to be processed, the processing device may further perform parameter reduction processing on the image to be processed, so as to reduce parameters and calculation amount and improve efficiency of subsequent character recognition.
For example, the processing device may perform parameter reduction processing on the image to be processed by using three 3 × 3 convolutional layers and one 2 × 2 pooling layer, where the three 3 × 3 convolutional layers are connected in sequence and then connected to the 2 × 2 pooling layer. The convolution kernel size (kernel_size), convolution stride (stride), and feature-map padding width (padding) of the three convolutional layers and the one 2 × 2 pooling layer may be as shown in Table 1:
TABLE 1 [image: kernel_size, stride and padding of the three 3 × 3 convolutional layers and the 2 × 2 pooling layer]
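As a concrete illustration of such a stem, the following PyTorch sketch wires three 3 × 3 convolutions in sequence into a 2 × 2 pooling layer. Because Table 1 survives only as an image, the stride, padding, and channel values here are assumptions, not values from the filing:

```python
import torch
import torch.nn as nn

class ParamReductionStem(nn.Module):
    """Three 3x3 convolutions connected in sequence, followed by one 2x2
    pooling layer. Strides, padding and channel counts are assumed values,
    since the patent's Table 1 is only available as an image."""

    def __init__(self, in_channels: int = 3, out_channels: int = 64):
        super().__init__()
        self.stem = nn.Sequential(
            nn.Conv2d(in_channels, out_channels, kernel_size=3, stride=1, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(out_channels, out_channels, kernel_size=3, stride=1, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(out_channels, out_channels, kernel_size=3, stride=1, padding=1),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=2, stride=2),  # halves the spatial resolution
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.stem(x)
```

Downsampling early in the network is what cuts the parameter count and computation for everything that follows.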
In addition, when the processing device performs feature extraction on the image to be processed, the processing device may perform feature extraction on the image to be processed based on a dense connection network to obtain image features corresponding to the image to be processed, where the dense connection network includes one or more dense blocks, any two dense blocks in the dense connection network are directly connected, and an input of each dense block is a union of outputs of all previous dense blocks.
Here, the processing device uses the dense connection network as a feature extraction network, and the network can take the outputs of all previous layers as the inputs of the current layer, so that the gradient and information propagation are more accurate, and the accuracy of performing subsequent character recognition based on the features of the image to be processed extracted by the dense connection network is higher.
In the embodiment of the present application, in order to increase the depth of feature extraction, the dense connection network may further include one or more transition connection layers. The transition connection layers are used to increase the number of dense blocks in the dense connection network without changing the resolution of the original feature map. Each transition connection layer comprises a 1 × 1 convolutional layer, which both increases the depth of feature extraction in the dense connection network and removes the limit on the total number of dense blocks; the input of each transition connection layer is the union of the outputs of all preceding dense blocks and transition connection layers. The processing device can extract features of the image to be processed based on the one or more dense blocks and the one or more transition connection layers, so that the extracted features are richer and the accuracy of character recognition based on them is improved.
TABLE 2 [image: kernel_size, stride and padding of the 4 dense blocks and 2 transition connection layers]
For example, the number of dense blocks and transition connection layers may be set according to the actual situation. As shown in Table 2, there are 4 dense blocks and 2 transition connection layers; the 1st transition connection layer is placed between the 3rd and 4th dense blocks, and the 2nd transition connection layer is placed after the 4th dense block. Table 2 lists the kernel_size, stride, and padding parameters of the 4 dense blocks and 2 transition connection layers.
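A minimal PyTorch sketch of these two building blocks follows. The growth rate, layer count, and channel widths are illustrative assumptions; what the sketch preserves from the description is that every layer in a dense block consumes the concatenation (the "union") of all previous outputs, and that a transition layer is a 1 × 1 convolution that leaves the feature-map resolution unchanged:

```python
import torch
import torch.nn as nn

class DenseBlock(nn.Module):
    """A dense block: each layer receives the concatenation of the outputs
    of all preceding layers. The growth rate and layer count are
    illustrative assumptions, not values from the filing."""

    def __init__(self, in_channels: int, growth_rate: int = 32, num_layers: int = 4):
        super().__init__()
        self.layers = nn.ModuleList()
        channels = in_channels
        for _ in range(num_layers):
            self.layers.append(nn.Sequential(
                nn.BatchNorm2d(channels),
                nn.ReLU(inplace=True),
                nn.Conv2d(channels, growth_rate, kernel_size=3, padding=1),
            ))
            channels += growth_rate
        self.out_channels = channels

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        features = [x]
        for layer in self.layers:
            # each layer consumes the union of all previous outputs
            features.append(layer(torch.cat(features, dim=1)))
        return torch.cat(features, dim=1)

class TransitionLayer(nn.Module):
    """A transition connection layer: a 1x1 convolution that adjusts the
    channel count without changing the feature-map resolution."""

    def __init__(self, in_channels: int, out_channels: int):
        super().__init__()
        self.conv = nn.Conv2d(in_channels, out_channels, kernel_size=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.conv(x)
```

Chaining three dense blocks, a transition layer, a fourth dense block, and a final transition layer mirrors the arrangement described for Table 2.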
S203: and obtaining a plurality of character frames with different scales in the image to be processed according to the image characteristics, and performing character frame regression processing on the character frames with different scales.
Here, the processing device may obtain a plurality of text frames with different scales in the image to be processed according to the image feature by using a preset dense layer, and perform text frame regression processing on the plurality of text frames with different scales.
The preset dense layer may include two blocks: one is used for obtaining a plurality of text boxes of different scales in the image to be processed, and the other is used for performing text-box regression processing on the plurality of text boxes of different scales.
In the embodiment of the present application, the processing device performs text-box regression processing on the plurality of text boxes of different scales in the image to be processed, which solves the problem of image deformation or angular movement and improves the recognition rate of the subsequent character recognition performed on the image to be processed based on the plurality of regressed text boxes of different scales.
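One way to realize the two blocks of the preset dense layer is a pair of parallel convolutional branches over each feature map, one scoring default boxes as text or non-text and one predicting regression offsets. This is a sketch under that assumption; the filing does not fix the branch shapes or the number of default boxes per location:

```python
import torch
import torch.nn as nn

class TextBoxHead(nn.Module):
    """Two parallel branches over one feature map: the score branch rates
    each default box as text / non-text, and the offset branch predicts
    the 4 regression offsets (dx, dy, dw, dh) per default box. The number
    of default boxes per location is an assumed value."""

    def __init__(self, in_channels: int, boxes_per_location: int = 6):
        super().__init__()
        self.score_branch = nn.Conv2d(in_channels, boxes_per_location * 2,
                                      kernel_size=3, padding=1)
        self.offset_branch = nn.Conv2d(in_channels, boxes_per_location * 4,
                                       kernel_size=3, padding=1)

    def forward(self, feature_map: torch.Tensor):
        # returns (scores, offsets); one such head is applied per scale
        return self.score_branch(feature_map), self.offset_branch(feature_map)
```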
S204: and determining the positions of the one or more characters in the image to be processed according to a plurality of character frames with different scales after the regression processing of the character frames, and performing character recognition on the image to be processed based on the positions of the one or more characters.
For example, the processing device may obtain scores of the text frames with different scales after the text frame regression processing according to the text frames with different scales after the text frame regression processing and a preset score model, further calculate positions of the text frames with different scales after the text frame regression processing according to the scores, and determine the position of the one or more characters in the image to be processed based on the positions.
The preset score model is used for determining the scores of the text boxes of different scales according to the ratio of the intersection to the union of the highest-scoring text box among the text boxes of different scales and each of the text boxes of different scales.
For example, the preset score model includes the expression:

s_i = s_i, if iou(T, c_i) < N
s_i = s_i × (1 − iou(T, c_i)), if iou(T, c_i) ≥ N

where s_i denotes the score of the i-th text box; iou denotes Intersection over Union, i.e., the ratio of the intersection to the union of one text box and another; T denotes the highest-scoring text box computed so far; c_i denotes a candidate box; and N denotes a threshold that can be set according to the actual situation. Here, the processing device may take the plurality of regressed text boxes of different scales as the candidate boxes, compute the scores of all candidate boxes to obtain the highest-scoring box T, and obtain the scores of the regressed text boxes of different scales from the above expression.
Further, the processing device may calculate the positions of the plurality of regressed text boxes of different scales based on the scores using the expression:

t' = (Σ_i s_i · t_i) / (Σ_i s_i)

where t' denotes the fused position of the regressed text boxes of different scales and t_i denotes the coordinates of the i-th text box.
In addition, when calculating the positions of the plurality of regressed text boxes of different scales based on the scores, the processing device may calculate the ratio of the intersection to the union of the highest-scoring text box among the regressed text boxes of different scales and a regressed text box i. If the calculated ratio is smaller than the preset threshold, the processing device may calculate the position of the regressed text box i according to its score. The regressed text box i is any one of the plurality of regressed text boxes of different scales, i = 1, …, n, where n is an integer determined according to the number of the regressed text boxes of different scales. That is, the processing device may use a Non-Maximum Suppression (NMS) algorithm to calculate the positions of the regressed text boxes of different scales, so that the calculation result is more accurate.
For example, the processing device may list all candidate boxes A, i.e., the plurality of regressed text boxes of different scales together with their calculated scores s_i, and initialize a detection set B_i to be empty. The processing device may then evaluate all text boxes among the candidate boxes A to obtain the highest-scoring text box T and put it into the set B_i, where i denotes the i-th round of box selection. Further, the processing device may set a threshold N, traverse all remaining text boxes, calculate the iou of each with the highest-scoring detection box, and put a box into the set B_i if the result is greater than or equal to the threshold. The processing device repeats these operations until A is empty, yielding the collected sets B_i. Finally, the processing device may calculate the position of each text box based on its score s_i, so that the positions calculated on this basis are more accurate.
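The grouping-and-fusion procedure above can be sketched in a few lines of NumPy. The iou helper is standard; the score-weighted fusion t' = Σ s_i·t_i / Σ s_i is the formula reconstructed in the preceding paragraphs and should be read as an assumption rather than the filing's exact computation:

```python
import numpy as np

def iou(box_a: np.ndarray, box_b: np.ndarray) -> float:
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    x1, y1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    x2, y2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return float(inter / union) if union > 0 else 0.0

def fuse_text_boxes(boxes: np.ndarray, scores: np.ndarray,
                    n_threshold: float = 0.5) -> np.ndarray:
    """Repeatedly take the highest-scoring remaining box T, group every
    remaining box whose iou with T is >= n_threshold into one set B_i,
    and fuse each set into a single position by score-weighted averaging,
    t' = sum(s_i * t_i) / sum(s_i)."""
    remaining = list(range(len(boxes)))
    fused = []
    while remaining:
        top = max(remaining, key=lambda i: scores[i])
        group = [i for i in remaining
                 if i == top or iou(boxes[top], boxes[i]) >= n_threshold]
        weights = scores[group]
        fused.append((boxes[group] * weights[:, None]).sum(axis=0) / weights.sum())
        remaining = [i for i in remaining if i not in group]
    return np.array(fused)

# Example: two overlapping detections of one word plus a separate word.
boxes = np.array([[10, 10, 50, 30], [12, 11, 52, 31], [100, 40, 160, 60]], float)
scores = np.array([0.9, 0.8, 0.7])
print(fuse_text_boxes(boxes, scores))  # -> two fused box positions
```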
In this embodiment of the application, when the processing device performs character recognition on the image to be processed based on the position of the one or more characters, the processing device may further recognize the characters in the image to be processed based on the position of the one or more characters and a preset recognition model.
The preset recognition model is used for recognizing characters in the image according to the positions of the characters in the image.
According to the embodiment of the present application, an image to be processed carrying one or more characters is acquired, and feature extraction is performed on it to obtain image features; a plurality of text boxes of different scales in the image to be processed are obtained according to the image features, and text-box regression processing is performed on them, which solves the problem of image deformation or angular movement; the positions of the characters in the image to be processed are then determined according to the plurality of regressed text boxes of different scales, and character recognition is performed on the image based on those positions, so the character recognition rate is improved and a good recognition effect is achieved. In addition, the embodiment of the present application performs parameter reduction processing on the image to be processed, which reduces the parameters and the amount of computation and improves the efficiency of subsequent character recognition. Moreover, the dense connection network is used as the feature extraction network; this network can take the outputs of all previous layers as the input of the current layer, making gradient and information propagation more accurate, so subsequent character recognition based on the features extracted by the dense connection network is also more accurate. The embodiment of the present application may further use the NMS algorithm to calculate the positions of the regressed text boxes of different scales, making the calculation more accurate.
Here, before the processing device recognizes the characters in the image to be processed based on the positions of the one or more characters and a preset recognition model, it needs to train the preset recognition model so that the model can be used to recognize the characters in the image to be processed. During training, the processing device may input an image carrying characters into the preset recognition model, where the input image also carries the positions of the characters in the image, and then determine the output accuracy by comparing the characters output by the preset recognition model with the characters actually corresponding to the input image. If the output accuracy is lower than a preset accuracy threshold, the processing device may adjust the preset recognition model according to the output accuracy so as to improve it, take the adjusted preset recognition model as the new preset recognition model, and re-execute the step of inputting the image carrying characters into the model.
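The train-evaluate-adjust loop described above might look like the following sketch. The recognize and adjust callables are hypothetical stand-ins; the filing does not name a training interface or an optimizer:

```python
from typing import Callable, List, Tuple

# A labeled sample: (image bytes, character positions, expected text).
Sample = Tuple[bytes, list, str]

def train_until_accurate(recognize: Callable[[bytes, list], str],
                         adjust: Callable[[float], None],
                         samples: List[Sample],
                         threshold: float = 0.95,
                         max_rounds: int = 100) -> float:
    """Repeatedly measure the model's output accuracy on the labeled
    samples and invoke the (hypothetical) adjust step until the accuracy
    reaches the preset threshold or the round budget runs out."""
    accuracy = 0.0
    for _ in range(max_rounds):
        correct = sum(recognize(image, positions) == text
                      for image, positions, text in samples)
        accuracy = correct / len(samples)
        if accuracy >= threshold:
            break
        adjust(accuracy)  # e.g. a gradient update driven by the error
    return accuracy
```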
In addition, when the processing device obtains a plurality of text boxes of different scales in the image to be processed according to the image features and performs text-box regression processing on them, it may also determine offset data of the plurality of text boxes of different scales and then perform the text-box regression processing based on the offset data. This solves the problem of image deformation or angular movement; character recognition is then performed on the image to be processed according to the regressed text boxes of different scales, improving the character recognition rate. Fig. 3 is a flowchart illustrating another text recognition method according to an embodiment of the present application. As shown in fig. 3, the method includes:
s301: acquiring an image to be processed, wherein the image to be processed carries one or more characters.
S302: and performing feature extraction on the image to be processed to obtain image features corresponding to the image to be processed.
The steps S301 to S302 are the same as the steps S201 to S202, and are not described herein again.
S303: and obtaining a plurality of character frames with different scales in the image to be processed according to the image characteristics, and determining offset data of the character frames with different scales.
Here, the processing device may perform downsampling processing on the image features, then perform downsampling and convolution processing on the downsampled image features, take the result as the new downsampled image features, and repeat the downsampling-and-convolution step until the plurality of text boxes of different scales in the image to be processed are obtained, determining the offset data of the text boxes of different scales.
The processing device may perform the downsampling with a downsampling module, which may include a 1 × 1 convolution and a 2 × 2 pooling layer. Here, the 2 × 2 pooling layer makes the feature-map sizes match, and the 1 × 1 convolution halves the number of channels; at each scale the module combines the features of the current feature map with those of the previous one, so fewer parameters are needed and the result is more accurate.
In addition, the processing device may perform the convolution processing on the image features with a convolution module, which may include a 1 × 1 convolutional layer and a 3 × 3 convolutional layer, performing two convolution operations in which the feature map of the previous layer is passed on to the next.
The embodiment of the present application takes obtaining text boxes at 6 different scales as an example. As shown in fig. 4, the 6 scales comprise text boxes of scale 1 through scale 6. The processing device determines the scale-1 text boxes from the image features, downsamples the scale-1 features to obtain scale 2, and performs downsampling and convolution on scale 2 to obtain scale 3; repeating this step, it performs downsampling and convolution on scale 3 to obtain scale 4, on scale 4 to obtain scale 5, and on scale 5 to obtain scale 6.
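The six-scale pyramid can be sketched as follows in PyTorch. The channel widths and the use of max-pooling are assumptions, while the 1 × 1-conv-plus-2 × 2-pool downsampling module and the 1 × 1-plus-3 × 3 convolution module follow the description above:

```python
import torch
import torch.nn as nn

class Downsample(nn.Module):
    """Downsampling module: a 1x1 convolution that halves the channel
    count, followed by a 2x2 pooling layer that halves the spatial size."""

    def __init__(self, channels: int):
        super().__init__()
        self.conv = nn.Conv2d(channels, channels // 2, kernel_size=1)
        self.pool = nn.MaxPool2d(kernel_size=2, stride=2)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.pool(self.conv(x))

class ConvBlock(nn.Module):
    """Convolution module: a 1x1 convolution followed by a 3x3 convolution,
    passing the previous feature map on to the next layer."""

    def __init__(self, channels: int):
        super().__init__()
        self.block = nn.Sequential(
            nn.Conv2d(channels, channels, kernel_size=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.block(x)

class SixScalePyramid(nn.Module):
    """Builds 6 feature scales: scale 2 by downsampling scale 1, and
    scales 3-6 by repeated downsampling + convolution."""

    def __init__(self, channels: int = 512):
        super().__init__()
        self.down2 = Downsample(channels)  # scale 1 -> scale 2
        self.stages = nn.ModuleList()
        c = channels // 2
        for _ in range(4):                 # scales 3 through 6
            self.stages.append(nn.Sequential(Downsample(c), ConvBlock(c // 2)))
            c //= 2

    def forward(self, features: torch.Tensor):
        scales = [features, self.down2(features)]
        x = scales[-1]
        for stage in self.stages:
            x = stage(x)
            scales.append(x)
        return scales  # feature maps at scales 1..6
```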
The processing device determines the offset data of the plurality of text boxes of different scales during this process, and performs text-box regression processing on the plurality of text boxes of different scales based on the offset data. For a better understanding of text-box offsets, fig. 5 shows an offset diagram of a text box: b0 denotes the default box; the 4 arrows leading out from b0 toward Gq denote the regression learning process from the default box to the actual text box; Gb denotes the minimum bounding rectangle of the actual target Gq; G* denotes the ground-truth rectangle, i.e., the smallest bounding rectangle of G; (x_b, y_b) denotes the center point of Gb; w_b denotes its width; and h_b denotes its height (the symbols G*, (x_b, y_b), w_b and h_b stand in for formula images in the original filing).
Here, the processing device determines offset data of a character frame, and then performs character frame regression processing on the character frame based on the offset data, thereby solving the problem of image deformation or angular movement and improving the accuracy of subsequent character recognition.
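A common concrete form of such regression, shown here as an assumption since the filing gives the relation only pictorially, decodes predicted offsets (dx, dy, dw, dh) against a default box in center-size form:

```python
import numpy as np

def apply_offsets(default_box: np.ndarray, offsets: np.ndarray) -> np.ndarray:
    """Apply learned offsets (dx, dy, dw, dh) to a default box given as
    (cx, cy, w, h), yielding the regressed text box. This SSD-style
    decoding is an assumed concrete form, not taken from the filing."""
    cx, cy, w, h = default_box
    dx, dy, dw, dh = offsets
    return np.array([
        cx + dx * w,     # shift the center in proportion to the box size
        cy + dy * h,
        w * np.exp(dw),  # rescale width and height multiplicatively
        h * np.exp(dh),
    ])
```

Because the shift and rescaling are expressed relative to the box itself, this is one plausible way a learned regression can compensate for deformation and angular movement.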
S304: and performing text-box regression processing on the text boxes with different scales based on the offset data.
S305: and determining the positions of the one or more characters in the image to be processed according to a plurality of character frames with different scales after the character frame regression processing, and performing character recognition on the image to be processed based on the positions of the one or more characters.
Step S305 is the same as the implementation of step S204, and is not described herein again.
According to the text box regression processing method and device, after the offset data of the text box are determined, the text box regression processing is carried out on the text box based on the offset data, the problem that an image is deformed or an image moves in an angle mode is solved, then text recognition is carried out according to the text boxes with different scales after the text box regression processing, and the text recognition rate is improved.
Here, fig. 6 is a schematic flowchart of another character recognition method proposed in an embodiment of the present application. After acquiring an image to be processed carrying one or more characters, the processing device may perform parameter reduction processing on it. Specifically, the processing device may use a parameter reduction module, which may include three 3 × 3 convolutional layers and one 2 × 2 pooling layer, where the three 3 × 3 convolutional layers are connected in sequence and then connected to the 2 × 2 pooling layer. Further, the processing device may perform feature extraction on the parameter-reduced image, for example based on a dense connection network. The dense connection network may include one or more dense blocks and, further, one or more transition connection layers; the figure shows 4 dense blocks and 2 transition connection layers, with the 1st transition connection layer placed between the 3rd and 4th dense blocks and the 2nd placed after the 4th dense block. After feature extraction, the processing device may obtain a plurality of text boxes of different scales in the image to be processed based on the extracted image features and determine their offset data, so as to perform text-box regression processing on the text boxes of different scales based on the offset data. Here, the processing device may use a preset dense layer comprising two blocks: one for obtaining the plurality of text boxes of different scales in the image to be processed, and the other for performing the text-box regression processing on them. Finally, the processing device determines the positions of the one or more characters in the image to be processed according to the plurality of regressed text boxes of different scales and performs character recognition on the image based on those positions. The processing device may use the NMS algorithm to calculate the positions of the regressed text boxes of different scales, making the calculation more accurate.
In addition, the processing device can also send the result of character recognition of the image to be processed to a decoder, and the decoder decodes the result and outputs corresponding characters.
In the embodiment of the application, the processing device performs text-box regression processing on the text boxes of different scales in the image to be processed, which solves the problem of image deformation or angular movement; it then performs character recognition on the image to be processed according to the regressed text boxes of different scales, improving the character recognition rate and achieving a better recognition effect. The processing device also performs parameter reduction processing on the image to be processed, reducing the parameters and the amount of computation and improving the efficiency of subsequent character recognition. In addition, the processing device uses the dense connection network as the feature extraction network; this network can take the outputs of all previous layers as the input of the current layer, making gradient and information propagation more accurate, so subsequent character recognition based on the extracted features is also more accurate. The processing device may further use the NMS algorithm to calculate the positions of the regressed text boxes of different scales, making the calculation more accurate.
Fig. 7 is a schematic structural diagram of a character recognition apparatus according to an embodiment of the present application, corresponding to the character recognition method of the foregoing embodiments. For ease of explanation, only the portions related to the embodiments of the present application are shown. The character recognition apparatus 70 includes: an image acquisition module 701, a feature extraction module 702, a text box processing module 703, and a text recognition module 704. The character recognition apparatus may be the processing device itself, or a chip or integrated circuit that implements the functions of the processing device. It should be noted that the division into the image acquisition, feature extraction, text box processing, and text recognition modules is only a division of logical functions; physically, the modules may be integrated or kept separate.
The image obtaining module 701 is configured to obtain an image to be processed, where the image to be processed carries one or more characters.
A feature extraction module 702, configured to perform feature extraction on the image to be processed, so as to obtain an image feature corresponding to the image to be processed.
A text box processing module 703, configured to obtain, according to the image features, a plurality of text boxes with different scales in the image to be processed, and perform text box regression processing on the plurality of text boxes with different scales.
And the character recognition module 704 is configured to determine positions of the one or more characters in the image to be processed according to the plurality of character boxes with different scales after the character box regression processing, and perform character recognition on the image to be processed based on the positions of the one or more characters.
In one possible design, the feature extraction module 702 is specifically configured to:
performing feature extraction on the image to be processed based on a dense connection network to obtain the image features corresponding to the image to be processed, wherein the dense connection network comprises one or more dense blocks, any two dense blocks in the dense connection network are directly connected, and the input of each dense block is the union of the outputs of all previous dense blocks.
In one possible implementation, the dense connection network further comprises one or more transition connection layers, each transition connection layer comprises a 1 × 1 convolutional layer, and the input of each transition connection layer is the union of the outputs of all preceding dense blocks and transition connection layers.
The feature extraction module 702 is specifically configured to:
and performing feature extraction on the image to be processed based on the one or more dense blocks and the one or more transitional connection layers to obtain the image features corresponding to the image to be processed.
In a possible implementation manner, the text box processing module 703 is specifically configured to:
obtaining a plurality of text frames with different scales in the image to be processed according to the image characteristics, and determining offset data of the text frames with different scales;
performing text box regression processing on the text boxes with different scales based on the offset data.
In a possible implementation manner, the text box processing module 703 is specifically configured to:
down-sample the image features, and perform down-sampling and convolution processing on the down-sampled image features; and

take the image features after the down-sampling and convolution processing as the new down-sampled image features, and re-execute the step of performing down-sampling and convolution processing on the down-sampled image features until the text boxes with different scales in the image to be processed are obtained, and determine the offset data of the text boxes with different scales.
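One way to realize this loop is sketched below: the features are repeatedly down-sampled and convolved, and offset data are predicted at each resulting scale. The number of scales, the stride-2 convolution, and the boxes-per-cell count are illustrative assumptions.

```python
import torch.nn as nn

class MultiScaleOffsetHead(nn.Module):
    """Illustrative multi-scale loop: repeatedly down-sample + convolve,
    predicting offset data (4 values per default box) at every scale."""
    def __init__(self, channels: int, num_scales: int = 4, boxes_per_cell: int = 6):
        super().__init__()
        self.downsamples = nn.ModuleList(
            nn.Conv2d(channels, channels, kernel_size=3, stride=2, padding=1)
            for _ in range(num_scales)
        )
        self.offset_heads = nn.ModuleList(
            nn.Conv2d(channels, boxes_per_cell * 4, kernel_size=3, padding=1)
            for _ in range(num_scales)
        )

    def forward(self, features):
        offsets = []
        x = features
        for down, head in zip(self.downsamples, self.offset_heads):
            x = down(x)              # down-sampling + convolution
            offsets.append(head(x))  # offset data at this scale
        return offsets
```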
In a possible implementation manner, the text recognition module 704 is specifically configured to:
obtain scores of the text boxes with different scales after the text box regression processing according to the text boxes with different scales after the text box regression processing and a preset score model, where the preset score model is used to determine the scores of the text boxes with different scales according to the ratio of the intersection to the union of the highest-scoring text box among the text boxes with different scales and each of the text boxes with different scales; and

calculate the positions of the text boxes with different scales after the text box regression processing according to the scores of the text boxes with different scales after the text box regression processing, and determine the positions of the one or more characters in the image to be processed based on the positions of the text boxes with different scales after the text box regression processing.
In a possible implementation manner, the text recognition module 704 is specifically configured to:
calculate the ratio of the intersection to the union of the highest-scoring text box among the text boxes with different scales after the text box regression processing and a text box i after the text box regression processing, where the text box i after the text box regression processing is any one of the text boxes with different scales after the text box regression processing, i = 1, …, n, and n is an integer determined according to the number of the text boxes with different scales after the text box regression processing; and

if the calculated ratio is smaller than a preset threshold, calculate the position of the text box i after the text box regression processing according to the score of the text box i after the text box regression processing.
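This ratio-and-threshold procedure behaves like classic non-maximum suppression. A minimal sketch, assuming axis-aligned (x1, y1, x2, y2) boxes and an example threshold of 0.5 (the application leaves the preset threshold unspecified), is:

```python
def iou(a, b):
    """Ratio of intersection to union for two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union > 0 else 0.0

def filter_boxes(boxes, scores, threshold=0.5):
    """Keep the highest-scoring box, plus every box whose IoU with it
    falls below the preset threshold (NMS-style filtering)."""
    best = boxes[max(range(len(scores)), key=scores.__getitem__)]
    return [b for b in boxes if b is best or iou(best, b) < threshold]
```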
In a possible implementation manner, the feature extraction module 702 is specifically configured to:
perform parameter reduction processing on the image to be processed; and

perform feature extraction on the image to be processed after the parameter reduction processing to obtain the image features corresponding to the image to be processed.
In a possible implementation manner, the feature extraction module 702 is specifically configured to:
perform parameter reduction processing on the image to be processed by using 3 × 3 convolutional layers and one 2 × 2 pooling layer, where the 3 × 3 convolutional layers and the 2 × 2 pooling layer are connected in sequence.
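Such a parameter-reducing stem might be assembled as in the sketch below; the use of two stacked 3 × 3 convolutions and the channel widths are assumptions, since the application specifies only 3 × 3 convolutional layers followed in sequence by one 2 × 2 pooling layer.

```python
import torch.nn as nn

# Illustrative parameter-reducing stem: 3x3 convolutions connected in
# sequence with one 2x2 pooling layer. Using exactly two convolutions
# and the channel widths (3 -> 32 -> 64) are assumptions.
stem = nn.Sequential(
    nn.Conv2d(3, 32, kernel_size=3, padding=1),
    nn.ReLU(inplace=True),
    nn.Conv2d(32, 64, kernel_size=3, padding=1),
    nn.ReLU(inplace=True),
    nn.MaxPool2d(kernel_size=2, stride=2),  # halves the spatial resolution
)
```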
In a possible implementation manner, the text recognition module 704 is specifically configured to:
recognize the characters in the image to be processed based on the positions of the one or more characters and a preset recognition model, where the preset recognition model is used to recognize the characters in an image according to the positions of the characters in the image.
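Conceptually, this final step crops each located region and hands it to the preset recognition model, as in the hedged sketch below; recognize_characters and recognition_model are hypothetical names, and the crop convention assumes a NumPy-style image array.

```python
def recognize_characters(image, positions, recognition_model):
    """Crop each located region and run the preset recognition model on it.
    `image` is assumed to be a NumPy-style HxWxC array; `positions` holds
    (x1, y1, x2, y2) boxes; `recognition_model` is a hypothetical callable."""
    results = []
    for (x1, y1, x2, y2) in positions:
        crop = image[y1:y2, x1:x2]
        results.append(recognition_model(crop))
    return results
```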
The apparatus provided in this embodiment of the present application may be configured to implement the technical solutions of the foregoing method embodiments; the implementation principles and technical effects are similar and are not described again here.
Optionally, Fig. 8 schematically shows a possible basic hardware architecture of the character recognition device described in the present application.

Referring to Fig. 8, the character recognition device 800 includes at least one processor 801 and a communication interface 803. Further optionally, a memory 802 and a bus 804 may also be included.
The character recognition device 800 may be the foregoing processing device; this is not limited in the present application. The number of processors 801 in the character recognition device 800 may be one or more, and Fig. 8 illustrates only one of the processors 801. Optionally, the processor 801 may be a central processing unit (CPU), a graphics processing unit (GPU), or a digital signal processor (DSP). If the character recognition device 800 has a plurality of processors 801, the types of the processors 801 may be different or the same. Optionally, the processors 801 of the character recognition device 800 may also be integrated as a multi-core processor.
The memory 802 stores computer instructions and data; for example, the memory 802 may store the computer instructions and data required to implement the character recognition methods provided herein, such as instructions for implementing the steps of those methods. The memory 802 may be any one or any combination of the following storage media: non-volatile memory (for example, read-only memory (ROM), a solid-state drive (SSD), a hard disk drive (HDD), or an optical disc) and volatile memory.
The communication interface 803 may provide information input/output for the at least one processor, and may further include any one or any combination of the following devices with network access capability: a network interface (for example, an Ethernet interface), a wireless network card, and the like.

Optionally, the communication interface 803 may also be used for data communication between the character recognition device 800 and other computing devices or terminals.
Further optionally, a bus 804, shown as a thick line in Fig. 8, may connect the processor 801 with the memory 802 and the communication interface 803. Thus, via the bus 804, the processor 801 may access the memory 802 and may also interact with other computing devices or terminals through the communication interface 803.
In the present application, the character recognition device 800 executes the computer instructions in the memory 802, so that the character recognition device 800 implements the character recognition method provided herein, or so that the character recognition apparatus described above is deployed on the character recognition device 800.
From the viewpoint of logical functional division, as shown in Fig. 8, the memory 802 may include the image acquisition module 701, the feature extraction module 702, the text box processing module 703, and the text recognition module 704. Here, "include" merely means that, when executed, the instructions stored in the memory can implement the functions of these modules; it does not limit the physical structure.
In addition, the character recognition apparatus may be implemented in software, as shown in Fig. 8, or in hardware, as a hardware module or a circuit unit.
The present application provides a computer-readable storage medium and a computer program product, each comprising computer instructions that instruct a computing device to perform the character recognition method provided herein.
The present application provides a chip, including at least one processor and a communication interface, where the communication interface provides information input and/or output for the at least one processor. Further, the chip may also include at least one memory for storing computer instructions. The at least one processor is configured to call and run the computer instructions to perform the character recognition method provided herein.
In the several embodiments provided in the present application, it should be understood that the disclosed apparatus and method may be implemented in other manners. For example, the apparatus embodiments described above are merely illustrative: the division of the units is only a division of logical functions, and other division manners may be used in actual implementation; for example, a plurality of units or components may be combined or integrated into another system, or some features may be omitted or not performed. In addition, the mutual couplings, direct couplings, or communication connections shown or discussed may be indirect couplings or communication connections through some interfaces, apparatuses, or units, and may be in electrical, mechanical, or other forms.

The units described as separate parts may or may not be physically separate, and the parts shown as units may or may not be physical units; they may be located in one place or distributed over a plurality of network units. Some or all of the units may be selected according to actual needs to achieve the objectives of the solutions of the embodiments.

In addition, the functional units in the embodiments of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units may be integrated into one unit. The integrated unit may be implemented in the form of hardware, or in the form of hardware plus a software functional unit.

Claims (14)

1. A method for recognizing a character, comprising:
acquiring an image to be processed, wherein the image to be processed carries one or more characters;
extracting the features of the image to be processed to obtain the image features corresponding to the image to be processed;
obtaining a plurality of text boxes with different scales in the image to be processed according to the image features, and performing text box regression processing on the text boxes with different scales; and

determining the positions of the one or more characters in the image to be processed according to the plurality of text boxes with different scales after the text box regression processing, and performing character recognition on the image to be processed based on the positions of the one or more characters.
2. The method according to claim 1, wherein the performing feature extraction on the image to be processed to obtain image features corresponding to the image to be processed comprises:
performing feature extraction on the image to be processed based on a dense connection network to obtain the image features corresponding to the image to be processed, wherein the dense connection network comprises one or more dense blocks, any two dense blocks in the dense connection network are directly connected, and the input of each dense block is the union of the outputs of all preceding dense blocks.
3. The method of claim 2, wherein the dense connection network further comprises one or more transitional connection layers, each transitional connection layer comprising a 1 × 1 convolutional layer, and the input of each transitional connection layer is the union of the outputs of all preceding dense blocks and transitional connection layers;
the performing feature extraction on the image to be processed based on the dense connection network to obtain the image features corresponding to the image to be processed includes:
and performing feature extraction on the image to be processed based on the one or more dense blocks and the one or more transitional connection layers to obtain the image features corresponding to the image to be processed.
4. The method according to any one of claims 1 to 3, wherein the obtaining a plurality of text boxes with different scales in the image to be processed according to the image features and performing text box regression processing on the text boxes with different scales comprises:
obtaining a plurality of text boxes with different scales in the image to be processed according to the image features, and determining offset data of the text boxes with different scales; and
performing text box regression processing on the text boxes with different scales based on the offset data.
5. The method according to claim 4, wherein the obtaining a plurality of text boxes with different scales in the image to be processed according to the image features and determining offset data of the text boxes with different scales comprises:
carrying out down-sampling processing on the image features, and carrying out down-sampling and convolution processing on the down-sampled image features; and

taking the image features after the down-sampling and convolution processing as the new down-sampled image features, and re-executing the step of carrying out down-sampling and convolution processing on the down-sampled image features until the text boxes with different scales in the image to be processed are obtained, and determining the offset data of the text boxes with different scales.
6. The method of any one of claims 1 to 3, wherein determining the position of the one or more words in the image to be processed according to a plurality of text boxes with different scales after text box regression processing comprises:
obtaining scores of the text boxes with different scales after the text box regression processing according to the text boxes with different scales after the text box regression processing and a preset score model, wherein the preset score model is used for determining the scores of the text boxes with different scales according to the ratio of the intersection to the union of the highest-scoring text box among the text boxes with different scales and each of the text boxes with different scales; and

calculating the positions of the text boxes with different scales after the text box regression processing according to the scores of the text boxes with different scales after the text box regression processing, and determining the positions of the one or more characters in the image to be processed based on the positions of the text boxes with different scales after the text box regression processing.
7. The method of claim 6, wherein calculating the positions of the text boxes of different scales after the text box regression processing according to the scores of the text boxes of different scales after the text box regression processing comprises:
calculating the ratio of the intersection to the union of the highest-scoring text box among the text boxes with different scales after the text box regression processing and a text box i after the text box regression processing, wherein the text box i after the text box regression processing is any one of the text boxes with different scales after the text box regression processing, i = 1, …, n, and n is an integer determined according to the number of the text boxes with different scales after the text box regression processing; and

if the calculated ratio is smaller than a preset threshold, calculating the position of the text box i after the text box regression processing according to the score of the text box i after the text box regression processing.
8. The method according to any one of claims 1 to 3, wherein before the performing the feature extraction on the image to be processed to obtain the image feature corresponding to the image to be processed, the method further comprises:
performing parameter reduction processing on the image to be processed;
the feature extraction of the image to be processed to obtain the image features corresponding to the image to be processed includes:
and performing feature extraction on the image to be processed after parameter reduction processing to obtain image features corresponding to the image to be processed.
9. The method according to claim 8, wherein the performing parameter reduction processing on the image to be processed comprises:
performing parameter reduction processing on the image to be processed by using 3 × 3 convolutional layers and one 2 × 2 pooling layer, wherein the 3 × 3 convolutional layers and the 2 × 2 pooling layer are connected in sequence.
10. The method of any one of claims 1 to 3, wherein the performing text recognition on the image to be processed based on the position of the one or more texts comprises:
recognizing the characters in the image to be processed based on the positions of the one or more characters and a preset recognition model, wherein the preset recognition model is used for recognizing the characters in an image according to the positions of the characters in the image.
11. A character recognition apparatus, comprising:
the image acquisition module is used for acquiring an image to be processed, and the image to be processed carries one or more characters;
the feature extraction module is used for performing feature extraction on the image to be processed to obtain the image features corresponding to the image to be processed;

the text box processing module is used for obtaining a plurality of text boxes with different scales in the image to be processed according to the image features and performing text box regression processing on the text boxes with different scales; and

the text recognition module is used for determining the positions of the one or more characters in the image to be processed according to the plurality of text boxes with different scales after the text box regression processing, and performing character recognition on the image to be processed based on the positions of the one or more characters.
12. A character recognition apparatus, comprising:
a processor;
a memory; and
a computer program;
wherein the computer program is stored in the memory and configured to be executed by the processor, the computer program comprising instructions for performing the method of any of claims 1-10.
13. A computer-readable storage medium, characterized in that it stores a computer program that causes a server to execute the method of any one of claims 1-10.
14. A computer program product, comprising computer instructions that, when executed by a processor, perform the method of any one of claims 1-10.
CN202111535285.5A 2021-12-15 2021-12-15 Character recognition method, device, equipment and storage medium Pending CN114495132A (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN202111535285.5A CN114495132A (en) 2021-12-15 2021-12-15 Character recognition method, device, equipment and storage medium
PCT/CN2022/102163 WO2023109086A1 (en) 2021-12-15 2022-06-29 Character recognition method, apparatus and device, and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111535285.5A CN114495132A (en) 2021-12-15 2021-12-15 Character recognition method, device, equipment and storage medium

Publications (1)

Publication Number Publication Date
CN114495132A 2022-05-13

Family

ID=81493740

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111535285.5A Pending CN114495132A (en) 2021-12-15 2021-12-15 Character recognition method, device, equipment and storage medium

Country Status (2)

Country Link
CN (1) CN114495132A (en)
WO (1) WO2023109086A1 (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2023109086A1 (en) * 2021-12-15 2023-06-22 深圳前海微众银行股份有限公司 Character recognition method, apparatus and device, and storage medium

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109583449A (en) * 2018-10-29 2019-04-05 深圳市华尊科技股份有限公司 Character identifying method and Related product
CN111476067B (en) * 2019-01-23 2023-04-07 腾讯科技(深圳)有限公司 Character recognition method and device for image, electronic equipment and readable storage medium
CN110443258B (en) * 2019-07-08 2021-03-02 北京三快在线科技有限公司 Character detection method and device, electronic equipment and storage medium
CN112364873A (en) * 2020-11-20 2021-02-12 深圳壹账通智能科技有限公司 Character recognition method and device for curved text image and computer equipment
CN114495132A (en) * 2021-12-15 2022-05-13 深圳前海微众银行股份有限公司 Character recognition method, device, equipment and storage medium


Also Published As

Publication number Publication date
WO2023109086A1 (en) 2023-06-22


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination