CN112766261A - Character recognition method and device and computer storage medium - Google Patents


Info

Publication number
CN112766261A
CN112766261A
Authority
CN
China
Prior art keywords
image
character
recognized
neural network
network model
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110074732.5A
Other languages
Chinese (zh)
Inventor
江帆 (Jiang Fan)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nanjing Huichuan Image Vision Technology Co ltd
Original Assignee
Nanjing Huichuan Image Vision Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nanjing Huichuan Image Vision Technology Co ltd filed Critical Nanjing Huichuan Image Vision Technology Co ltd
Priority to CN202110074732.5A
Publication of CN112766261A
Legal status: Pending

Classifications

    • G06V 20/62 Text, e.g. of license plates, overlay texts or captions on TV images
    • G06F 18/23213 Non-hierarchical clustering techniques using statistics or function optimisation with a fixed number of clusters, e.g. K-means clustering
    • G06F 18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06N 3/04 Neural networks; architecture, e.g. interconnection topology
    • G06N 3/08 Neural networks; learning methods
    • G06V 10/449 Biologically inspired filters, e.g. difference of Gaussians [DoG] or Gabor filters
    • G06V 30/10 Character recognition


Abstract

The invention discloses a character recognition method comprising the following steps: after an image to be recognized containing characters is acquired, obtaining the two-dimensional offset corresponding to each pixel point in the image to be recognized; adjusting the coordinates of the sampling points of a target neural network model on the image to be recognized according to the two-dimensional offset; and acquiring the character information of the characters in the image to be recognized according to the adjusted sampling point coordinates and the target neural network model. The invention also discloses a character recognition device and a computer storage medium. By obtaining the two-dimensional offset of each pixel point and adjusting the sampling point positions of the convolutional neural network accordingly, the method avoids fixed sampling positions, allows the receptive field of the convolution to concentrate around the characters in the image, makes the image features extracted at the adjusted sampling positions more accurate, and thereby achieves higher character recognition accuracy.

Description

Character recognition method and device and computer storage medium
Technical Field
The present invention relates to the field of character recognition technologies, and in particular, to a character recognition method, a character recognition device, and a computer storage medium.
Background
With the development of computer vision technology, more and more scenarios adopt visual schemes to assist or replace manual work. Character recognition has been applied in many industries and scenarios, such as industrial inkjet codes, bank cards, identification cards, and the like. Characters differ in shape from scene to scene and their backgrounds vary widely, for example production dates on packaging, chip serial numbers, and laser-printed codes on bottles. Traditional feature extraction methods struggle with these increasingly complex recognition tasks, so deep learning-based methods are attracting growing attention from industry.
However, in deep learning-based character recognition, the sampling pattern of the convolutional neural network is relatively fixed, so the extracted features contain too much background information besides the characters; feature extraction is therefore inaccurate, and character recognition accuracy is low.
The above is only for the purpose of assisting understanding of the technical aspects of the present invention, and does not represent an admission that the above is prior art.
Disclosure of Invention
The invention mainly aims to provide a character recognition method, a character recognition device and a computer storage medium, aiming at adjusting the sampling point position of a convolutional neural network according to the two-dimensional offset of a pixel point and improving the character recognition accuracy.
In order to achieve the above object, the present invention provides a character recognition method, including the steps of:
after an image to be recognized containing characters is obtained, obtaining two-dimensional offset corresponding to each pixel point in the image to be recognized;
adjusting the coordinates of sampling points of a target neural network model on the image to be recognized according to the two-dimensional offset;
and acquiring character information of the characters in the image to be recognized according to the adjusted sampling point coordinates and the target neural network model.
Optionally, the step of obtaining the character information of the character in the image to be recognized according to the adjusted coordinates of the sampling point and the target neural network model includes:
acquiring the feature information of the image to be recognized according to the adjusted sampling point coordinates;
and inputting the feature information into the target neural network model to acquire the character information of the characters in the image to be recognized.
Optionally, the step of obtaining the feature information of the image to be recognized according to the adjusted sampling point coordinates includes:
acquiring attention weights of all pixel points in the image to be recognized by adopting an attention mechanism, wherein the attention weights comprise spatial attention weights and/or channel attention weights;
determining a target image area in the image to be recognized according to the attention weights and the adjusted sampling point coordinates;
and acquiring the feature information of the target image area.
Optionally, the step of determining the target image area in the image to be recognized according to the attention weight and the adjusted sampling point coordinates includes:
determining the score of each pixel point in the image to be recognized according to the adjusted sampling point coordinates;
acquiring the weighted score of each pixel point according to its score and its attention weight;
and determining target pixel points in the image to be recognized according to the weighted scores, wherein the target image area comprises the target pixel points.
Optionally, after the step of obtaining the image to be recognized including the characters, before the step of obtaining the two-dimensional offset corresponding to each pixel point in the image to be recognized, the method further includes:
acquiring a plurality of first preset images;
clustering the character sizes in the first preset images to obtain character sizes of multiple categories;
taking the character sizes of the plurality of categories as the anchor box sizes;
and training a preset neural network model according to the anchor box sizes to obtain the target neural network model.
Optionally, the step of training the preset neural network model according to the anchor box sizes includes:
carrying out image size normalization on the first preset image according to the anchor box sizes;
and training the preset neural network model according to the first preset image after image size normalization.
Optionally, the step of obtaining the character information of the character in the image to be recognized according to the adjusted coordinates of the sampling point and the target neural network model includes:
acquiring the feature information of the image to be recognized according to the adjusted sampling point coordinates;
inputting the feature information into the target neural network model to obtain encoded feature information;
and decoding the encoded feature information using the anchor box sizes to obtain the character information of the characters in the image to be recognized.
Optionally, after the step of obtaining the image to be recognized including the characters, before the step of obtaining the two-dimensional offset corresponding to each pixel point in the image to be recognized, the method further includes:
acquiring an acquired original image;
and preprocessing the original image to obtain the image to be recognized, wherein the preprocessing comprises image size normalization and/or pixel value normalization.
In addition, to achieve the above object, the present invention provides a character recognition apparatus, including: a memory, a processor and a recognition program of characters stored on the memory and operable on the processor, the recognition program of characters implementing the steps of the recognition method of characters as described in any one of the above when executed by the processor.
Furthermore, to achieve the above object, the present invention also provides a computer storage medium having stored thereon a character recognition program that, when executed by a processor, implements the steps of the character recognition method as described in any one of the above.
According to the character recognition method, the character recognition device and the computer storage medium, after an image to be recognized containing characters is obtained, two-dimensional offset corresponding to each pixel point in the image to be recognized is obtained, sampling point coordinates of a target neural network model on the image to be recognized are adjusted according to the two-dimensional offset, and character information of the characters in the image to be recognized is obtained according to the adjusted sampling point coordinates and the target neural network model. According to the method, the two-dimensional offset of the pixel point is obtained, the sampling point position of the convolutional neural network is adjusted according to the two-dimensional offset, the fixed sampling point position is avoided, the convolved receptive field can be more concentrated around the characters in the image, the image characteristics extracted according to the adjusted sampling point position are more accurate, and the character recognition accuracy is higher.
Drawings
Fig. 1 is a schematic terminal structure diagram of a hardware operating environment according to an embodiment of the present invention;
FIG. 2 is a flowchart illustrating an embodiment of a character recognition method according to the present invention;
FIG. 3 is a flow chart illustrating a character recognition method according to another embodiment of the present invention;
FIG. 4 is a flow chart illustrating a character recognition method according to another embodiment of the present invention;
FIG. 5 is a schematic flow chart of a training process of a target neural network model according to the present invention;
FIG. 6 is a flow chart of a testing process and an actual application process of a target neural network model;
FIG. 7 is a schematic diagram illustrating the effect of coordinate positions of sampling points when the present invention employs a general convolution;
FIG. 8 is a schematic diagram illustrating the effect of sampling point coordinate locations when using deformable convolution according to the present invention;
FIG. 9 is a schematic diagram of a network architecture for the attention mechanism of the present invention.
The implementation, functional features and advantages of the objects of the present invention will be further explained with reference to the accompanying drawings.
Detailed Description
It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.
The embodiment of the invention provides a solution: by acquiring the two-dimensional offset of each pixel point and adjusting the sampling point positions of the convolutional neural network according to that offset, the sampling positions are no longer fixed, the receptive field of the convolution concentrates around the characters in the image, the image features extracted at the adjusted sampling positions are more accurate, and character recognition accuracy is higher.
As shown in fig. 1, fig. 1 is a schematic terminal structure diagram of a hardware operating environment according to an embodiment of the present invention.
The terminal is the terminal equipment in the embodiment of the invention.
As shown in fig. 1, the terminal may include: a processor 1001, such as a CPU, a communication bus 1002, and a memory 1003. The communication bus 1002 is used to implement communication connections among these components. The memory 1003 may be a high-speed RAM or a non-volatile memory (e.g., disk storage). The memory 1003 may alternatively be a storage device separate from the processor 1001.
Those skilled in the art will appreciate that the terminal structure shown in fig. 1 is not intended to be limiting and may include more or fewer components than those shown, or some components may be combined, or a different arrangement of components.
As shown in fig. 1, a recognition program of characters may be included in the memory 1003 as a kind of computer storage medium.
In the terminal shown in fig. 1, the processor 1001 may be configured to call a recognition program of characters stored in the memory 1003, and perform the following operations:
after an image to be recognized containing characters is obtained, obtaining two-dimensional offset corresponding to each pixel point in the image to be recognized;
adjusting the coordinates of sampling points of a target neural network model on the image to be recognized according to the two-dimensional offset;
and acquiring character information of the characters in the image to be recognized according to the adjusted sampling point coordinates and the target neural network model.
Further, the processor 1001 may call a recognition program of characters stored in the memory 1003, and also perform the following operations:
acquiring the feature information of the image to be recognized according to the adjusted sampling point coordinates;
and inputting the feature information into the target neural network model to acquire the character information of the characters in the image to be recognized.
Further, the processor 1001 may call a recognition program of characters stored in the memory 1003, and also perform the following operations:
acquiring attention weights of all pixel points in the image to be recognized by adopting an attention mechanism, wherein the attention weights comprise spatial attention weights and/or channel attention weights;
determining a target image area in the image to be recognized according to the attention weights and the adjusted sampling point coordinates;
and acquiring the feature information of the target image area.
Further, the processor 1001 may call a recognition program of characters stored in the memory 1003, and also perform the following operations:
determining the score of each pixel point in the image to be recognized according to the adjusted sampling point coordinates;
acquiring the weighted score of each pixel point according to its score and its attention weight;
and determining target pixel points in the image to be recognized according to the weighted scores, wherein the target image area comprises the target pixel points.
Further, the processor 1001 may call a recognition program of characters stored in the memory 1003, and also perform the following operations:
acquiring a plurality of first preset images;
clustering the character sizes in the first preset images to obtain character sizes of multiple categories;
taking the character sizes of the plurality of categories as the anchor box sizes;
and training a preset neural network model according to the anchor box sizes to obtain the target neural network model.
Further, the processor 1001 may call a recognition program of characters stored in the memory 1003, and also perform the following operations:
carrying out image size normalization on the first preset image according to the anchor box sizes;
and training the preset neural network model according to the first preset image after image size normalization.
Further, the processor 1001 may call a recognition program of characters stored in the memory 1003, and also perform the following operations:
acquiring the feature information of the image to be recognized according to the adjusted sampling point coordinates;
inputting the feature information into the target neural network model to obtain encoded feature information;
and decoding the encoded feature information using the anchor box sizes to obtain the character information of the characters in the image to be recognized.
Further, the processor 1001 may call a recognition program of characters stored in the memory 1003, and also perform the following operations:
acquiring an acquired original image;
and preprocessing the original image to obtain the image to be recognized, wherein the preprocessing comprises image size normalization and/or pixel value normalization.
Referring to fig. 2, in an embodiment, the method for recognizing characters includes the following steps:
step S10, after an image to be recognized containing characters is obtained, obtaining two-dimensional offset corresponding to each pixel point in the image to be recognized;
in this embodiment, the terminal in this embodiment is a character recognition device. The character recognition device can receive an image to be recognized containing characters and recognize character information of the characters in the image to be recognized. The image to be recognized including the characters may include an image of a production date on a product package, an image of a number of a chip, a laser-sprayed code on a bottle, etc.
Optionally, after an image to be recognized containing characters is acquired, the two-dimensional offset corresponding to each pixel point in the image to be recognized may be obtained through an ordinary convolution. Specifically, the image to be recognized is input into an ordinary convolution layer of a neural network whose convolution uses "same" padding, i.e., the size and resolution of the output are consistent with those of the input image. The output of this convolution layer is the two-dimensional offset corresponding to each pixel point of the input image; the two-dimensional offset is an offset vector of the pixel point, representing the direction and distance, relative to that pixel point, of an image area where image features may exist.
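The offset branch described above can be sketched in numpy as an ordinary "same"-padded convolution whose two output channels form one (dy, dx) offset vector per pixel. This is a minimal illustration under assumed conditions: a single-channel input, one 3x3 filter bank, and random weights standing in for trained parameters (a full deformable convolution would predict 2 offsets per kernel sampling point).

```python
import numpy as np

def conv2d_same(image, kernels):
    """Ordinary 2D convolution with 'same' zero padding.

    image:   (H, W) single-channel input
    kernels: (C_out, kh, kw) filter bank
    returns: (C_out, H, W), i.e. the same spatial size as the input
    """
    c_out, kh, kw = kernels.shape
    ph, pw = kh // 2, kw // 2
    padded = np.pad(image, ((ph, ph), (pw, pw)))
    h, w = image.shape
    out = np.zeros((c_out, h, w))
    for c in range(c_out):
        for i in range(h):
            for j in range(w):
                out[c, i, j] = np.sum(padded[i:i + kh, j:j + kw] * kernels[c])
    return out

# Two output channels -> one (dy, dx) offset vector per pixel.
rng = np.random.default_rng(0)
image = rng.random((8, 8))
offset_kernels = rng.standard_normal((2, 3, 3)) * 0.01  # hypothetical weights
offsets = conv2d_same(image, offset_kernels)
print(offsets.shape)  # (2, 8, 8)
```

Because of the "same" padding, the offset map has exactly one offset vector per input pixel, as the patent requires.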
Optionally, before the two-dimensional offset corresponding to each pixel point is obtained, the image to be recognized must first be acquired. The image to be recognized may be a preprocessed image: for example, an original image captured by a camera is obtained first and then preprocessed to obtain the image to be recognized, where the preprocessing includes image size normalization and/or pixel value normalization, so that the accuracy of the extracted feature information is improved.
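The preprocessing step can be sketched as follows. The target size, nearest-neighbour resizing, and min-max pixel normalization are illustrative assumptions; the patent only names size normalization and pixel-value normalization without fixing a scheme.

```python
import numpy as np

def preprocess(raw, target=(32, 32)):
    """Hypothetical preprocessing: nearest-neighbour size normalization
    to `target`, then min-max pixel-value normalization into [0, 1]."""
    h, w = raw.shape
    ys = np.arange(target[0]) * h // target[0]   # source rows to keep
    xs = np.arange(target[1]) * w // target[1]   # source cols to keep
    resized = raw[np.ix_(ys, xs)].astype(np.float64)
    lo, hi = resized.min(), resized.max()
    if hi == lo:                                  # flat image: avoid /0
        return np.zeros_like(resized)
    return (resized - lo) / (hi - lo)

img = (np.arange(64 * 48, dtype=np.uint8).reshape(64, 48)) % 256
out = preprocess(img)
print(out.shape, float(out.min()), float(out.max()))  # (32, 32) 0.0 1.0
```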
Step S20, adjusting the coordinates of the sampling points of the target neural network model on the image to be recognized according to the two-dimensional offset;
in this embodiment, the target neural network model is a trained neural network model, so that the image to be recognized can be processed directly through the target neural network model. When an image to be recognized is processed through a target neural network model, as shown in fig. 7, the target neural network model samples the image to be recognized by using a convolution kernel of the target neural network according to an input image to be recognized to extract feature information in the image to be recognized, and therefore, in this embodiment, a convolution layer in the target neural network may be set as a deformable convolution kernel to make the form of the convolution kernel more fit to characters in the image to be recognized through a change of the form of the convolution kernel, specifically, as shown in fig. 8, coordinates of a sampling point on the image to be recognized of the convolution kernel of the target neural network model are adjusted according to a two-dimensional offset, and a form of the convolution kernel is changed through a change of a position of the sampling point, as shown in fig. 7, and fig. 7 is coordinates of the sampling point before adjustment.
Optionally, when the sampling point coordinates are adjusted according to the two-dimensional offset, the pixel point corresponding to each sampling point coordinate on the image to be recognized is obtained, the two-dimensional offset corresponding to that pixel point is looked up, and the sampling point coordinate is shifted by that offset.
And step S30, acquiring character information of the characters in the image to be recognized according to the adjusted sampling point coordinates and the target neural network model.
In this embodiment, after the coordinates of the sampling points of the target neural network model on the image to be recognized are adjusted, the target neural network model is controlled to sample the image to be recognized according to the adjusted coordinates of the sampling points, so as to extract the feature information in the image to be recognized, and the character information in the image to be recognized is recognized through the feature information, so that the character recognition is realized.
In the technical scheme disclosed in this embodiment, the ordinary convolution in the neural network model is replaced by deformable convolution, and the sampling point positions of the convolutional neural network are adjusted according to the two-dimensional offsets of the pixel points, so the sampling positions are no longer fixed. Because the kernel of a deformable convolution is variable, the receptive field of the convolution concentrates around the characters in the image, the image features extracted at the adjusted sampling positions are more accurate, and character recognition accuracy is higher.
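The core sampling adjustment can be sketched as follows: each of the nine points of a 3x3 kernel grid is shifted by its predicted (dy, dx) offset, and the image is read at the resulting fractional coordinates by bilinear interpolation (the standard device in deformable convolution; the specific grid and interpolation here are illustrative assumptions).

```python
import numpy as np

def bilinear(img, y, x):
    """Bilinearly interpolate img at real-valued coordinates (y, x)."""
    h, w = img.shape
    y, x = np.clip(y, 0, h - 1), np.clip(x, 0, w - 1)
    y0, x0 = int(np.floor(y)), int(np.floor(x))
    y1, x1 = min(y0 + 1, h - 1), min(x0 + 1, w - 1)
    dy, dx = y - y0, x - x0
    top = img[y0, x0] * (1 - dx) + img[y0, x1] * dx
    bot = img[y1, x0] * (1 - dx) + img[y1, x1] * dx
    return top * (1 - dy) + bot * dy

def deformable_sample(img, center, offsets):
    """Sample a 3x3 deformable window centered at `center`.

    offsets: (9, 2) array of (dy, dx) shifts, one per sampling point.
    Zero offsets reduce to the ordinary fixed 3x3 grid of fig. 7.
    """
    base = [(i, j) for i in (-1, 0, 1) for j in (-1, 0, 1)]
    cy, cx = center
    return np.array([bilinear(img, cy + i + oy, cx + j + ox)
                     for (i, j), (oy, ox) in zip(base, offsets)])

img = np.arange(25, dtype=float).reshape(5, 5)
no_shift = deformable_sample(img, (2, 2), np.zeros((9, 2)))
print(no_shift)  # the plain 3x3 neighbourhood of pixel (2, 2)
```

With non-zero offsets the same routine reads off-grid positions, which is how the kernel "bends" toward character strokes.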
In another embodiment, as shown in fig. 3, on the basis of the embodiment shown in fig. 2, the step S30 includes:
Step S31, acquiring the feature information of the image to be recognized according to the adjusted sampling point coordinates;
in this embodiment, the deformable convolution layer of the control target neural network samples the image to be recognized according to the adjusted sampling point coordinates, and obtains the feature information of the image to be recognized through sampling.
Optionally, the feature information may include information of pixel points where characters in the image to be recognized are located.
Optionally, the target neural network includes a plurality of deformable convolution layers, so that the image to be recognized is sampled several times and more accurate feature information is obtained.
Optionally, the target neural network model has a plurality of sampling points on the image to be recognized, and the image is sampled progressively, stage by stage, according to these sampling points.
Optionally, when the feature information of the image to be recognized is acquired according to the adjusted sampling point coordinates, an attention mechanism may also be introduced to refine and enhance the feature information. Specifically, as shown in fig. 9, at least one bypass branch is added after the ordinary convolution operation, and an attention model is set on the bypass branch, where the attention model includes a spatial attention model and/or a channel attention model. For example, the spatial attention model may be an STN (Spatial Transformer Network), which generates a spatial attention weight for each position in the image, and the channel attention model may be an SE network (Squeeze-and-Excitation Network), which generates a channel attention weight for each feature channel; channel attention captures the relationships between channels, while spatial attention emphasizes the features at different spatial positions. The image to be recognized is input into the attention model, and the attention weights of all pixel points output by the attention model are obtained, i.e., the weight of attention at each position in the image, where the attention weights comprise spatial attention weights and/or channel attention weights. A target image area in the image to be recognized is then determined according to the attention weights and the adjusted sampling point coordinates, and the feature information of the target image area is acquired. In this way the attention on character areas is increased and non-character areas are ignored, so the extracted feature information is more accurate and the recognition precision of the characters is higher.
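The channel-attention branch can be sketched in SE style as follows: a global average pool per channel, a small two-layer bottleneck, and a sigmoid that produces one weight per channel to rescale the feature map. The weight matrices here are random stand-ins for trained parameters, and the reduction ratio of 2 is an assumption.

```python
import numpy as np

def channel_attention(feat, w1, w2):
    """SE-style channel attention on a (C, H, W) feature map.

    Squeeze: global average pool per channel -> (C,)
    Excite:  dense -> ReLU -> dense -> sigmoid -> per-channel weights in (0, 1)
    The weights rescale each channel of the input feature map.
    """
    squeeze = feat.mean(axis=(1, 2))                 # (C,)
    hidden = np.maximum(squeeze @ w1, 0.0)           # ReLU bottleneck
    weights = 1.0 / (1.0 + np.exp(-(hidden @ w2)))   # sigmoid gate
    return weights, feat * weights[:, None, None]

rng = np.random.default_rng(1)
feat = rng.random((4, 6, 6))                 # hypothetical feature map
w1 = rng.standard_normal((4, 2))             # reduction to C/2 channels
w2 = rng.standard_normal((2, 4))             # expansion back to C channels
weights, refined = channel_attention(feat, w1, w2)
print(weights.shape, refined.shape)  # (4,) (4, 6, 6)
```

A spatial-attention branch would instead produce one weight per (H, W) position and multiply the map pointwise; the combination of the two matches the "spatial and/or channel" wording above.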
Optionally, when the target image area in the image to be recognized is determined according to the attention weights and the adjusted sampling point coordinates, the score of each pixel point in the image to be recognized can be determined according to the adjusted sampling point coordinates, where the score of a pixel point represents its basic weight. The weighted score of each pixel point is then obtained from its score and its attention weight; weighting the scores by the attention weights filters the pixel points of the image to be recognized a second time. Target pixel points are determined according to the weighted scores, and the target image area comprises the target pixel points; for example, pixel points whose weighted score exceeds a preset score can be taken as target pixel points, and the image area containing all target pixel points is taken as the target image area.
Optionally, when the score of each pixel point in the image to be recognized is determined according to the adjusted sampling point coordinates, the image distance between each pixel point and the adjusted sampling point coordinates can be obtained, and the score determined from that distance: the smaller the image distance, the higher the score.
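The scoring and weighting steps above can be sketched as follows. The inverse-distance score, uniform attention map, and threshold value are all illustrative assumptions; the patent only states that a smaller distance yields a higher score and that the weighted score is thresholded.

```python
import numpy as np

def target_mask(shape, sample_pts, attention, threshold=0.3):
    """Score each pixel by proximity to the nearest adjusted sampling
    point, weight the score by the pixel's attention weight, and keep
    pixels whose weighted score exceeds the threshold."""
    h, w = shape
    ys, xs = np.mgrid[0:h, 0:w]
    # distance from every pixel to its nearest sampling point
    dists = np.min([np.hypot(ys - py, xs - px) for py, px in sample_pts],
                   axis=0)
    score = 1.0 / (1.0 + dists)      # smaller distance -> higher score
    weighted = score * attention     # second filtering by attention
    return weighted > threshold

att = np.ones((5, 5))                          # uniform attention stand-in
mask = target_mask((5, 5), [(2, 2)], att, threshold=0.6)
print(mask.astype(int))  # only the pixel at the sampling point survives
```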
Step S32, inputting the feature information into the target neural network model to obtain character information of the character in the image to be recognized.
In this embodiment, the target neural network model is a trained model; it may be trained and tested in advance on a set of preset images, where each preset image is annotated with the positions of its characters. After the feature information is input into the target neural network model, the model can therefore recognize the characters in the image to be recognized, and the character information it outputs is obtained.
In the technical scheme disclosed in this embodiment, the feature information of the image to be recognized is acquired according to the adjusted sampling point coordinates, and the feature information is input into the target neural network model to obtain the character information of the characters in the image to be recognized. Because character recognition is performed through a neural network model based on deformable convolution, the recognized characters are closer to the actual characters, and the recognition is more accurate.
In another embodiment, as shown in fig. 4, on the basis of any one of the embodiments shown in fig. 2 to 3, before the step S10, the method further includes:
step S40, acquiring a plurality of first preset images;
in this embodiment, before the target neural network model is used to recognize the image to be recognized, the preset neural network model needs to be trained to obtain the trained target neural network model. The network structure of the preset neural network model can be set according to actual needs; for example, the convolution layers in the network structure can be set as deformable convolution layers.
Optionally, a set in which the plurality of first preset images are located is a training set, and the preset neural network model is trained through the plurality of first preset images in the training set.
Step S50, clustering the character sizes in the first preset images to obtain character sizes of a plurality of categories;
in this embodiment, each first preset image has the position of its characters pre-framed, and the size of the framed image area is the character size; that is, the pre-framed image area is the actual position of the character. When the characters are production dates on product packaging, serial numbers on chips, laser-sprayed codes on bottles, and the like, the forms and sizes of the characters are similar, i.e., the character recognition scene is relatively uniform. The anchor frame (anchor) sizes of the neural network can therefore be adjusted so that the anchor frames are close to each other in size and each anchor frame closely matches the actual size of the characters, yielding higher character recognition accuracy. Specifically, as shown in fig. 5, the character sizes in the first preset images are clustered into character sizes of multiple categories, where the clustering algorithm may be the K-means clustering algorithm, the mean-shift clustering algorithm, and the like.
Optionally, after the character sizes in the first preset images are clustered, the average value, the maximum value, or the cluster center of the character sizes in each category may be taken as the character size of that category.
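The clustering step can be sketched with a plain K-means over (width, height) pairs, the cluster centers then serving as the per-category character sizes. This is a minimal illustration, not the patent's code: the deterministic area-based initialization and the helper name `cluster_anchor_sizes` are assumptions, and a mean-shift variant would serve equally.

```python
import numpy as np

def cluster_anchor_sizes(sizes, k, iters=20):
    """Plain K-means over (width, height) character sizes; the cluster
    centers become the anchor frame sizes, one per category."""
    order = np.argsort(sizes.prod(axis=1))                  # sort boxes by area
    init = order[np.linspace(0, len(sizes) - 1, k).astype(int)]
    centers = sizes[init].astype(float).copy()              # spread the initial centers
    for _ in range(iters):
        d = np.linalg.norm(sizes[:, None, :] - centers[None, :, :], axis=-1)
        labels = d.argmin(axis=1)                           # nearest-center assignment
        for j in range(k):
            if np.any(labels == j):
                centers[j] = sizes[labels == j].mean(axis=0)
    return centers[np.argsort(centers[:, 0])]               # sorted by width

# two visibly distinct groups: small date digits vs. larger code characters
sizes = np.array([[10, 14], [11, 15], [12, 14], [30, 40], [31, 42], [29, 41]], float)
anchors = cluster_anchor_sizes(sizes, k=2)
```

On this toy data the two centers converge to roughly 11x14 and 30x41, which would become the two anchor frame sizes.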
Step S60, setting the character sizes of the plurality of categories as the size of an anchor frame;
and step S70, training a preset neural network model according to the size of the anchor frame to obtain the target neural network model.
In this embodiment, after the character sizes of the respective categories are obtained, the character sizes of the respective categories are used as the size of an anchor frame (anchor), and then the preset neural network model is trained according to the size of the anchor frame and a plurality of first preset images, so as to obtain a trained target neural network model.
Optionally, as shown in fig. 5, when the preset neural network model is trained according to the size of the anchor frame and the plurality of first preset images, the first preset images are preprocessed according to the size of the anchor frame, where the preprocessing includes image size normalization so that standard images of the same form are obtained. The preset neural network model is then trained on the preprocessed first preset images, ensuring the consistency of the training sample data. Optionally, the preprocessing may also include label making and/or pixel value normalization.
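A minimal sketch of the preprocessing described above, under stated assumptions: nearest-neighbour resizing stands in for whatever resizing method the pipeline actually uses, inputs are 8-bit grayscale in [0, 255], and the function name is hypothetical.

```python
import numpy as np

def preprocess(img, out_h, out_w):
    """Normalize an image to a standard size (nearest-neighbour resize as a
    stand-in for any resizing method) and normalize pixel values from
    [0, 255] to [0, 1], so every training sample shares the same form."""
    h, w = img.shape[:2]
    ys = np.arange(out_h) * h // out_h               # source row for each output row
    xs = np.arange(out_w) * w // out_w               # source column for each output column
    resized = img[ys][:, xs]
    return resized.astype(np.float32) / 255.0

img = (np.arange(64, dtype=np.uint8) * 4).reshape(8, 8)   # toy 8x8 grayscale image
std = preprocess(img, 4, 4)
```

Every sample that leaves this function has the same shape and the same value range, which is the form consistency the embodiment asks for.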
Optionally, as shown in fig. 6, after the target neural network model is obtained through training, a plurality of second preset images may be further obtained, the target neural network model is tested through the plurality of second preset images to detect whether the training of the target neural network model is qualified, and after the training is qualified, the target neural network model is used to perform character recognition on the image to be recognized.
Optionally, in the step of obtaining the character information of the characters in the image to be recognized according to the adjusted sampling point coordinates and the target neural network model, as shown in fig. 6, after the feature information of the image to be recognized is obtained according to the adjusted sampling point coordinates, the feature information may be input into the target neural network model, which encodes it into depth feature information. The encoded depth feature information is then decoded using the anchor frame sizes to obtain the character information of the characters in the image to be recognized.
Optionally, the character information includes a character position and a label: the character position is the position of the character in the image to be recognized, and the label is the character category.
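One common way to decode an encoded prediction against a clustered anchor size is the YOLO-style parameterization sketched below. The patent does not specify this exact scheme; the sigmoid/exponential parameterization and all names here are assumptions used for illustration of how an anchor size and a raw prediction combine into a character position and category.

```python
import numpy as np

def decode_cell(pred, anchor_w, anchor_h, cell_x, cell_y, cell_size):
    """Decode one grid cell's raw prediction into a character box and a
    category. pred = (tx, ty, tw, th, class scores...); the box size is
    expressed relative to the clustered anchor size."""
    tx, ty, tw, th = pred[:4]
    sigmoid = lambda v: 1.0 / (1.0 + np.exp(-v))
    cx = (cell_x + sigmoid(tx)) * cell_size          # box centre in pixels
    cy = (cell_y + sigmoid(ty)) * cell_size
    bw = anchor_w * np.exp(tw)                       # clustered anchor scaled by the prediction
    bh = anchor_h * np.exp(th)
    cls = int(np.argmax(pred[4:]))                   # character category (the label)
    return (cx, cy, bw, bh), cls

# zero size offsets: the decoded box is exactly the anchor, centred in the cell
pred = np.array([0.0, 0.0, 0.0, 0.0, 0.1, 2.0, -1.0])
box, cls = decode_cell(pred, anchor_w=11.0, anchor_h=14.0, cell_x=3, cell_y=2, cell_size=16)
```

Because the box dimensions are multiples of the clustered anchor size, anchors that already match the character size need only small corrections from the network, which is the motivation for the clustering step.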
In the technical scheme disclosed in this embodiment, the sizes of the anchor frames in the neural network are obtained by clustering the character sizes, so that the anchor frame sizes of the target neural network model are more uniform. With a neural network model using these anchor frame sizes, the recognition of similar characters such as the production date on product packaging, chip serial numbers, and laser-sprayed codes on bottles is more accurate.
In another embodiment, based on the embodiment shown in any one of fig. 2 to 4, the improvement of the character recognition method includes two parts: the first part is the adjustment of the anchor frame strategy, and the second part is the improvement of the network structure.
In the first part, as shown in fig. 5, in the training stage the character sizes in the training set are first counted and then clustered to serve as the anchor frames of the network; the data is then preprocessed and encoded to train the model. As shown in fig. 6, in the test stage the anchor frames are used to decode the encoded result, obtaining the character positions and categories.
The flow of the training phase is shown in fig. 5 and includes:
Step 1, inputting the training set pictures.
Step 2, counting the character sizes in the training set.
Step 3, clustering the character sizes to serve as anchor frames.
Step 4, preprocessing the training set according to the clustered anchor frames, where the preprocessing includes label making, image size normalization, and pixel value normalization.
Step 5, sending the preprocessed training set into the network for training.
The flow of the test phase is shown in fig. 6 and includes:
Step 6, inputting the picture and preprocessing it, including image size normalization and pixel value normalization.
Step 7, encoding the image through a deep convolutional neural network.
Step 8, obtaining the encoded depth features.
Step 9, decoding the depth features according to the anchor frame sizes clustered in step 3.
Steps 10 and 11, obtaining the character positions and character categories after decoding.
In this embodiment, most conventional deep-learning target detection is based on anchor frames (anchors), and because targets of various sizes must be considered, the anchors need to cover a certain size distribution. In a character recognition scene, however, the target sizes are uniform and widely differing sizes do not need to be accommodated, so the anchor frame strategy of target detection needs to be adjusted. By counting and clustering the character sizes of the training set, and encoding the training set data to generate labels using the clusters as anchor frames, the anchor frames become closer to the character sizes, and the character recognition precision is improved.
In the second part, firstly, as shown in fig. 7 and 8, the deformable convolution (deformable convolution) of fig. 8 is used to replace the ordinary convolution of fig. 7. The receptive field of the deformable convolution changes with the size of the target, so it can concentrate around the characters without collecting excessive background information; more accurate features can be extracted for touching characters and narrow characters, improving the precision and stability of the system.
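The sampling behaviour that distinguishes deformable convolution from ordinary convolution can be sketched in a few lines: each kernel tap is displaced by a learned 2D offset, and the input is read with bilinear interpolation at the displaced, fractional position. This single-channel NumPy sketch is illustrative only (a real layer, such as `torchvision.ops.DeformConv2d`, also learns the offsets with a second convolution); with zero offsets it reduces to an ordinary 3x3 convolution.

```python
import numpy as np

def bilinear(img, y, x):
    """Sample a single-channel image at a fractional (y, x) position."""
    h, w = img.shape
    y0, x0 = int(np.floor(y)), int(np.floor(x))
    y1, x1 = min(y0 + 1, h - 1), min(x0 + 1, w - 1)
    y0, x0 = max(min(y0, h - 1), 0), max(min(x0, w - 1), 0)
    dy, dx = y - np.floor(y), x - np.floor(x)
    return ((1 - dy) * (1 - dx) * img[y0, x0] + (1 - dy) * dx * img[y0, x1]
            + dy * (1 - dx) * img[y1, x0] + dy * dx * img[y1, x1])

def deformable_3x3(img, weights, cy, cx, offsets):
    """One output value of a 3x3 deformable convolution: every kernel tap
    is displaced by its own learned 2D offset before sampling, so the
    receptive field can bend toward the character instead of staying on
    a rigid grid."""
    out, k = 0.0, 0
    for dy in (-1, 0, 1):
        for dx in (-1, 0, 1):
            oy, ox = offsets[k]                       # learned offset for this tap
            out += weights[k] * bilinear(img, cy + dy + oy, cx + dx + ox)
            k += 1
    return out

img = np.arange(25, dtype=float).reshape(5, 5)
kernel = np.full(9, 1.0 / 9.0)                        # simple averaging kernel
center = deformable_3x3(img, kernel, 2, 2, np.zeros((9, 2)))   # zero offsets: plain 3x3 mean -> 12.0
shifted = deformable_3x3(img, kernel, 2, 2, np.tile([[1.0, 0.0]], (9, 1)))  # all taps one row down -> 17.0
```

Moving all nine taps together merely shifts the window; the interesting case is per-tap offsets, which let the taps cluster on the strokes of a narrow or touching character.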
Secondly, as shown in fig. 9, an attention mechanism (attention) is used to optimize and enhance the features. An attention module learns a weight distribution over the image features, and the weights are combined with the original features to obtain weighted image features. The weighting is in effect a filtering of the features: the weighted features focus on important information and ignore secondary information. The filtering can be performed not only at the spatial level of the image features but also at the channel level. As shown in fig. 9, after feature extraction by the convolutional neural network, the attention mechanism strengthens the character features at specific positions, thereby improving the accuracy of character recognition.
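The channel-level and spatial-level weighting can be sketched as below. This is a squeeze-and-excitation-style illustration under stated assumptions, not the patent's actual attention module: the softmax/sigmoid choices and the function names are inventions for the example.

```python
import numpy as np

def softmax(v):
    e = np.exp(v - v.max())
    return e / e.sum()

def channel_attention(feat):
    """Squeeze-and-excitation-style channel weighting: globally pool each
    channel, turn the pooled vector into weights, and rescale the channels
    so the informative ones are strengthened."""
    pooled = feat.mean(axis=(1, 2))                  # (C,) global average pool
    wts = softmax(pooled) * feat.shape[0]            # rescale so the mean weight is 1
    return feat * wts[:, None, None], wts

def spatial_attention(feat):
    """Spatial weighting: a per-pixel attention map (here derived from the
    channel mean) is squashed to (0, 1) and multiplied onto every channel."""
    amap = 1.0 / (1.0 + np.exp(-feat.mean(axis=0)))  # (H, W), values in (0, 1)
    return feat * amap[None, :, :], amap

feat = np.zeros((2, 3, 3))
feat[1] = 4.0                                        # channel 1 carries the character signal
out, wts = channel_attention(feat)
out2, amap = spatial_attention(feat)
```

In the toy feature map, the channel carrying the signal receives a weight above 1 and is amplified while the empty channel is suppressed, which is the filtering effect the text describes.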
In the embodiment, the network structure is improved, the common convolution in the character recognition task is replaced by the deformable convolution, and an attention mechanism is added, so that the extracted features are more accurate, the character recognition precision can be effectively improved, and the robustness of the neural network model is improved.
In addition, an embodiment of the present invention further provides a character recognition device, where the device includes: a memory, a processor, and a character recognition program stored on the memory and executable on the processor, where the character recognition program, when executed by the processor, implements the steps of the character recognition method according to the above embodiments.
In addition, an embodiment of the present invention further provides a computer storage medium, where a character recognition program is stored, and the character recognition program, when executed by a processor, implements the steps of the character recognition method according to the above embodiments.
It should be noted that, in this document, the terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or system that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or system. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of other identical elements in the process, method, article, or system that comprises the element.
The above-mentioned serial numbers of the embodiments of the present invention are merely for description and do not represent the merits of the embodiments.
Through the above description of the embodiments, those skilled in the art will clearly understand that the method of the above embodiments can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware, but in many cases, the former is a better implementation manner. Based on such understanding, the technical solution of the present invention may be embodied in the form of a software product, which is stored in a storage medium (e.g., ROM/RAM, magnetic disk, optical disk) as described above and includes instructions for enabling a terminal device (e.g., a mobile phone, a computer, a server, an air conditioner, or a network device) to execute the method according to the embodiments of the present invention.
The above description is only a preferred embodiment of the present invention, and not intended to limit the scope of the present invention, and all modifications of equivalent structures and equivalent processes, which are made by using the contents of the present specification and the accompanying drawings, or directly or indirectly applied to other related technical fields, are included in the scope of the present invention.

Claims (10)

1. A method for recognizing a character, comprising the steps of:
after an image to be recognized containing characters is obtained, obtaining two-dimensional offset corresponding to each pixel point in the image to be recognized;
adjusting the coordinates of sampling points of a target neural network model on the image to be identified according to the two-dimensional offset;
and acquiring character information of the characters in the image to be recognized according to the adjusted sampling point coordinates and the target neural network model.
2. The character recognition method of claim 1, wherein the step of obtaining the character information of the character in the image to be recognized according to the adjusted coordinates of the sampling points and the target neural network model comprises:
acquiring the characteristic information of the image to be identified according to the adjusted sampling point coordinates;
and inputting the characteristic information into the target neural network model to acquire character information of the characters in the image to be recognized.
3. The character recognition method of claim 2, wherein the step of obtaining the feature information of the image to be recognized according to the adjusted coordinates of the sampling points comprises:
acquiring attention weights of all pixel points in the image to be recognized by adopting an attention mechanism, wherein the attention weights comprise space attention weights and/or channel attention weights;
determining a target image area in the image to be identified according to the attention weight and the adjusted sampling point coordinates;
and acquiring the characteristic information of the target image area.
4. The character recognition method of claim 3, wherein the step of determining the target image area in the image to be recognized according to the attention weight and the adjusted coordinates of the sampling points comprises:
determining the score of each pixel point in the image to be identified according to the adjusted sampling point coordinates;
acquiring the weighted score of the pixel point according to the score of each pixel point and the attention weight of the pixel point;
and determining target pixel points in the image to be identified according to the weighted scores, wherein the target image area comprises the target pixel points.
5. The character recognition method according to claim 1, wherein before the step of acquiring the two-dimensional offset corresponding to each pixel point in the image to be recognized after acquiring the image to be recognized containing the character, the method further comprises:
acquiring a plurality of first preset images;
clustering the character sizes in the first preset images to obtain character sizes of multiple categories;
taking the character sizes of the plurality of categories as the size of an anchor frame;
and training a preset neural network model according to the size of the anchor frame to obtain the target neural network model.
6. The character recognition method of claim 5, wherein the step of training a preset neural network model according to the size of the anchor frame comprises:
carrying out image size normalization on the first preset image according to the size of the anchor frame;
and training the preset neural network model according to the first preset image after the image size normalization.
7. The character recognition method of claim 5, wherein the step of obtaining the character information of the character in the image to be recognized according to the adjusted coordinates of the sampling points and the target neural network model comprises:
acquiring the characteristic information of the image to be identified according to the adjusted sampling point coordinates;
inputting the characteristic information into the target neural network model to obtain coded characteristic information;
and decoding the coded characteristic information by adopting the size of the anchor frame to obtain the character information of the characters in the image to be recognized.
8. The character recognition method according to claim 1, wherein before the step of acquiring the two-dimensional offset corresponding to each pixel point in the image to be recognized after acquiring the image to be recognized containing the character, the method further comprises:
acquiring an acquired original image;
and preprocessing the original image to obtain the image to be identified, wherein the preprocessing comprises image size normalization and/or pixel value normalization.
9. An apparatus for recognizing a character, comprising: memory, processor and a recognition program of characters stored on the memory and executable on the processor, the recognition program of characters implementing the steps of the recognition method of characters according to any one of claims 1 to 8 when executed by the processor.
10. A computer storage medium, characterized in that the computer storage medium has stored thereon a recognition program of a character, which when executed by a processor implements the steps of the recognition method of a character according to any one of claims 1 to 8.
CN202110074732.5A 2021-01-19 2021-01-19 Character recognition method and device and computer storage medium Pending CN112766261A (en)


Publications (1)

Publication Number Publication Date
CN112766261A true CN112766261A (en) 2021-05-07

Family

ID=75703513


Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113420770A (en) * 2021-06-21 2021-09-21 梅卡曼德(北京)机器人科技有限公司 Image data processing method, image data processing device, electronic equipment and storage medium



Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination