CN112200789A - Image identification method and device, electronic equipment and storage medium - Google Patents

Image identification method and device, electronic equipment and storage medium

Info

Publication number
CN112200789A
CN112200789A (application CN202011108746.6A)
Authority
CN
China
Prior art keywords
image
character
segmented
model
recognition
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202011108746.6A
Other languages
Chinese (zh)
Other versions
CN112200789B (en)
Inventor
程智博
赵正阳
栾中
吴艳华
刘军
邵赛
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
China Academy of Railway Sciences Corp Ltd CARS
Institute of Computing Technologies of CARS
Beijing Jingwei Information Technology Co Ltd
Original Assignee
China Academy of Railway Sciences Corp Ltd CARS
Institute of Computing Technologies of CARS
Beijing Jingwei Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by China Academy of Railway Sciences Corp Ltd CARS, Institute of Computing Technologies of CARS, Beijing Jingwei Information Technology Co Ltd filed Critical China Academy of Railway Sciences Corp Ltd CARS
Priority to CN202011108746.6A priority Critical patent/CN112200789B/en
Publication of CN112200789A publication Critical patent/CN112200789A/en
Application granted granted Critical
Publication of CN112200789B publication Critical patent/CN112200789B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 7/00 Image analysis
    • G06T 7/0002 Inspection of images, e.g. flaw detection
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/24 Classification techniques
    • G06F 18/243 Classification techniques relating to the number of classes
    • G06F 18/24323 Tree-organised classifiers
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 7/00 Image analysis
    • G06T 7/30 Determination of transform parameters for the alignment of images, i.e. image registration
    • G06T 7/33 Determination of transform parameters for the alignment of images, i.e. image registration using feature-based methods
    • G06T 7/337 Determination of transform parameters for the alignment of images, i.e. image registration using feature-based methods involving reference images or patches
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 Arrangements for image or video recognition or understanding
    • G06V 10/40 Extraction of image or video features
    • G06V 10/44 Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
    • G06V 10/443 Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components by matching or filtering
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 20/00 Scenes; Scene-specific elements
    • G06V 20/60 Type of objects
    • G06V 20/62 Text, e.g. of license plates, overlay texts or captions on TV images
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V 30/10 Character recognition
    • G06V 30/14 Image acquisition
    • G06V 30/148 Segmentation of character regions
    • G06V 30/153 Segmentation of character regions using recognition of characters or words
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 2207/00 Indexing scheme for image analysis or image enhancement
    • G06T 2207/20 Special algorithmic details
    • G06T 2207/20081 Training; Learning
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 2207/00 Indexing scheme for image analysis or image enhancement
    • G06T 2207/20 Special algorithmic details
    • G06T 2207/20084 Artificial neural networks [ANN]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 2207/00 Indexing scheme for image analysis or image enhancement
    • G06T 2207/30 Subject of image; Context of image processing
    • G06T 2207/30176 Document

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Data Mining & Analysis (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Multimedia (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • Biophysics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Mathematical Physics (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Quality & Reliability (AREA)
  • Character Input (AREA)

Abstract

Embodiments of the invention provide an image recognition method and apparatus, an electronic device, and a storage medium. The method comprises: preprocessing a target document image to obtain a preprocessed image to be segmented; performing semantic segmentation on the image to be segmented through a semantic segmentation model to obtain a plurality of segmented character images; recognizing each segmented character image based on a character recognition model to obtain the corresponding recognized character; and obtaining a document recognition result based on the position of each recognized character in the document image. In this way, the characters in the target document image can be recognized automatically, avoiding the heavy workload, low efficiency, and high error rate of manual entry.

Description

Image identification method and device, electronic equipment and storage medium
Technical Field
The present invention relates to the field of neural network technologies, and in particular, to a method and an apparatus for image recognition, an electronic device, and a storage medium.
Background
In recent years, as railway informatization has deepened, system coverage has grown ever wider and the accumulated data volume ever larger. The inspection data in railway engineering equipment overhaul record books is of great significance for analyzing equipment maintenance data.
Manual inspection record books mostly record inspection content in a variety of tables, with Chinese characters, digits, and other content scattered across different regions of the tables. Some record books are old and suffer from damage and aging, and some field inspectors write illegibly or without following writing standards, which greatly increases the difficulty of extracting the information in the record books.
Manual inspection record books generally use tables with complex formats, and no reliable technology exists for quickly and accurately extracting the required information from them given this complexity.
Disclosure of Invention
Embodiments of the present invention provide an image recognition method and apparatus, an electronic device, and a storage medium, so as to overcome the defect in the prior art that the required information cannot be extracted from a record book quickly and accurately.
The embodiment of the invention provides an image identification method, which comprises the following steps:
preprocessing a target document image to obtain a preprocessed image to be segmented;
performing semantic segmentation on the image to be segmented through a semantic segmentation model to obtain a plurality of segmented character images;
identifying the segmented character image based on a character identification model to obtain a corresponding identification character;
based on the positions of the respective recognition characters in the document image, a document recognition result is obtained.
According to the image recognition method of one embodiment of the invention, preprocessing the target document image to obtain the preprocessed image to be segmented comprises:
repairing: extracting image texture features of the target document image, classifying the defects of the target document image through a classification model, and repairing the defects of the target document image according to the classification;
registration: registering the repaired image with the form template using a registration model to obtain a registered image;
enhancement: extracting the equalization feature and the contrast feature of the registered image, and processing the registered image with an enhancement model to obtain the image to be segmented.
According to the image recognition method of one embodiment of the invention, extracting the equalization feature and the contrast feature of the registered image and processing the registered image with an enhancement model comprises:
extracting the equalization feature and the contrast feature of the registered image, and processing the registered image with the enhancement model when the equalization is smaller than a first threshold or the contrast is smaller than a second threshold.
According to the method of image recognition of an embodiment of the invention, the registration model comprises a plurality of B-spline basis functions;
registering the repaired image with the form template using the registration model to obtain the registered image comprises:
extracting feature points of interest from the repaired image;
obtaining the registered image based on the feature points of interest, the form template, and the B-spline basis functions.
According to the image recognition method of one embodiment of the invention, the semantic segmentation is carried out on the image to be segmented through a semantic segmentation model to obtain a plurality of segmented character images, and the method comprises the following steps:
under the condition that the image to be segmented comprises a table, determining the position and arrangement relation of each cell in the table, and segmenting the cells;
performing semantic segmentation on each cell through a semantic segmentation model to obtain a plurality of character region images;
and carrying out contour recognition on the character area images to obtain a plurality of segmentation character images in each character area image.
According to the image recognition method of an embodiment of the present invention, before recognizing the segmented character images based on the character recognition model, the method further comprises:
training the character recognition model on a pre-stored character data set to obtain the trained character recognition model.
According to the image recognition method of one embodiment of the invention, before the document image is preprocessed, the method further comprises:
obtaining an initial document image by scanning;
extracting texture features of the initial document image, and performing defect detection using those texture features;
performing contour extraction on the defective initial document image to obtain the target document image.
The embodiment of the invention also provides an image recognition device, which comprises:
the preprocessing module is used for preprocessing the target document image to obtain a preprocessed image to be segmented;
the semantic segmentation module is used for performing semantic segmentation on the image to be segmented through a semantic segmentation model to obtain a plurality of segmented character images;
the character recognition module is used for recognizing the segmented character image based on a character recognition model to obtain a corresponding recognition character;
and the document generation module is used for obtaining a document identification result based on the position of each identification character in the document image.
An embodiment of the present invention further provides an electronic device, which includes a memory, a processor, and a computer program stored in the memory and executable on the processor, where the processor executes the computer program to implement any of the steps of the image recognition method described above.
Embodiments of the present invention also provide a non-transitory computer readable storage medium, on which a computer program is stored, which when executed by a processor implements the steps of the method for image recognition as described in any one of the above.
In the image recognition method and apparatus provided by the embodiments of the invention, the target document image is preprocessed to obtain a preprocessed image to be segmented, which enhances the contrast between the characters and the image; semantic segmentation is then performed on the image to be segmented through a semantic segmentation model to obtain a plurality of segmented character images; each segmented character image is recognized based on a character recognition model to obtain the corresponding recognized character; and a document recognition result is obtained based on the position of each recognized character in the document image. The characters in the target document image can thus be recognized automatically, avoiding the heavy workload, low efficiency, and high error rate of manual entry.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, and it is obvious that the drawings in the following description are some embodiments of the present invention, and those skilled in the art can also obtain other drawings according to the drawings without creative efforts.
Fig. 1 is a schematic flow chart of a method for image recognition according to an embodiment of the present invention;
FIG. 2 is a schematic flow chart of another image recognition method provided by the embodiment of the invention;
FIG. 3 is a schematic diagram of a filter provided by an embodiment of the invention;
FIG. 4 is a schematic diagram of a model structure for image recognition according to an embodiment of the present invention;
fig. 5 is a schematic structural diagram of an image recognition apparatus according to an embodiment of the present invention;
fig. 6 is a schematic structural diagram of an electronic device according to an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, but not all, embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
The terminology used in the one or more embodiments of the invention is for the purpose of describing particular embodiments only and is not intended to be limiting of the one or more embodiments of the invention. As used in one or more embodiments of the present invention and the appended claims, the singular forms "a," "an," and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It should also be understood that the term "and/or" as used in one or more embodiments of the present invention refers to and encompasses any and all possible combinations of one or more of the associated listed items.
It should be understood that, although the terms first, second, etc. may be used herein to describe various information in one or more embodiments of the present invention, such information should not be limited by these terms. These terms are only used to distinguish one type of information from another. For example, a first aspect may be termed a second aspect, and, similarly, a second aspect may be termed a first aspect, without departing from the scope of one or more embodiments of the present invention. The word "if" as used herein may be interpreted as "upon", "when", or "in response to determining", depending on the context.
In the embodiments of the present invention, a method and an apparatus for image recognition, an electronic device, and a non-transitory computer-readable storage medium are provided, which are described in detail in the following embodiments one by one.
First, the terms involved in the methods of the embodiments of the invention are explained.
Classification and Regression Tree (CART) model: a decision tree represents class partitions using a tree-like structure. Building the tree can be regarded as a process of variable (attribute) selection: internal nodes represent the variables the tree selects as partitions, each leaf node carries a class label, and the top of the tree is the root node. When the dependent variable of the data set is continuous, the algorithm builds a regression tree, and the mean of the observations at a leaf node can be used as the predicted value; when the dependent variable is discrete, the algorithm builds a classification tree, which handles classification problems well. Note that CART builds a binary tree, i.e. each non-leaf node can extend only two branches, so when a non-leaf node splits on a discrete variable with more than two levels, that variable may be used multiple times.
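As a toy illustration of the binary splits just described, a hand-specified tree in the CART spirit might look like the following. The defect attributes, thresholds, and class names are invented for illustration; the patent does not give a concrete tree.

```python
# A toy, hand-specified binary classification tree in the spirit of CART:
# each internal node tests one attribute, each leaf holds a class label.
# The attributes (area, darkness) and thresholds are illustrative only.

def classify_defect(area, darkness):
    """Classify a detected defect region by two simple attributes."""
    if area < 50:                 # small defect regions
        return "ink spot" if darkness > 0.5 else "missed print"
    else:                         # large defect regions
        return "scratch" if darkness > 0.5 else "damage"

print(classify_defect(10, 0.9))   # ink spot
print(classify_defect(200, 0.2))  # damage
```

A learned CART model would choose these splits automatically by minimizing impurity, but the resulting structure is exactly this kind of nested binary test.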
Free-Form Deformation (FFD) model: the FFD algorithm mainly comprises two steps: embed the object model in a control lattice (frame); when the control point positions change, the lattice "pulls" the model, thereby producing the deformation. Specifically: 1) construct a local coordinate system STU and compute the local coordinates (s, t, u) corresponding to each vertex of the model; these local coordinates remain fixed regardless of changes in the world coordinates of the control points; 2) move the control points and recompute the world coordinates of each model vertex from its local coordinates (s, t, u), the world coordinates of the control points, and the Bernstein polynomials.
Gabor filter: the basic idea of the Gabor transform is to divide the signal into many small time intervals, each time interval being analyzed by fourier transform in order to determine the frequency at which the signal is present in that time interval. The processing method is to add a sliding window to f (t) and then perform Fourier transform. The two-dimensional Gabor filter formed by using the Gabor function has a characteristic of obtaining optimal localization in both the spatial domain and the frequency domain, and thus can well describe local structural information corresponding to spatial frequency (scale), spatial position, and directional selectivity.
The embodiment of the invention discloses an image recognition method which, referring to fig. 1, comprises the following steps 101 to 104:
101. Preprocessing the target document image to obtain a preprocessed image to be segmented.
The target document image may be obtained in various ways, for example, by taking a picture or scanning a paper document, so as to generate a corresponding document image.
After the document images are obtained, not all of them need to be preprocessed by this method: for documents whose handwriting is clear and easy to recognize, the subsequent image segmentation step is executed directly; for documents whose handwriting is illegible, non-standard, or blurred with age, the preprocessing step is required to obtain the image to be segmented before the subsequent segmentation is executed.
102. Performing semantic segmentation on the image to be segmented through a semantic segmentation model to obtain a plurality of segmented character images.
In this embodiment, the semantic segmentation model may be, for example, a convolutional neural network (CNN) model.
In this embodiment, each region is semantically segmented by the CNN model to obtain a binarized region image containing only the character regions (digits, Chinese characters, letters, and punctuation marks); interference from background and noise is eliminated, and accurate character region positions and a plurality of segmented character images are obtained by contour search.
It should be noted that the semantic segmentation model yields images containing characters, not the characters themselves; the segmented characters are obtained only after interference is eliminated and contours are searched. For example, for the characters "2020" to be segmented, the semantic segmentation model yields a region image containing "2020", which may be a square image. Background and noise interference are then eliminated and contours are searched in this region image, yielding separate character images for "2", "0", "2", and "0".
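The contour search that splits a binarized region image into per-character images can be sketched with connected-component labeling, under simplifying assumptions (a small binary grid stands in for the region image, and each connected foreground region is taken to be one character):

```python
# Hypothetical sketch: splitting a binarized region image into per-character
# bounding boxes via 4-connected component labeling, a stand-in for the
# contour search described above (the patent does not fix an implementation).

def connected_components(grid):
    """Label 4-connected foreground (1) regions in a binary 2D list."""
    rows, cols = len(grid), len(grid[0])
    labels = [[0] * cols for _ in range(rows)]
    current = 0
    for r in range(rows):
        for c in range(cols):
            if grid[r][c] == 1 and labels[r][c] == 0:
                current += 1
                stack = [(r, c)]
                while stack:
                    y, x = stack.pop()
                    if 0 <= y < rows and 0 <= x < cols \
                            and grid[y][x] == 1 and labels[y][x] == 0:
                        labels[y][x] = current
                        stack.extend([(y + 1, x), (y - 1, x),
                                      (y, x + 1), (y, x - 1)])
    return labels, current

def character_boxes(grid):
    """One bounding box (top, left, bottom, right) per component,
    ordered left to right, i.e. one box per segmented character image."""
    labels, _ = connected_components(grid)
    boxes = {}
    for r, row in enumerate(labels):
        for c, lab in enumerate(row):
            if lab:
                t, l, b, rgt = boxes.get(lab, (r, c, r, c))
                boxes[lab] = (min(t, r), min(l, c), max(b, r), max(rgt, c))
    return sorted(boxes.values(), key=lambda box: box[1])

# Two separate strokes in one region image -> two character boxes.
cell = [
    [1, 1, 0, 0, 1],
    [1, 1, 0, 0, 1],
    [0, 0, 0, 0, 1],
]
print(character_boxes(cell))  # [(0, 0, 1, 1), (0, 4, 2, 4)]
```

Each returned box would then be cropped out as one segmented character image and passed to the character recognition model.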
103. Recognizing the segmented character images based on a character recognition model to obtain the corresponding recognized characters.
In this embodiment, a character recognition model needs to be constructed in combination with a corresponding data set.
For example, for the recognition of the image of the railway engineering equipment overhaul record book, a railway-specific handwritten character data set needs to be formed by combining a railway common dictionary to train a character recognition model.
For another example, for image recognition of a financial data book, a financial-specific handwritten character data set needs to be formed in combination with a financial common dictionary to train a character recognition model.
104. Based on the positions of the respective recognition characters in the document image, a document recognition result is obtained.
Specifically, after each recognition character is determined, each recognition character is placed at a position in the corresponding document image, so that a document recognition result can be obtained.
Taking a table document as an example, in step 104, after the characters are recognized, each recognized character is placed into the cell it corresponds to, and the document recognition result is finally obtained.
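Step 104 for a table document can be sketched as follows. The tuple layout (row, col, x_position, char) and the function name are assumptions for illustration; the patent only states that characters are placed by position:

```python
# Minimal sketch of step 104 for a table document: each recognized character
# carries the cell (row, col) it was segmented from, and characters in the
# same cell are joined in reading (left-to-right) order by x position.

from collections import defaultdict

def assemble_document(recognized):
    """recognized: list of (row, col, x_position, char) tuples."""
    cells = defaultdict(list)
    for row, col, x, ch in recognized:
        cells[(row, col)].append((x, ch))
    return {
        cell: "".join(ch for _, ch in sorted(chars))
        for cell, chars in cells.items()
    }

# Characters "2", "0", "2", "0" recognized in cell (0, 1) of the form:
result = assemble_document([
    (0, 1, 20, "0"), (0, 1, 10, "2"), (0, 1, 40, "0"), (0, 1, 30, "2"),
    (1, 0, 5, "OK"),
])
print(result)  # {(0, 1): '2020', (1, 0): 'OK'}
```

Sorting within each cell by x position restores the reading order even when the character images were recognized out of order.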
The image recognition method provided by the embodiments of the invention preprocesses the target document image to obtain a preprocessed image to be segmented, enhancing the contrast between the characters and the image; it then performs semantic segmentation on the image to be segmented through a semantic segmentation model to obtain a plurality of segmented character images, recognizes each segmented character image based on a character recognition model to obtain the corresponding recognized character, and obtains a document recognition result based on the position of each recognized character in the document image. The characters in the target document image can thus be recognized automatically, avoiding the heavy workload, low efficiency, and high error rate of manual entry.
The embodiment of the invention discloses an image recognition method that schematically illustrates the method of the previous embodiment; referring to fig. 2, the method comprises the following steps:
201. an initial document image is obtained by scanning.
In this embodiment, the manual inspection record books are batch-scanned to generate unstructured initial document images.
202. Extracting the texture features of the initial document image, and performing defect detection using those texture features.
Defects of the document image such as missed print, scratches, and ink spots are detected using image texture features, and contours are extracted with a local dynamic threshold segmentation method to obtain the defect image.
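The local dynamic threshold segmentation mentioned here can be sketched as comparing each pixel against the mean of its local window minus a small offset, so that dark defect regions survive uneven illumination. The window size, offset, and mean-based rule are implementation assumptions, not values from the patent:

```python
# Hedged sketch of local dynamic (adaptive) threshold segmentation: a pixel
# becomes foreground (1) when it is darker than its local neighborhood mean
# by more than `offset`. Parameters are illustrative assumptions.

def local_threshold(img, win=1, offset=2):
    """Binarize a 2D grayscale list: 1 where pixel < local_mean - offset."""
    rows, cols = len(img), len(img[0])
    out = [[0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            ys = range(max(0, r - win), min(rows, r + win + 1))
            xs = range(max(0, c - win), min(cols, c + win + 1))
            vals = [img[y][x] for y in ys for x in xs]
            mean = sum(vals) / len(vals)
            out[r][c] = 1 if img[r][c] < mean - offset else 0
    return out

# A bright page with a single dark ink spot in the middle:
page = [[200, 200, 200], [200, 100, 200], [200, 200, 200]]
print(local_threshold(page))  # [[0, 0, 0], [0, 1, 0], [0, 0, 0]]
```

Because the threshold is computed per window rather than globally, the same rule works on pages whose background brightness varies from region to region.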
203. Performing contour extraction on the defective initial document image to obtain the target document image.
After the target document image is obtained, not every target document image needs to be preprocessed: document images with clear, easily recognizable handwriting proceed directly to the subsequent segmentation step, while document images with blurred, hard-to-recognize handwriting require preprocessing.
The following steps 204-206 are the three stages of preprocessing, namely repair (via defect classification), registration, and enhancement, which together implement the preprocessing of the target document image.
204. Extracting image texture features of the target document image, classifying the defects of the target document image through a classification model, and repairing the defects of the target document image according to the classification.
In this step, the image texture features of the target document image are extracted through a Gabor filter.
Filter templates at different scales and in different directions are generated by the following formulas (1) to (3):

g(x, y; λ, θ, ψ, σ) = exp(-(x′² + y′²)/(2σ²)) · exp(i(2πx′/λ + ψ))  (1)

g_real(x, y; λ, θ, ψ, σ) = exp(-(x′² + y′²)/(2σ²)) · cos(2πx′/λ + ψ)  (2)

g_imag(x, y; λ, θ, ψ, σ) = exp(-(x′² + y′²)/(2σ²)) · sin(2πx′/λ + ψ)  (3)

where x′ = x cos θ + y sin θ and y′ = -x sin θ + y cos θ; λ is the wavelength of the sinusoid, which can be understood as the scale; θ is the direction of the Gabor kernel function; ψ is the phase offset; and σ is the standard deviation of the Gaussian function. By varying λ and θ, a total of 15 filters at three scales and in five directions are obtained, as shown in fig. 3.
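A stdlib-only sketch of the filter bank follows, assuming the standard real-part Gabor kernel with aspect ratio fixed to 1; the concrete λ and θ values are illustrative assumptions, chosen only to show how three scales times five directions yield 15 filters:

```python
# Sketch of the real-part Gabor kernel and the 15-filter bank obtained by
# varying the scale (lambda, three values) and direction (theta, five values).
# The specific lambda/theta values below are assumptions for illustration.

import math

def gabor_real(x, y, lam, theta, psi=0.0, sigma=2.0):
    """Real part of the Gabor kernel at (x, y), aspect ratio 1."""
    xp = x * math.cos(theta) + y * math.sin(theta)
    yp = -x * math.sin(theta) + y * math.cos(theta)
    return math.exp(-(xp ** 2 + yp ** 2) / (2 * sigma ** 2)) * \
        math.cos(2 * math.pi * xp / lam + psi)

def gabor_bank():
    lambdas = [4.0, 8.0, 16.0]                    # three scales
    thetas = [k * math.pi / 5 for k in range(5)]  # five directions
    return [(lam, th) for lam in lambdas for th in thetas]

bank = gabor_bank()
print(len(bank))                              # 15 filters
print(round(gabor_real(0, 0, 4.0, 0.0), 4))   # kernel peak at the origin: 1.0
```

Convolving the image with each of the 15 kernels gives one response map per filter, which is the input to the texture-feature histogram described next.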
To extract the texture features of the image, this embodiment convolves each filter in the filter bank with the target document image to obtain the response map corresponding to that filter, computes for each pixel the index of the filter yielding the maximum response, and then builds a normalized histogram within each local image block as the texture feature of that block.
In this embodiment, the Gabor filters are used to enhance the large amount of handwritten characters in the image, improve the highlight and dark regions, and strengthen the contrast between the characters and the paper background.
In this step, the classification model is illustrated with a CART model. The CART model is a binary tree model; the different defects are classified by the CART model, and the repair in the subsequent steps is performed according to the classification.
205. Registering the repaired image with the form template using a registration model to obtain a registered image.
To address problems such as paper skew and misalignment that may arise during scanning, the different scanned images are rotated, translated, and scaled to the same scale and position, which improves the accuracy of the subsequent recognition steps.
In this step, the FFD model is taken as an example of the registration model.
Specifically, the FFD model comprises a plurality of B-spline basis functions, and step 205 comprises: extracting feature points of interest from the repaired image; and obtaining the registered image based on the feature points of interest, the form template, and the B-spline basis functions.
Point features are common features in scanned images, and the feature points of interest are searched for with a region-similarity-based strategy. Let p_i be a feature point to be extracted in the reference image R, w_r be an m×n window centered at p_i, and w_f be the window of size u×v at the corresponding position in the target image. When w_f and w_r attain the maximum value under some similarity measure function, the center q_i of the window w_f is the feature point of interest corresponding to p_i.
A non-rigid registration algorithm based on the FFD model is then applied to the extracted feature points of interest. Let Ω = {(x, y) | 0 ≤ x < X, 0 ≤ y < Y} denote the image to be registered and Φ denote an n_x × n_y grid of uniform control points overlaid on Ω, where the (i, j)-th control point is denoted Φ_{i,j} and δ_x, δ_y denote the grid spacing in the X-axis and Y-axis directions, respectively. The FFD model is given as the tensor product of one-dimensional cubic B-spline basis functions, defined as the following formula (4):

T_loc(x, y) = Σ_{l=0}^{3} Σ_{m=0}^{3} B_l(u) B_m(v) Φ_{i+l, j+m}  (4)

where i = ⌊x/δ_x⌋ - 1, j = ⌊y/δ_y⌋ - 1, u = x/δ_x - ⌊x/δ_x⌋, v = y/δ_y - ⌊y/δ_y⌋ (⌊·⌋ denotes rounding down), and the B-spline basis functions are:

B_0(u) = (1 - u)³ / 6
B_1(u) = (3u³ - 6u² + 4) / 6
B_2(u) = (-3u³ + 3u² + 3u + 1) / 6
B_3(u) = u³ / 6, with 0 ≤ u < 1.
let phi0,Φ1,Φ2,...ΦKRepresenting the K +1 th layer control grid, assuming that the control fixed point is incremented once from the K to the K +1 th layer, the FFD model will be written as a combination of multi-layer sub-models in the form of equation (5) below:
Figure BDA0002727862840000106
then, obtaining the deformation function T of each layer by solving the control vertex of each layer of gridsloc(x, y), this process is repeated for all levels and the final deformation function is obtained by resampling. And finally, the mutual information is used as a similarity measurement function, and the registration of the document image can be well scanned by using a gradient optimization strategy.
206. Extract the equalization degree feature and the contrast feature of the registered image, and process the registered image with an enhancement model to obtain the image to be segmented.

Specifically, step 206 includes: extracting the equalization degree feature and the contrast feature of the registered image, and performing the step of processing the registered image with the enhancement model when the equalization degree is less than a first threshold or the contrast is less than a second threshold, so as to improve the visual effect of the image.
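The gating of step 206 amounts to a simple predicate; the threshold values here are purely illustrative:

```python
def needs_enhancement(equalization, contrast,
                      first_threshold=0.5, second_threshold=0.1):
    """Return True when the registered image should be run through the
    enhancement model (step 206); threshold values are illustrative."""
    return equalization < first_threshold or contrast < second_threshold
```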
The equalization degree feature is a gray-scale statistical feature represented by a histogram. The gray histogram is a function of gray level: for each gray value it counts the number of pixels in the digital image taking that value, and thus reflects how frequently each gray level occurs in the image. The degree of uniformity of the gray-histogram distribution reflects important statistical information about the image, and the feature is defined by equation (6):
F_his = −Σ_{c=0}^{2^m − 1} h_c · log h_c    (6)

wherein F_his is the degree of uniformity of the gray-histogram distribution; m is the pixel bit width, which is usually 11 in medical images; n is the number of pixels in the super-pixel block; and h_c = v / n is the normalized histogram of a channel in the super-pixel block, where c is a pixel value and v is the number of pixels taking that value.
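One plausible reading of the equalization degree feature is the normalized entropy of the gray histogram; this interpretation, and the 8-bit default, are assumptions of the sketch:

```python
import numpy as np

def histogram_uniformity(block, bit_width=8):
    """Normalized entropy of a block's gray histogram: 1.0 for a
    perfectly uniform histogram, 0.0 for a constant block."""
    levels = 2 ** bit_width
    v, _ = np.histogram(block, bins=levels, range=(0, levels))
    h = v / v.sum()               # h_c = v / n
    nz = h > 0                    # skip empty bins (log(0))
    entropy = -(h[nz] * np.log(h[nz])).sum()
    return entropy / np.log(levels)
```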
For the contrast feature, the Weber contrast feature is employed.

Weber's law (the law of the sensory threshold) states that, for the same kind of stimulus, the dynamic range of stimuli a person can perceive is proportional to the intensity of the standard stimulus. Applied to human visual stimuli, the Weber contrast is defined as:

C = (I − I_b) / I_b

wherein I is the brightness of the object and I_b is the overall brightness of the background.
In practical applications it is often difficult to distinguish the object from the background, so this embodiment writes this feature as the following formula (7):

F_web = (1 / N) Σ_{p ∈ B} |I_c(p) − Ī| / Ī    (7)

wherein F_web is the Weber contrast feature; Ī is the mean gray level of the super-pixel block; I_c(p) is each pixel value in the super-pixel block; N is the number of pixels in the current window; p is a pixel position, i.e., a combination of x and y; and B denotes the current window (image block), meaning that the pixel p lies within the current window.
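Assuming the practical Weber feature is the mean relative deviation of each pixel from the block's mean gray level (an assumption of this sketch), it can be computed as:

```python
import numpy as np

def weber_contrast_feature(block):
    """Mean relative deviation |I_c(p) - mean| / mean over a block,
    a sketch of the window-based Weber contrast feature."""
    mean = block.mean()
    if mean == 0:
        return 0.0                      # flat black block: no contrast
    dev = np.abs(block - mean) / mean   # per-pixel Weber-style contrast
    return float(dev.mean())
```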
207. And performing semantic segmentation on the image to be segmented through a semantic segmentation model to obtain a plurality of segmented character images.
Specifically, step 207 includes the following steps S271 to S273:
S271. When the image to be segmented includes a table, determine the position and arrangement relation of each cell in the table, and segment the cells.
And S272, performing semantic segmentation on each cell through a semantic segmentation model to obtain a plurality of character region images.
In this embodiment, the character area image may be a cell image including the character.
And S273, performing contour recognition on the character area images to obtain a plurality of segmentation character images in each character area image.
Note that a segmented character image is not the same as a character region image: one character region image contains a plurality of segmented character images. Taking the character string "1995" as an example, the character region image is the cell region image containing "1995", while the segmented character images are the four images corresponding to the individual characters "1", "9", "9", and "5", respectively.
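As a minimal stand-in for the contour-recognition step (real contour analysis is more involved), a binary cell image can be split into per-character images at vertical-projection gaps:

```python
import numpy as np

def split_characters(cell, thresh=0):
    """Split a binary cell image (text pixels nonzero) into per-character
    images wherever a column contains no ink."""
    ink_per_col = (cell > thresh).sum(axis=0)
    images, start = [], None
    for x, ink in enumerate(ink_per_col):
        if ink > 0 and start is None:         # entering a character
            start = x
        elif ink == 0 and start is not None:  # leaving a character
            images.append(cell[:, start:x])
            start = None
    if start is not None:                     # character touches right edge
        images.append(cell[:, start:])
    return images
```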
208. And identifying the segmented character image based on a character identification model to obtain a corresponding identification character.
In this embodiment, the character recognition model may be a CNN model.
Before the segmented character image is recognized based on the character recognition model, the character recognition model is trained based on a pre-stored character data set to obtain a trained character recognition model.
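The embodiment does not detail the CNN architecture; a minimal forward pass (one convolution layer, ReLU, max pooling, one fully connected layer — all sizes assumed) might look like:

```python
import numpy as np

def conv2d(img, kernel):
    """'Valid' 2-D cross-correlation of a single-channel image."""
    kh, kw = kernel.shape
    h, w = img.shape
    out = np.zeros((h - kh + 1, w - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = (img[i:i + kh, j:j + kw] * kernel).sum()
    return out

def max_pool(x, k=2):
    """Non-overlapping k x k max pooling (trailing rows/cols dropped)."""
    h, w = x.shape[0] // k * k, x.shape[1] // k * k
    return x[:h, :w].reshape(h // k, k, w // k, k).max(axis=(1, 3))

def cnn_logits(img, kernels, W, b):
    """Forward pass: conv + ReLU per kernel, pool, flatten, dense layer."""
    maps = [np.maximum(conv2d(img, k), 0.0) for k in kernels]
    feats = np.concatenate([max_pool(m).ravel() for m in maps])
    return feats @ W + b   # class logits, one per character class
```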
209. Based on the positions of the respective recognition characters in the document image, a document recognition result is obtained.
Taking the case where the original target document image is a form image as an example, the finally obtained document recognition result is the form image with the recognized characters filled into the corresponding cells.
In the image recognition method provided by this embodiment of the invention, the target document image is preprocessed to obtain a preprocessed image to be segmented, enhancing the contrast between the characters and the image; the image to be segmented is then semantically segmented by a semantic segmentation model to obtain a plurality of segmented character images; the segmented character images are recognized by a character recognition model to obtain the corresponding recognized characters; and the document recognition result is obtained based on the position of each recognized character in the document image. Characters in the target document image can thus be recognized automatically, solving the problems of heavy workload, low efficiency, and high error rate of manual entry.
An embodiment of the invention is schematically described below by taking a railway engineering equipment overhaul record book as an example; refer to fig. 4. In the embodiment of fig. 4, an end-to-end model is constructed, comprising: a CART model, an FFD model, a Gabor filter model, a semantic segmentation model, and a character recognition model.
The method comprises the following steps:
1) and scanning the railway engineering equipment overhaul record book to obtain an initial document image.
2) And extracting the texture features of the initial document image, and detecting defects through the texture features of the initial document image.
If no defect is present, the method proceeds directly to the subsequent step 7).
3) And carrying out contour extraction on the defective initial document image to obtain a target document image.
4) And extracting image texture characteristics of the railway engineering equipment overhaul record book image through a Gabor filter, classifying defects of the railway engineering equipment overhaul record book image through a CART classification model, and repairing the defects of the target document image according to the classification.
The railway engineering equipment overhaul record book comprises a plurality of tables. The defects may be classified into various categories such as missing print, scratches, patches, and the like.
Enhancing the railway engineering equipment overhaul record book image through the Gabor filter model improves highlighted regions and dark regions and enhances the contrast between the characters and the paper background.
5) And registering the repaired image and the form template by adopting the FFD model to obtain a registered image.
The purpose of the registration is as follows: to address problems such as paper deflection and misalignment that may arise during scanning, different scanned images are rotated, translated, and scaled to the same scale and position, which improves the accuracy of the subsequent recognition steps.
6) And extracting the balance and contrast characteristics of the registered image, and processing the registered image by using an enhanced model to obtain the image to be segmented.
7) And performing semantic segmentation on the image to be segmented through a semantic segmentation model to obtain a plurality of segmented character images.
8) And identifying the segmented character image based on a character identification model to obtain a corresponding identification character.
A dedicated railway handwritten character data set is formed based on a pre-stored dictionary of common railway terms, and the character recognition model is trained on it to obtain the trained character recognition model.
9) Based on the positions of the respective recognition characters in the document image, a document recognition result is obtained, as shown in fig. 4.
The following describes an apparatus for image recognition provided by an embodiment of the present invention, and the apparatus for image recognition described below and the method for image recognition described above may be referred to correspondingly.
The embodiment of the invention discloses an image recognition device, as shown in fig. 5, comprising:
the preprocessing module 501 is configured to preprocess the target document image to obtain a preprocessed image to be segmented;
a semantic segmentation module 502, configured to perform semantic segmentation on the image to be segmented through a semantic segmentation model to obtain a plurality of segmented character images;
a character recognition module 503, configured to recognize the segmented character image based on a character recognition model to obtain a corresponding recognition character;
and a document generating module 504, configured to obtain a document identification result based on a position of each identification character in the document image.
Optionally, the preprocessing module 501 includes:
the repairing unit is used for extracting image texture characteristics of the target document image, classifying defects of the target document image through the classification model and repairing the defects of the target document image according to the classification;
the registration unit is used for registering the repaired image and the form template by adopting a registration model to obtain a registration image;
and the enhancement unit is used for extracting the balance and contrast characteristics of the registered image and processing the registered image by using an enhancement model to obtain the image to be segmented.
Optionally, the enhancing unit is specifically configured to: and extracting the balance degree characteristic and the contrast degree characteristic of the registered image, and processing the registered image by using an enhanced model under the condition that the balance degree is smaller than a first threshold value or the contrast degree is smaller than a second threshold value to obtain the image to be segmented.
Optionally, the registration model comprises a plurality of B-spline basis functions;
the registration unit is specifically configured to: extracting interesting characteristic points of the restored image; and processing based on the interesting feature points, the table template and the B spline basis function to obtain the registration image.
Optionally, the semantic segmentation module 502 is specifically configured to:
under the condition that the image to be segmented comprises a table, determining the position and arrangement relation of each cell in the table, and segmenting the cells;
performing semantic segmentation on each cell through a semantic segmentation model to obtain a plurality of character region images;
and carrying out contour recognition on the character area images to obtain a plurality of segmentation character images in each character area image.
Optionally, the apparatus further comprises: and the training module is used for training the character recognition model based on a pre-stored character data set to obtain a trained character recognition model.
Optionally, the apparatus further comprises: the defect detection module is used for obtaining an initial document image through scanning; extracting texture features of the initial document image, and performing defect detection through the texture features of the initial document image; and carrying out contour extraction on the defective initial document image to obtain the target document image.
The image recognition device provided by this embodiment of the invention preprocesses the target document image to obtain a preprocessed image to be segmented, enhancing the contrast between the characters and the image; performs semantic segmentation on the image to be segmented through a semantic segmentation model to obtain a plurality of segmented character images; recognizes the segmented character images based on the character recognition model to obtain the corresponding recognized characters; and obtains the document recognition result based on the position of each recognized character in the document image. Characters in the target document image can thus be recognized automatically, solving the problems of heavy workload, low efficiency, and high error rate of manual entry.
Fig. 6 illustrates a physical structure diagram of an electronic device, which may include, as shown in fig. 6: a processor (processor)610, a communication Interface (Communications Interface)620, a memory (memory)630 and a communication bus 640, wherein the processor 610, the communication Interface 620 and the memory 630 communicate with each other via the communication bus 640. The processor 610 may invoke logic instructions in the memory 630 to perform a method of image recognition, comprising:
preprocessing a target document image to obtain a preprocessed image to be segmented;
performing semantic segmentation on the image to be segmented through a semantic segmentation model to obtain a plurality of segmented character images;
identifying the segmented character image based on a character identification model to obtain a corresponding identification character;
based on the positions of the respective recognition characters in the document image, a document recognition result is obtained.
In addition, the logic instructions in the memory 630 may be implemented in the form of software functional units and, when sold or used as an independent product, may be stored in a computer-readable storage medium. Based on such understanding, the technical solution of the present invention may be embodied in the form of a software product, which is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present invention. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, or an optical disk.
In another aspect, an embodiment of the present invention further provides a computer program product, where the computer program product includes a computer program stored on a non-transitory computer-readable storage medium, the computer program includes program instructions, and when the program instructions are executed by a computer, the computer can execute the method for image recognition provided by the above-mentioned method embodiments, including:
preprocessing a target document image to obtain a preprocessed image to be segmented;
performing semantic segmentation on the image to be segmented through a semantic segmentation model to obtain a plurality of segmented character images;
identifying the segmented character image based on a character identification model to obtain a corresponding identification character;
based on the positions of the respective recognition characters in the document image, a document recognition result is obtained.
In still another aspect, an embodiment of the present invention further provides a non-transitory computer-readable storage medium, on which a computer program is stored, where the computer program is implemented by a processor to perform the method for image recognition provided by the foregoing embodiments, and the method includes:
preprocessing a target document image to obtain a preprocessed image to be segmented;
performing semantic segmentation on the image to be segmented through a semantic segmentation model to obtain a plurality of segmented character images;
identifying the segmented character image based on a character identification model to obtain a corresponding identification character;
based on the positions of the respective recognition characters in the document image, a document recognition result is obtained.
The above-described embodiments of the apparatus are merely illustrative, and the units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the present embodiment. One of ordinary skill in the art can understand and implement it without inventive effort.
Through the above description of the embodiments, those skilled in the art will clearly understand that each embodiment can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware. With this understanding in mind, the above-described technical solutions may be embodied in the form of a software product, which can be stored in a computer-readable storage medium such as ROM/RAM, magnetic disk, optical disk, etc., and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) to execute the methods described in the embodiments or some parts of the embodiments.
Finally, it should be noted that: the above examples are only intended to illustrate the technical solution of the present invention, but not to limit it; although the present invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions of the embodiments of the present invention.

Claims (10)

1. A method of image recognition, comprising:
preprocessing a target document image to obtain a preprocessed image to be segmented;
performing semantic segmentation on the image to be segmented through a semantic segmentation model to obtain a plurality of segmented character images;
identifying the segmented character image based on a character identification model to obtain a corresponding identification character;
based on the positions of the respective recognition characters in the document image, a document recognition result is obtained.
2. The image recognition method of claim 1, wherein preprocessing the target document image to obtain a preprocessed image to be segmented comprises:
repairing: extracting image texture features of the target document image, classifying defects of the target document image through a classification model, and repairing the defects of the target document image according to the classification;
and (3) registration: adopting a registration model to register the repaired image and the form template to obtain a registration image;
enhancing: and extracting the balance characteristic and the contrast characteristic of the registered image, and processing the registered image by using an enhanced model to obtain the image to be segmented.
3. The method of image recognition according to claim 2, wherein extracting the balance feature and the contrast feature of the registered image and processing the registered image with the enhanced model comprises:
and extracting the balance characteristic and the contrast characteristic of the registered image, and processing the registered image by using an enhanced model under the condition that the balance is smaller than a first threshold or the contrast is smaller than a second threshold.
4. The method of image recognition according to claim 2, wherein the registration model comprises a plurality of B-spline basis functions;
adopting a registration model to register the repaired image and the form template to obtain a registration image, wherein the registration image comprises:
extracting interesting characteristic points of the restored image;
and processing based on the interesting feature points, the table template and the B spline basis function to obtain the registration image.
5. The image recognition method of claim 1, wherein performing semantic segmentation on the image to be segmented by a semantic segmentation model to obtain a plurality of segmented character images comprises:
under the condition that the image to be segmented comprises a table, determining the position and arrangement relation of each cell in the table, and segmenting the cells;
performing semantic segmentation on each cell through a semantic segmentation model to obtain a plurality of character region images;
and carrying out contour recognition on the character area images to obtain a plurality of segmentation character images in each character area image.
6. The method of image recognition according to claim 1, wherein prior to recognizing the segmented character image based on a character recognition model, the method further comprises:
and training the character recognition model based on a pre-stored character data set to obtain the trained character recognition model.
7. The method of image recognition according to claim 1, wherein prior to pre-processing the document image, the method further comprises:
obtaining an initial document image through scanning;
extracting texture features of the initial document image, and performing defect detection through the texture features of the initial document image;
and carrying out contour extraction on the defective initial document image to obtain the target document image.
8. An apparatus for image recognition, comprising:
the preprocessing module is used for preprocessing the target document image to obtain a preprocessed image to be segmented;
the semantic segmentation module is used for performing semantic segmentation on the image to be segmented through a semantic segmentation model to obtain a plurality of segmented character images;
the character recognition module is used for recognizing the segmented character image based on a character recognition model to obtain a corresponding recognition character;
and the document generation module is used for obtaining a document identification result based on the position of each identification character in the document image.
9. An electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, characterized in that the steps of the method of image recognition according to any of claims 1 to 7 are implemented when the program is executed by the processor.
10. A non-transitory computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, carries out the steps of the method of image recognition according to any one of claims 1 to 7.
CN202011108746.6A 2020-10-16 2020-10-16 Image recognition method and device, electronic equipment and storage medium Active CN112200789B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011108746.6A CN112200789B (en) 2020-10-16 2020-10-16 Image recognition method and device, electronic equipment and storage medium

Publications (2)

Publication Number Publication Date
CN112200789A true CN112200789A (en) 2021-01-08
CN112200789B CN112200789B (en) 2023-11-21




Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant