CN112200789B - Image recognition method and device, electronic equipment and storage medium - Google Patents

Image recognition method and device, electronic equipment and storage medium

Info

Publication number
CN112200789B
Authority
CN
China
Prior art keywords
image
character
model
segmented
recognition
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202011108746.6A
Other languages
Chinese (zh)
Other versions
CN112200789A (en)
Inventor
程智博
赵正阳
栾中
吴艳华
刘军
邵赛
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
China Academy of Railway Sciences Corp Ltd CARS
Institute of Computing Technologies of CARS
Beijing Jingwei Information Technology Co Ltd
Original Assignee
China Academy of Railway Sciences Corp Ltd CARS
Institute of Computing Technologies of CARS
Beijing Jingwei Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by China Academy of Railway Sciences Corp Ltd CARS, Institute of Computing Technologies of CARS, Beijing Jingwei Information Technology Co Ltd filed Critical China Academy of Railway Sciences Corp Ltd CARS
Priority to CN202011108746.6A priority Critical patent/CN112200789B/en
Publication of CN112200789A publication Critical patent/CN112200789A/en
Application granted granted Critical
Publication of CN112200789B publication Critical patent/CN112200789B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical


Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/0002Inspection of images, e.g. flaw detection
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • G06F18/243Classification techniques relating to the number of classes
    • G06F18/24323Tree-organised classifiers
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/30Determination of transform parameters for the alignment of images, i.e. image registration
    • G06T7/33Determination of transform parameters for the alignment of images, i.e. image registration using feature-based methods
    • G06T7/337Determination of transform parameters for the alignment of images, i.e. image registration using feature-based methods involving reference images or patches
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/40Extraction of image or video features
    • G06V10/44Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
    • G06V10/443Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components by matching or filtering
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • G06V20/60Type of objects
    • G06V20/62Text, e.g. of license plates, overlay texts or captions on TV images
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10Character recognition
    • G06V30/14Image acquisition
    • G06V30/148Segmentation of character regions
    • G06V30/153Segmentation of character regions using recognition of characters or words
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20081Training; Learning
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20084Artificial neural networks [ANN]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/30Subject of image; Context of image processing
    • G06T2207/30176Document

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Data Mining & Analysis (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Multimedia (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • Biophysics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Mathematical Physics (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Quality & Reliability (AREA)
  • Character Input (AREA)

Abstract

The embodiment of the invention provides an image recognition method and device, electronic equipment and a storage medium, wherein the method comprises the following steps: preprocessing a target document image to obtain a preprocessed image to be segmented; carrying out semantic segmentation on the image to be segmented through a semantic segmentation model to obtain a plurality of segmented character images; recognizing the segmented character images based on a character recognition model to obtain corresponding recognized characters; and obtaining a document recognition result based on the positions of the recognized characters in the document image, so that the characters in the target document image can be recognized automatically, solving the problems of heavy manual-entry workload, low efficiency and high error rate.

Description

Image recognition method and device, electronic equipment and storage medium
Technical Field
The present invention relates to the field of neural networks, and in particular, to a method and apparatus for image recognition, an electronic device, and a storage medium.
Background
In recent years, with the steady advance of railway informatization, system coverage has become ever wider and the accumulated data volume ever larger. The inspection data in railway service equipment maintenance record books is of great significance for equipment maintenance data analysis.
Because the manually kept inspection record books record inspection contents in multiple table forms, Chinese characters, numbers and other contents are scattered across different areas of the tables; part of the record books have suffered damage and aging over long periods of use, and the handwriting of some field inspection personnel is scrawled and non-standard, all of which greatly increases the difficulty of extracting the record book information.
Manual inspection record books generally use forms of complex format, and given the complexity and difficulty of such forms, there is as yet no reliable technique capable of quickly and accurately extracting the required information from the record books.
Disclosure of Invention
The embodiment of the invention provides an image recognition method and device, electronic equipment and a storage medium, which are used for solving the defect that required information cannot be extracted from a record book rapidly and accurately in the prior art.
The embodiment of the invention provides an image identification method, which comprises the following steps:
preprocessing a target document image to obtain a preprocessed image to be segmented;
carrying out semantic segmentation on the image to be segmented through a semantic segmentation model to obtain a plurality of segmented character images;
identifying the segmented character image based on a character identification model to obtain a corresponding identification character;
based on the positions of the respective recognition characters in the document image, a document recognition result is obtained.
According to the image recognition method of one embodiment of the invention, preprocessing is carried out on the target document image to obtain a preprocessed image to be segmented, and the method comprises the following steps:
repairing: extracting image texture characteristics of a target document image, carrying out defect classification on the target document image through a classification model, and carrying out defect repair on the target document image according to classification;
registering: registering the repair image and the form template by adopting a registration model to obtain a registration image;
enhancement: and extracting the balance degree and contrast characteristic of the registration image, and processing the registration image by utilizing the enhancement model to obtain the image to be segmented.
According to a method of image recognition of an embodiment of the present invention, extracting a balance feature and a contrast feature of a registered image, and processing the registered image using an enhancement model, includes:
and extracting the balance degree characteristic and the contrast characteristic of the registration image, and processing the registration image by using the enhancement model under the condition that the balance degree is smaller than a first threshold value or the contrast is smaller than a second threshold value.
According to a method of image recognition of an embodiment of the present invention, the registration model includes a plurality of B-spline basis functions;
registering the repair image and the form template by adopting a registration model to obtain a registration image, wherein the registering comprises the following steps:
extracting interesting feature points of the repair image;
and processing based on the interesting feature points, the table template and the B-spline basis function to obtain the registration image.
According to the image recognition method of one embodiment of the present invention, semantic segmentation is performed on the image to be segmented through a semantic segmentation model to obtain a plurality of segmented character images, including:
determining the position and arrangement relation of each cell in a table and cutting the cells when the image to be segmented comprises the table;
carrying out semantic segmentation on each cell through a semantic segmentation model to obtain a plurality of character region images;
and carrying out contour recognition on the character area images to obtain a plurality of segmentation character images in each character area image.
According to a method of image recognition of an embodiment of the present invention, before the recognition of the segmented character image based on the character recognition model, the method further includes:
training the character recognition model based on a pre-stored text data set to obtain a trained character recognition model.
According to a method of image recognition according to an embodiment of the present invention, before preprocessing a document image, the method further includes:
obtaining an initial document image by scanning;
extracting texture features of the initial document image, and performing defect detection through the texture features of the initial document image;
and extracting the outline of the initial document image with the defects to obtain the target document image.
The embodiment of the invention also provides an image recognition device, which comprises:
the preprocessing module is used for preprocessing the target document image to obtain a preprocessed image to be segmented;
the semantic segmentation module is used for carrying out semantic segmentation on the image to be segmented through a semantic segmentation model to obtain a plurality of segmented character images;
the character recognition module is used for recognizing the segmented character image based on a character recognition model to obtain a corresponding recognition character;
and the document generation module is used for obtaining a document recognition result based on the positions of the recognition characters in the document image.
The embodiment of the invention also provides electronic equipment, which comprises a memory, a processor and a computer program stored on the memory and capable of running on the processor, wherein the processor realizes the steps of the image identification method according to any one of the above when executing the program.
Embodiments of the present invention also provide a non-transitory computer readable storage medium having stored thereon a computer program which, when executed by a processor, performs the steps of a method of image recognition as described in any of the above.
According to the image recognition method and device provided by the embodiment of the invention, the target document image is preprocessed to obtain the preprocessed image to be segmented, the contrast between characters and the image is enhanced, and then the image to be segmented is subjected to semantic segmentation through a semantic segmentation model to obtain a plurality of segmented character images; identifying the segmented character image based on the character identification model to obtain a corresponding identification character; based on the positions of the identification characters in the document image, a document identification result is obtained, so that the characters in the target document image can be automatically identified, and the problems of high manual input workload, low efficiency and high error rate are solved.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions of the prior art, the following description will briefly explain the drawings used in the embodiments or the description of the prior art, and it is obvious that the drawings in the following description are some embodiments of the present invention, and other drawings can be obtained according to these drawings without inventive effort for a person skilled in the art.
FIG. 1 is a flowchart of a method for image recognition according to an embodiment of the present invention;
FIG. 2 is a flow chart of another method for image recognition according to an embodiment of the present invention;
FIG. 3 is a schematic diagram of a filter provided by an embodiment of the present invention;
FIG. 4 is a schematic diagram of a model structure for image recognition according to an embodiment of the present invention;
fig. 5 is a schematic structural diagram of an apparatus for image recognition according to an embodiment of the present invention;
fig. 6 is a schematic structural diagram of an electronic device according to an embodiment of the present invention.
Detailed Description
For the purpose of making the objects, technical solutions and advantages of the embodiments of the present invention more apparent, the technical solutions of the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present invention, and it is apparent that the described embodiments are some embodiments of the present invention, but not all embodiments of the present invention. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
The terminology used in the one or more embodiments of the invention is for the purpose of describing particular embodiments only and is not intended to be limiting of the one or more embodiments of the invention. As used in one or more embodiments of the invention and the appended claims, the singular forms "a," "an," and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It should also be understood that the term "and/or" as used in one or more embodiments of the present invention refers to and encompasses any or all possible combinations of one or more of the associated listed items.
It should be understood that, although the terms first, second, etc. may be used in one or more embodiments of the invention to describe various information, this information should not be limited by these terms. These terms are only used to distinguish one type of information from another. For example, a first may also be referred to as a second, and similarly, a second may also be referred to as a first, without departing from the scope of one or more embodiments of the invention. The word "if" as used herein may be interpreted as "when", "upon", or "in response to a determination", depending on the context.
In the embodiments of the present invention, a method and apparatus for image recognition, an electronic device, and a non-transitory computer readable storage medium are provided, and detailed descriptions are given in the following embodiments.
First, terms related to the method of the embodiment of the invention will be explained.
Classification and Regression Tree (CART) model: a decision tree uses a tree-like structure to represent class partitioning. Constructing the tree can be seen as a variable (attribute) selection process: the internal nodes indicate which variables (attributes) the tree selects as splits, each leaf node carries the label of one class, and the top-most layer of the tree is the root node. When the dependent variable of the data set is a continuous value, the tree algorithm is a regression tree, and the mean of the observations at a leaf node can be used as the predicted value; when the dependent variable of the data set is a discrete value, the tree algorithm is a classification tree, which handles classification problems well. It should be noted that the algorithm builds a binary tree, i.e. each non-leaf node can only extend two branches, so when a non-leaf node splits on a multi-level (more than two levels) discrete variable, that variable may be used multiple times.
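For illustration only, a CART-style defect classifier of the kind described above could be prototyped with scikit-learn's DecisionTreeClassifier, which implements the CART algorithm; the feature dimensions, defect labels and data below are placeholders and are not taken from the patent.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

# Placeholder data: 15-dimensional texture-feature vectors (assumed) and
# defect labels, e.g. 0 = missing print, 1 = scratch, 2 = ink spot.
rng = np.random.default_rng(0)
X = rng.random((300, 15))
y = rng.integers(0, 3, size=300)

# DecisionTreeClassifier implements CART: binary splits, Gini impurity by default.
cart = DecisionTreeClassifier(criterion="gini", max_depth=5, random_state=0)
cart.fit(X, y)

print(cart.predict(X[:5]))  # predicted defect classes for the first five samples
```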
Free-Form Deformation (FFD) model: the FFD algorithm mainly comprises two steps: embed the object model in a control lattice (frame); when the control point positions change, the lattice "pulls" the model, thereby producing the deformation. Specifically, the method comprises the following steps: 1) construct a local coordinate system STU and compute the local coordinates (s, t, u) corresponding to each vertex of the model; the local coordinates (s, t, u) remain fixed regardless of changes in the world coordinates of the control points; 2) move the control points, and recompute the world coordinates of each model vertex from the vertex local coordinates (s, t, u), the control-point world coordinates and Bernstein polynomials.
Gabor filter: the basic idea of the Gabor transform is to divide the signal into many small time intervals and analyse each interval with a Fourier transform in order to determine the frequencies present in the signal. The processing method is to apply a sliding window to f(t) and then carry out the Fourier transform. A two-dimensional Gabor filter formed from Gabor functions achieves optimal localization in the spatial domain and the frequency domain simultaneously, and can therefore describe well the local structure information corresponding to spatial frequency (scale), spatial position and orientation selectivity.
The embodiment of the invention discloses a method for identifying images, which is shown in fig. 1 and comprises the following steps of 101-104:
101. Preprocessing the target document image to obtain a preprocessed image to be segmented.
The target document image may be obtained in various manners, for example, photographing or scanning a paper document to generate a corresponding document image.
After the document image is obtained, not every document image needs to be processed by the whole method: for documents with clear, easily recognized handwriting, the subsequent image segmentation step is performed directly; for documents with poor handwriting, irregular writing, or handwriting blurred with age, the image to be segmented is obtained by preprocessing first, and the subsequent segmentation step is then performed.
102. And carrying out semantic segmentation on the image to be segmented through a semantic segmentation model to obtain a plurality of segmented character images.
In this embodiment, the semantic segmentation model may be a convolutional neural network (Convolutional Neural Networks, CNN) model or the like.
In this embodiment, semantic segmentation is performed on each region through the CNN model to obtain a binarized region image containing only the character regions (numerals, Chinese characters, letters and punctuation), eliminating interference from background and noise; accurate character region positions are then obtained through contour search, and a plurality of segmented character images are produced.
It should be noted that the semantic segmentation model yields an image containing the characters, not the characters themselves. The segmented characters are obtained further by eliminating interference and by contour search. For example, for the character string "2020" to be segmented, a region image containing "2020" is obtained through the semantic segmentation model; this region image may be a rectangular image. Background and noise interference are then removed from the region image and a contour search is performed, yielding four single-character images corresponding to "2", "0", "2" and "0" respectively, as in the sketch below.
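As a rough illustration of the contour-search step (not the patent's own implementation), an OpenCV sketch might look as follows; the file name and the noise-area threshold are assumptions.

```python
import cv2

# "region_2020.png" is a hypothetical binarized region image produced by the
# semantic segmentation model (characters in white on a black background).
region = cv2.imread("region_2020.png", cv2.IMREAD_GRAYSCALE)
_, binary = cv2.threshold(region, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)

# Contour search: each external contour is treated as one character candidate.
contours, _ = cv2.findContours(binary, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)

char_images = []
for cnt in sorted(contours, key=lambda c: cv2.boundingRect(c)[0]):  # left to right
    x, y, w, h = cv2.boundingRect(cnt)
    if w * h < 20:  # drop tiny noise blobs (threshold is an assumption)
        continue
    char_images.append(binary[y:y + h, x:x + w])

# For the string "2020" this would yield four single-character images.
```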
103. And identifying the segmented character image based on a character identification model to obtain a corresponding identification character.
In this embodiment, a character recognition model needs to be built in combination with the corresponding data set.
For example, for the recognition of the railway service equipment maintenance record book image, a railway special handwritten text data set is required to be formed by combining a railway common dictionary so as to train a character recognition model.
Similarly, for image recognition of financial record books, a finance-specific handwritten text data set needs to be formed in combination with a common financial dictionary to train the character recognition model.
104. Based on the positions of the respective recognition characters in the document image, a document recognition result is obtained.
Specifically, after each recognition character is determined, each recognition character is placed at a position in the corresponding document image, so that a document recognition result can be obtained.
Taking a table document as an example, after the characters have been recognized, step 104 places each recognized character into its corresponding cell based on the cell to which it belongs, finally obtaining the document recognition result; a toy sketch of this placement follows.
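The sketch below assumes each recognized character carries its cell coordinates (row, column) and its horizontal offset within the cell; the names and data are illustrative only and are not taken from the patent.

```python
from collections import defaultdict

# Each entry: (row, col, x_offset_in_cell, recognized_char) -- hypothetical data.
recognized = [(0, 1, 12, "1"), (0, 1, 30, "9"), (0, 1, 47, "9"), (0, 1, 63, "5"),
              (1, 0, 10, "OK")]

cells = defaultdict(list)
for row, col, x, ch in recognized:
    cells[(row, col)].append((x, ch))

# Sort the characters inside each cell by horizontal position and join them.
table = {pos: "".join(ch for _, ch in sorted(chars)) for pos, chars in cells.items()}
print(table)  # {(0, 1): '1995', (1, 0): 'OK'}
```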
According to the image recognition method provided by the embodiment of the invention, the target document image is preprocessed to obtain the preprocessed image to be segmented, the contrast between characters and the image is enhanced, and then the image to be segmented is subjected to semantic segmentation through a semantic segmentation model to obtain a plurality of segmented character images; identifying the segmented character image based on the character identification model to obtain a corresponding identification character; based on the positions of the identification characters in the document image, a document identification result is obtained, so that the characters in the target document image can be automatically identified, and the problems of high manual input workload, low efficiency and high error rate are solved.
The embodiment of the invention discloses a method for image recognition, which is used for schematically describing the method of the embodiment, and referring to fig. 2, the method comprises the following steps:
201. an initial document image is obtained by scanning.
In this embodiment, the initial document images are generated, still unstructured, by batch scanning of the manually kept inspection record books.
202. And extracting texture features of the initial document image, and detecting defects through the texture features of the initial document image.
Defects such as missing print, scratches and ink spots in the document image are detected using image texture features, and the contour is extracted with a local dynamic threshold segmentation method to obtain the defect image.
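The patent does not spell out the local dynamic threshold segmentation; the sketch below stands in for it with OpenCV's adaptive threshold, and the file name, block size, offset and minimum contour area are assumptions.

```python
import cv2

img = cv2.imread("scanned_page.png", cv2.IMREAD_GRAYSCALE)  # hypothetical path

# Local (adaptive) threshold as a stand-in for the local dynamic threshold step.
mask = cv2.adaptiveThreshold(img, 255, cv2.ADAPTIVE_THRESH_MEAN_C,
                             cv2.THRESH_BINARY_INV, 31, 10)

# Contours of the thresholded regions give candidate defect outlines.
contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
defect_boxes = [cv2.boundingRect(c) for c in contours if cv2.contourArea(c) > 50]
```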
203. And extracting the outline of the initial document image with the defects to obtain the target document image.
After the target document image is obtained, not all target document images need to be preprocessed: document images with clear, easily recognized handwriting need no preprocessing and go directly to the subsequent segmentation step, while document images with blurred, hard-to-recognize handwriting need to be preprocessed.
The following steps 204 to 206 are the three preprocessing stages, namely repair (with defect classification), registration and enhancement, which together preprocess the target document image.
204. Extracting image texture characteristics of the target document image, carrying out defect classification on the target document image through a classification model, and carrying out defect repair on the target document image according to classification.
In this step, image texture features of the target document image are extracted by a Gabor filter.
Filter templates of different scales and orientations are generated by the following formulas (1) to (3):

g(x, y; λ, θ, ψ, σ) = exp(-(x'^2 + y'^2)/(2σ^2)) · exp(i(2πx'/λ + ψ))    (1)

g_real(x, y; λ, θ, ψ, σ) = exp(-(x'^2 + y'^2)/(2σ^2)) · cos(2πx'/λ + ψ)    (2)

g_imag(x, y; λ, θ, ψ, σ) = exp(-(x'^2 + y'^2)/(2σ^2)) · sin(2πx'/λ + ψ)    (3)

where x' = x·cos θ + y·sin θ and y' = -x·sin θ + y·cos θ; λ is the wavelength of the sinusoidal function, which can be understood as the scale; θ is the orientation of the Gabor kernel; ψ is the phase offset; σ is the standard deviation of the Gaussian function. By varying λ and θ, 15 filters in total (three scales by five orientations) are obtained, as shown in fig. 3.
In order to extract the texture features of the image, in this embodiment each filter in the filter bank is convolved with the target document image to obtain the response map corresponding to that filter. The index of the filter giving the maximum response at each pixel is then computed, and a normalized histogram of these indices is built within each local image block as the texture feature of that block; a sketch follows.
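A minimal sketch of this max-response histogram descriptor, using OpenCV's getGaborKernel; the kernel size, the particular wavelengths and orientations (3 × 5 = 15 filters), the 16 × 16 block size and the file name are assumptions, not values taken from the patent.

```python
import cv2
import numpy as np

img = cv2.imread("record_page.png", cv2.IMREAD_GRAYSCALE).astype(np.float32)  # hypothetical

# 15 filters: 3 wavelengths (scales) x 5 orientations.
kernels = [cv2.getGaborKernel((21, 21), sigma=4.0, theta=t, lambd=l, gamma=0.5, psi=0)
           for l in (4.0, 8.0, 16.0)
           for t in np.arange(0, np.pi, np.pi / 5)]

# Convolve each filter with the image and record, per pixel, which filter wins.
responses = np.stack([cv2.filter2D(img, cv2.CV_32F, k) for k in kernels])
argmax_map = responses.argmax(axis=0)

def block_histogram(block, n_filters=15):
    hist = np.bincount(block.ravel(), minlength=n_filters).astype(np.float32)
    return hist / hist.sum()  # normalized histogram = texture feature of the block

feature = block_histogram(argmax_map[:16, :16])  # feature for one 16x16 local block
```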
In this embodiment, the Gabor filter is used to enhance a large number of handwritten characters in the image, improve the images of the highlight region and the dark region, and enhance the contrast between the characters and the paper background.
In this step, the CART model is taken as an example of the classification model for schematic description. The CART model is a binary tree model; the different defects are classified and judged through the CART model, and the defects are repaired according to their class in the subsequent step.
205. And registering the repair image and the form template by adopting a registration model to obtain a registration image.
To address problems such as paper skew and misalignment that may arise during scanning, the different scanned images are rotated, translated and scaled to the same scale and position, which improves the accuracy of the subsequent recognition step.
In this step, an FFD model is taken as an example of the registration model.
Specifically, the FFD model includes a plurality of B-spline basis functions, and step 205 includes: extracting interesting feature points of the repair image; and processing based on the interesting feature points, the form template and the B-spline basis function to obtain a registration image.
The point feature is a common feature contained in scanned images, and the feature points of interest are searched for with a region-based similarity strategy. Let p_i be a feature point to be extracted in the reference image R, let w_r be an m × n window centered on p_i, and let w_f be a window of size u × v at the corresponding position in the target image. When w_f and w_r attain the maximum under some similarity measure function, the center q_i of the window w_f is the feature point of interest corresponding to p_i.
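One way to realize this region-based similarity search is normalized cross-correlation template matching, sketched below; the window sizes, search radius and file names are assumptions rather than parameters given in the patent.

```python
import cv2

ref = cv2.imread("form_template.png", cv2.IMREAD_GRAYSCALE)   # reference image R (hypothetical)
tgt = cv2.imread("repaired_scan.png", cv2.IMREAD_GRAYSCALE)   # target (repaired) image

def match_point(p, half=15, search=60):
    """Find the feature point of interest q_i corresponding to p_i = (y, x)."""
    y, x = p
    w_r = ref[y - half:y + half, x - half:x + half]             # window around p_i
    region = tgt[y - search:y + search, x - search:x + search]  # search area for w_f
    score = cv2.matchTemplate(region, w_r, cv2.TM_CCOEFF_NORMED)
    _, _, _, max_loc = cv2.minMaxLoc(score)                     # best similarity position
    return (y - search + max_loc[1] + half,                     # center of the best w_f
            x - search + max_loc[0] + half)

q_i = match_point((200, 300))
```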
A non-rigid registration algorithm based on the FFD model is then applied using the extracted feature points of interest. Let Ω = {(x, y) | 0 ≤ x < X, 0 ≤ y < Y} denote the image to be registered, and let Φ denote an n_x × n_y uniform control-point grid overlaid on Ω, in which the (i, j)-th control point is denoted Φ_{i,j} and δ_x, δ_y denote the grid spacing in the X-axis and Y-axis directions respectively. The FFD model is given as a tensor product of one-dimensional cubic (degree-3) B-spline basis functions, defined as formula (4):

T_loc(x, y) = Σ_{l=0..3} Σ_{m=0..3} B_l(u) B_m(v) Φ_{i+l, j+m}    (4)

where i = ⌊x/δ_x⌋ - 1, j = ⌊y/δ_y⌋ - 1, u = x/δ_x - ⌊x/δ_x⌋, v = y/δ_y - ⌊y/δ_y⌋, ⌊·⌋ denotes the rounding-down operation, and the B-spline basis functions are: B_0(u) = (1 - u)^3 / 6, B_1(u) = (3u^3 - 6u^2 + 4) / 6, B_2(u) = (-3u^3 + 3u^2 + 3u + 1) / 6, B_3(u) = u^3 / 6, with 0 ≤ u < 1.
Let Φ_0, Φ_1, Φ_2, ... Φ_K denote K+1 layers of control grids, where the control-point density is incremented from layer k to layer k+1; the FFD model can then be written as a combination of the sub-models of the layers, as shown in formula (5):

T(x, y) = Σ_{k=0..K} T_loc^(k)(x, y)    (5)
next, obtaining the deformation function T of each layer by solving control vertexes of each layer of grids loc (x, y) repeating this process for all layers and obtaining the final deformation function by resampling. And finally, the mutual information is used as a similarity measurement function, and the registration of the document images can be scanned better by using a gradient optimization strategy.
206. And extracting the balance degree characteristic and the contrast characteristic of the registration image, and processing the registration image by utilizing the enhancement model to obtain the image to be segmented.
Specifically, step 206 includes: and extracting the balance degree characteristic and the contrast degree characteristic of the registered image, and executing the step of processing the registered image by using the enhancement model under the condition that the balance degree is smaller than a first threshold value or the contrast degree is smaller than a second threshold value so as to improve the visual effect of the image.
The balance feature is a gray-level statistical feature and is characterized with a histogram. The gray-level histogram is a function of gray level: it counts, for every gray value, how many pixels in the digital image take that value, and thus reflects the frequency with which each gray level appears in the image. The uniformity of the gray-histogram distribution reflects important statistical information about the image, and the feature is defined by formula (6):
where F_his is the uniformity of the gray-level histogram distribution; M is the pixel bit width, typically 11 in medical images; N is the number of pixels in the super-pixel block; c is a pixel (gray) value occurring in the super-pixel block and v is the corresponding number of pixels.
For the contrast feature, the Weber contrast feature is employed.
Weber's law (the law of sensory thresholds) states that, under the same kind of stimulus, the dynamic range of the stimulus perceived by a person is proportional to the intensity of the standard stimulus. Applied to human visual stimuli, the Weber contrast is defined as C = (I - I_b) / I_b, where I is the brightness of the object and I_b is the overall brightness of the background.
In practical applications it is often difficult to point out the object and the background separately, so this embodiment writes the feature as the following formula (7):
where F_web is the Weber contrast feature; the mean term is the gray average value of the super-pixel block; I_c(p) is each pixel value in the super-pixel block; N is the number of pixels in the current window; p is the pixel position, i.e. the combination of x and y; and B refers to the current window (image block), indicating that the pixels p are taken from within the current window.
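Since formulas (6) and (7) are not reproduced in this text, the sketch below uses plausible stand-ins (a squared-histogram uniformity measure and a mean-referenced Weber contrast) together with the two-threshold check of step 206; all definitions and threshold values here are assumptions.

```python
import numpy as np

def histogram_uniformity(block, levels=256):
    """Stand-in for F_his: sum of squared normalized histogram bins (assumed form)."""
    hist = np.bincount(block.ravel(), minlength=levels) / block.size
    return float((hist ** 2).sum())

def weber_contrast(block):
    """Stand-in for F_web: mean |I(p) - mean| / mean, the mean acting as background I_b."""
    mean = block.mean()
    return float(np.abs(block - mean).mean() / (mean + 1e-6))

block = np.random.randint(0, 256, (32, 32), dtype=np.uint8)   # one super-pixel block
F_his, F_web = histogram_uniformity(block), weber_contrast(block)

T1, T2 = 0.05, 0.2                                            # illustrative thresholds only
needs_enhancement = (F_his < T1) or (F_web < T2)              # first / second threshold check
```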
207. And carrying out semantic segmentation on the image to be segmented through a semantic segmentation model to obtain a plurality of segmented character images.
Specifically, step 207 includes the following steps S271 to S273:
s271, in the case that the image to be segmented comprises a table, determining the position and the arrangement relation of each cell in the table, and segmenting the cells.
And S272, performing semantic segmentation on each cell through a semantic segmentation model to obtain a plurality of character region images.
In this embodiment, the character area image may be a cell image including the character.
S273, carrying out contour recognition on the character area images to obtain a plurality of segmentation character images in each character area image.
Note that a segmented character image is not the same as a character region image: a character region image contains a plurality of segmented character images. Taking the character string "1995" as an example, the character region image is the cell region image containing "1995", while the segmented character images are the images corresponding to the four characters "1", "9", "9" and "5" respectively.
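The patent does not specify how the cell positions in S271 are found; a common approach, sketched below as an assumption, extracts the long horizontal and vertical ruling lines with morphological opening and then takes the bounding boxes of the enclosed cells.

```python
import cv2

img = cv2.imread("form_page.png", cv2.IMREAD_GRAYSCALE)       # hypothetical scan
binary = cv2.adaptiveThreshold(img, 255, cv2.ADAPTIVE_THRESH_MEAN_C,
                               cv2.THRESH_BINARY_INV, 31, 10)

# Keep only long horizontal / vertical strokes (the table ruling lines).
h_lines = cv2.morphologyEx(binary, cv2.MORPH_OPEN,
                           cv2.getStructuringElement(cv2.MORPH_RECT, (40, 1)))
v_lines = cv2.morphologyEx(binary, cv2.MORPH_OPEN,
                           cv2.getStructuringElement(cv2.MORPH_RECT, (1, 40)))
grid = cv2.bitwise_or(h_lines, v_lines)

# Cells are the connected regions enclosed by the grid lines.
contours, _ = cv2.findContours(cv2.bitwise_not(grid), cv2.RETR_LIST,
                               cv2.CHAIN_APPROX_SIMPLE)
cells = [cv2.boundingRect(c) for c in contours if cv2.contourArea(c) > 500]
cells.sort(key=lambda b: (b[1], b[0]))                         # row-major (x, y, w, h) boxes
```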
208. And identifying the segmented character image based on a character identification model to obtain a corresponding identification character.
In this embodiment, the character recognition model may be a CNN model.
Before the segmented character image is identified based on the character recognition model, training the character recognition model based on a pre-stored character data set is needed to obtain a trained character recognition model.
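A minimal PyTorch sketch of such a CNN character recognizer; the layer sizes, the 32 × 32 grayscale input, the class count and the (commented) training loop over a `train_loader` built from the pre-stored text data set are all assumptions, since the patent does not fix an architecture.

```python
import torch
import torch.nn as nn

class CharCNN(nn.Module):
    """Small CNN for single-character classification (illustrative architecture)."""
    def __init__(self, num_classes):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),   # 32 -> 16
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),  # 16 -> 8
        )
        self.classifier = nn.Linear(32 * 8 * 8, num_classes)

    def forward(self, x):
        return self.classifier(self.features(x).flatten(1))

model = CharCNN(num_classes=4000)       # e.g. digits, letters and dictionary characters
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

# for images, labels in train_loader:   # train_loader: the pre-stored text data set
#     optimizer.zero_grad()
#     loss = loss_fn(model(images), labels)
#     loss.backward()
#     optimizer.step()
```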
209. Based on the positions of the respective recognition characters in the document image, a document recognition result is obtained.
Taking the case where the original target document image is a table image as an example, the finally obtained document recognition result is a table in which the image of each cell is replaced by the characters recognized from it.
According to the image recognition method provided by the embodiment of the invention, the target document image is preprocessed to obtain the preprocessed image to be segmented, the contrast between characters and the image is enhanced, and then the image to be segmented is subjected to semantic segmentation through a semantic segmentation model to obtain a plurality of segmented character images; identifying the segmented character image based on the character identification model to obtain a corresponding identification character; based on the positions of the identification characters in the document image, a document identification result is obtained, so that the characters in the target document image can be automatically identified, and the problems of high manual input workload, low efficiency and high error rate are solved.
Taking a railway service equipment maintenance record book as an example, an embodiment of the invention is schematically described with reference to fig. 4. An end-to-end model is constructed in the embodiment of fig. 4, comprising: a CART model, an FFD model, a Gabor filter model, a semantic segmentation model, and a character recognition model.
The method comprises the following steps:
1) And scanning the maintenance record book of the railway working equipment to obtain an initial document image.
2) And extracting texture features of the initial document image, and detecting defects through the texture features of the initial document image.
If no defect exists, the following step 7) is executed.
3) And extracting the outline of the defective initial document image to obtain a target document image.
4) Extracting image texture features of the railway service equipment maintenance record book image through a Gabor filter, carrying out defect classification on the railway service equipment maintenance record book image through a CART classification model, and carrying out defect repair on the target document image according to classification.
Wherein, the railway service equipment maintenance record book comprises a plurality of tables. Depending on the defect, various categories may be classified, such as underprint, scratch, mottle, etc.
The enhancement of the images of the maintenance record book of the railway service equipment by the Gabor filter model can improve the images of the highlight area and the dark area and enhance the contrast ratio of characters and paper background.
5) And registering the repair image and the form template by adopting the FFD model to obtain a registered image.
The purpose of the registration is, among other things: aiming at the problems of paper deflection, dislocation and the like possibly generated in the scanning process, different scanned images are rotated, translated and scaled to the same scale and position, so that the accuracy of the subsequent recognition step is enhanced.
6) And extracting the balance degree and contrast characteristic of the registration image, and processing the registration image by utilizing the enhancement model to obtain the image to be segmented.
7) And carrying out semantic segmentation on the image to be segmented through a semantic segmentation model to obtain a plurality of segmented character images.
8) And identifying the segmented character image based on a character identification model to obtain a corresponding identification character.
And forming a handwriting character data set special for the railway based on a pre-stored railway common dictionary, and training the character recognition model to obtain a trained character recognition model.
9) Based on the positions of the respective recognition characters in the document image, a document recognition result is obtained as shown in fig. 4.
The image recognition apparatus provided in the embodiment of the present invention will be described below, and the image recognition apparatus described below and the image recognition method described above may be referred to correspondingly.
The embodiment of the invention discloses an image recognition device, as shown in fig. 5, comprising:
the preprocessing module 501 is configured to preprocess the target document image to obtain a preprocessed image to be segmented;
the semantic segmentation module 502 is configured to perform semantic segmentation on the image to be segmented through a semantic segmentation model to obtain a plurality of segmented character images;
a character recognition module 503, configured to recognize the segmented character image based on a character recognition model, to obtain a corresponding recognition character;
a document generating module 504 for obtaining a document recognition result based on the positions of the respective recognition characters in the document image.
Optionally, the preprocessing module 501 includes:
the repairing unit is used for extracting the image texture characteristics of the target document image, carrying out defect classification on the target document image through the classification model, and carrying out defect repairing on the target document image according to the classification;
the registration unit is used for registering the repair image and the form template by adopting the registration model to obtain a registration image;
and the enhancement unit is used for extracting the balance degree and contrast characteristic of the registration image, and processing the registration image by utilizing the enhancement model to obtain the image to be segmented.
Optionally, the enhancing unit is specifically configured to: and extracting the balance degree characteristic and the contrast characteristic of the registration image, and processing the registration image by utilizing the enhancement model under the condition that the balance degree is smaller than a first threshold value or the contrast is smaller than a second threshold value to obtain the image to be segmented.
Optionally, the registration model comprises a plurality of B-spline basis functions;
the registration unit is specifically configured to: extracting interesting feature points of the repair image; and processing based on the interesting feature points, the table template and the B-spline basis function to obtain the registration image.
Optionally, the semantic segmentation module 502 is specifically configured to:
determining the position and arrangement relation of each cell in a table and cutting the cells when the image to be segmented comprises the table;
carrying out semantic segmentation on each cell through a semantic segmentation model to obtain a plurality of character region images;
and carrying out contour recognition on the character area images to obtain a plurality of segmentation character images in each character area image.
Optionally, the apparatus further comprises: and the training module is used for training the character recognition model based on a pre-stored character data set to obtain a trained character recognition model.
Optionally, the apparatus further comprises: the defect detection module is used for obtaining an initial document image through scanning; extracting texture features of the initial document image, and performing defect detection through the texture features of the initial document image; and extracting the outline of the initial document image with the defects to obtain the target document image.
According to the image recognition device provided by the embodiment of the invention, the target document image is preprocessed to obtain the preprocessed image to be segmented, the contrast between characters and the image is enhanced, and then the image to be segmented is subjected to semantic segmentation through a semantic segmentation model to obtain a plurality of segmented character images; identifying the segmented character image based on the character identification model to obtain a corresponding identification character; based on the positions of the identification characters in the document image, a document identification result is obtained, so that the characters in the target document image can be automatically identified, and the problems of high manual input workload, low efficiency and high error rate are solved.
Fig. 6 illustrates a physical schematic diagram of an electronic device, as shown in fig. 6, which may include: processor 610, communication interface (Communications Interface) 620, memory 630, and communication bus 640, wherein processor 610, communication interface 620, and memory 630 communicate with each other via communication bus 640. The processor 610 may invoke logic instructions in the memory 630 to perform methods of image recognition, including:
preprocessing a target document image to obtain a preprocessed image to be segmented;
carrying out semantic segmentation on the image to be segmented through a semantic segmentation model to obtain a plurality of segmented character images;
identifying the segmented character image based on a character identification model to obtain a corresponding identification character;
based on the positions of the respective recognition characters in the document image, a document recognition result is obtained.
Further, the logic instructions in the memory 630 may be implemented in the form of software functional units and stored in a computer-readable storage medium when sold or used as a stand-alone product. Based on this understanding, the technical solution of the present invention may be embodied essentially or in a part contributing to the prior art or in a part of the technical solution, in the form of a software product stored in a storage medium, comprising several instructions for causing a computer device (which may be a personal computer, a server, a network device, etc.) to perform all or part of the steps of the method according to the embodiments of the present invention. And the aforementioned storage medium includes: a U-disk, a removable hard disk, a Read-Only Memory (ROM), a random access Memory (RAM, random Access Memory), a magnetic disk, or an optical disk, or other various media capable of storing program codes.
In another aspect, embodiments of the present invention also provide a computer program product comprising a computer program stored on a non-transitory computer readable storage medium, the computer program comprising program instructions which, when executed by a computer, enable the computer to perform the method of image recognition provided by the above-described method embodiments, comprising:
preprocessing a target document image to obtain a preprocessed image to be segmented;
carrying out semantic segmentation on the image to be segmented through a semantic segmentation model to obtain a plurality of segmented character images;
identifying the segmented character image based on a character identification model to obtain a corresponding identification character;
based on the positions of the respective recognition characters in the document image, a document recognition result is obtained.
In yet another aspect, embodiments of the present invention further provide a non-transitory computer readable storage medium having stored thereon a computer program which, when executed by a processor, is implemented to perform the method of image recognition provided by the above embodiments, comprising:
preprocessing a target document image to obtain a preprocessed image to be segmented;
carrying out semantic segmentation on the image to be segmented through a semantic segmentation model to obtain a plurality of segmented character images;
identifying the segmented character image based on a character identification model to obtain a corresponding identification character;
based on the positions of the respective recognition characters in the document image, a document recognition result is obtained.
The apparatus embodiments described above are merely illustrative, wherein the elements illustrated as separate elements may or may not be physically separate, and the elements shown as elements may or may not be physical elements, may be located in one place, or may be distributed over a plurality of network elements. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment. Those of ordinary skill in the art will understand and implement the present invention without undue burden.
From the above description of the embodiments, it will be apparent to those skilled in the art that the embodiments may be implemented by means of software plus necessary general hardware platforms, or of course may be implemented by means of hardware. Based on this understanding, the foregoing technical solution may be embodied essentially or in a part contributing to the prior art in the form of a software product, which may be stored in a computer readable storage medium, such as ROM/RAM, a magnetic disk, an optical disk, etc., including several instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) to execute the method described in the respective embodiments or some parts of the embodiments.
Finally, it should be noted that: the above embodiments are only for illustrating the technical solution of the present invention, and are not limiting; although the invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical scheme described in the foregoing embodiments can be modified or some technical features thereof can be replaced by equivalents; such modifications and substitutions do not depart from the spirit and scope of the technical solutions of the embodiments of the present invention.

Claims (9)

1. A method of image recognition, comprising:
preprocessing a target document image to obtain a preprocessed image to be segmented;
carrying out semantic segmentation on the image to be segmented through a semantic segmentation model to obtain a plurality of segmented character images;
identifying the segmented character image based on a character identification model to obtain a corresponding identification character;
obtaining a document recognition result based on the positions of the recognition characters in the document image;
preprocessing the target document image to obtain a preprocessed image to be segmented, wherein the preprocessing comprises the following steps:
repairing: extracting image texture characteristics of a target document image, carrying out defect classification on the target document image through a classification model, and carrying out defect repair on the target document image according to classification;
registering: registering the repair image and the form template by adopting a registration model to obtain a registration image;
enhancement: and extracting the balance degree characteristic and the contrast characteristic of the registration image, and processing the registration image by utilizing the enhancement model to obtain the image to be segmented.
2. The method of image recognition according to claim 1, wherein extracting the balance features and contrast features of the registered image and processing the registered image with the enhancement model comprises:
and extracting the balance degree characteristic and the contrast characteristic of the registration image, and processing the registration image by using the enhancement model under the condition that the balance degree is smaller than a first threshold value or the contrast is smaller than a second threshold value.
3. The method of image recognition according to claim 1, wherein the registration model comprises a plurality of B-spline basis functions;
registering the repair image and the form template by adopting a registration model to obtain a registration image, wherein the registering comprises the following steps:
extracting interesting feature points of the repair image;
and processing based on the interesting feature points, the table template and the B-spline basis function to obtain the registration image.
4. The method of image recognition according to claim 1, wherein semantically segmenting the image to be segmented by a semantic segmentation model to obtain a plurality of segmented character images, comprising:
determining the position and arrangement relation of each cell in a table and cutting the cells when the image to be segmented comprises the table;
carrying out semantic segmentation on each cell through a semantic segmentation model to obtain a plurality of character region images;
and carrying out contour recognition on the character area images to obtain a plurality of segmentation character images in each character area image.
5. The method of image recognition according to claim 1, wherein prior to the recognition of the segmented character image based on a character recognition model, the method further comprises:
training the character recognition model based on a pre-stored text data set to obtain a trained character recognition model.
6. The method of image recognition according to claim 1, wherein prior to preprocessing the document image, the method further comprises:
obtaining an initial document image by scanning;
extracting texture features of the initial document image, and performing defect detection through the texture features of the initial document image;
and extracting the outline of the initial document image with the defects to obtain the target document image.
7. An apparatus for image recognition, comprising:
the preprocessing module is used for preprocessing the target document image to obtain a preprocessed image to be segmented;
the semantic segmentation module is used for carrying out semantic segmentation on the image to be segmented through a semantic segmentation model to obtain a plurality of segmented character images;
the character recognition module is used for recognizing the segmented character image based on a character recognition model to obtain a corresponding recognition character;
a document generation module for obtaining a document recognition result based on the positions of the respective recognition characters in the document image;
the preprocessing module comprises:
the repairing unit is used for extracting the image texture characteristics of the target document image, carrying out defect classification on the target document image through the classification model, and carrying out defect repairing on the target document image according to the classification;
the registration unit is used for registering the repair image and the form template by adopting the registration model to obtain a registration image;
and the enhancement unit is used for extracting the balance degree and contrast characteristic of the registration image, and processing the registration image by utilizing the enhancement model to obtain the image to be segmented.
8. An electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, characterized in that the processor implements the steps of the method of image recognition according to any one of claims 1 to 6 when the program is executed.
9. A non-transitory computer readable storage medium, on which a computer program is stored, characterized in that the computer program, when being executed by a processor, implements the steps of the method of image recognition according to any one of claims 1 to 6.
CN202011108746.6A 2020-10-16 2020-10-16 Image recognition method and device, electronic equipment and storage medium Active CN112200789B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011108746.6A CN112200789B (en) 2020-10-16 2020-10-16 Image recognition method and device, electronic equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011108746.6A CN112200789B (en) 2020-10-16 2020-10-16 Image recognition method and device, electronic equipment and storage medium

Publications (2)

Publication Number Publication Date
CN112200789A CN112200789A (en) 2021-01-08
CN112200789B true CN112200789B (en) 2023-11-21

Family

ID=74010472

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011108746.6A Active CN112200789B (en) 2020-10-16 2020-10-16 Image recognition method and device, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN112200789B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112949514A (en) * 2021-03-09 2021-06-11 广州文石信息科技有限公司 Scanned document information processing method and device, electronic equipment and storage medium
CN113407783A (en) * 2021-05-13 2021-09-17 中车太原机车车辆有限公司 Electric locomotive overhaul record management system

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11195057B2 (en) * 2014-03-18 2021-12-07 Z Advanced Computing, Inc. System and method for extremely efficient image and pattern recognition and artificial intelligence platform
US10482604B2 (en) * 2017-05-05 2019-11-19 Shanghai United Imaging Healthcare Co., Ltd. Systems and methods for image processing

Patent Citations (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104615974A (en) * 2015-01-15 2015-05-13 成都交大光芒科技股份有限公司 Continuous supporting pole number plate image recognizing method based on tracking algorithm
WO2018040118A1 (en) * 2016-08-29 2018-03-08 武汉精测电子集团股份有限公司 Gpu-based tft-lcd mura defect detection method
CN106844767A (en) * 2017-02-23 2017-06-13 中国科学院自动化研究所 Format file block information key registration and the method and device extracted
CN107220965A (en) * 2017-05-05 2017-09-29 上海联影医疗科技有限公司 A kind of image partition method and system
CN108345881A (en) * 2018-02-01 2018-07-31 福州大学 A kind of document quality detection method based on computer vision
CN108399610A (en) * 2018-03-20 2018-08-14 上海应用技术大学 A kind of depth image enhancement method of fusion RGB image information
CN109583449A (en) * 2018-10-29 2019-04-05 深圳市华尊科技股份有限公司 Character identifying method and Related product
CN109460769A (en) * 2018-11-16 2019-03-12 湖南大学 A kind of mobile end system and method based on table character machining and identification
CN109829906A (en) * 2019-01-31 2019-05-31 桂林电子科技大学 It is a kind of based on the workpiece, defect of the field of direction and textural characteristics detection and classification method
CN111723807A (en) * 2019-03-20 2020-09-29 Sap欧洲公司 Recognizing machine-typed and handwritten characters using end-to-end deep learning
CN110390650A (en) * 2019-07-23 2019-10-29 中南大学 OCT image denoising method based on intensive connection and generation confrontation network
CN110796031A (en) * 2019-10-11 2020-02-14 腾讯科技(深圳)有限公司 Table identification method and device based on artificial intelligence and electronic equipment
CN111507214A (en) * 2020-04-07 2020-08-07 中国人民财产保险股份有限公司 Document identification method, device and equipment
CN111625621A (en) * 2020-04-27 2020-09-04 中国铁道科学研究院集团有限公司电子计算技术研究所 Document retrieval method and device, electronic equipment and storage medium

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
Efficient computation offloading for Internet of Vehicles in edge computing-assisted 5G networks; Shaohua Wan et al.; The Journal of Supercomputing; pp. 2518-2547 *
Research on computer vision recognition algorithms based on texture images; Fan Susu; China Master's Theses Full-text Database, Information Science and Technology Series; pp. I138-336 *
Research directions and keywords for computer-aided acceptance of project proposals: 2012 acceptance results and notes for 2013; Ma Huizhu; Song Zhaohui; Ji Fei; Hou Jia; Xiong Xiaoyun; Journal of Electronics & Information Technology (No. 01); full text *

Also Published As

Publication number Publication date
CN112200789A (en) 2021-01-08

Similar Documents

Publication Publication Date Title
CN110298376B (en) Bank bill image classification method based on improved B-CNN
CN109740606B (en) Image identification method and device
CN102054271B (en) Text line detection method and device
CN108197644A (en) A kind of image-recognizing method and device
CN111680690B (en) Character recognition method and device
CN111353491B (en) Text direction determining method, device, equipment and storage medium
CN110490190B (en) Structured image character recognition method and system
CN112200789B (en) Image recognition method and device, electronic equipment and storage medium
CN115239644B (en) Concrete defect identification method, device, computer equipment and storage medium
CN113240623B (en) Pavement disease detection method and device
CN106529407A (en) Vehicle-mounted fingerprint recognition method
CN111507344A (en) Method and device for recognizing characters from image
CN114862861B (en) Lung lobe segmentation method and device based on few-sample learning
Yadav et al. A robust approach for offline English character recognition
CN111241897B (en) System and implementation method for digitizing industrial inspection sheets by inferring visual relationships
CN111738979A (en) Automatic certificate image quality inspection method and system
CN116912865A (en) Form image recognition method, device, equipment and medium
CN113516193B (en) Image processing-based red date defect identification and classification method and device
CN115937095A (en) Printing defect detection method and system integrating image processing algorithm and deep learning
CN112766082B (en) Chinese text handwriting identification method and device based on macro-micro characteristics and storage medium
CN114241463A (en) Signature verification method and device, computer equipment and storage medium
CN114677552A (en) Fingerprint detail database labeling method and system for deep learning
Singh et al. Performance analysis of thinning algorithms for offline-handwritten Devanagari words
CN114463767A (en) Credit card identification method, device, computer equipment and storage medium
Araújo et al. Segmenting and recognizing license plate characters

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant