CN112200789B - Image recognition method and device, electronic equipment and storage medium - Google Patents

Image recognition method and device, electronic equipment and storage medium

Info

Publication number
CN112200789B
Authority
CN
China
Prior art keywords
image
character
model
segmented
recognition
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202011108746.6A
Other languages
Chinese (zh)
Other versions
CN112200789A (en)
Inventor
程智博
赵正阳
栾中
吴艳华
刘军
邵赛
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
China Academy of Railway Sciences Corp Ltd CARS
Institute of Computing Technologies of CARS
Beijing Jingwei Information Technology Co Ltd
Original Assignee
China Academy of Railway Sciences Corp Ltd CARS
Institute of Computing Technologies of CARS
Beijing Jingwei Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by China Academy of Railway Sciences Corp Ltd CARS, Institute of Computing Technologies of CARS, Beijing Jingwei Information Technology Co Ltd filed Critical China Academy of Railway Sciences Corp Ltd CARS
Priority to CN202011108746.6A priority Critical patent/CN112200789B/en
Publication of CN112200789A publication Critical patent/CN112200789A/en
Application granted granted Critical
Publication of CN112200789B publication Critical patent/CN112200789B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical


Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/0002Inspection of images, e.g. flaw detection
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • G06F18/243Classification techniques relating to the number of classes
    • G06F18/24323Tree-organised classifiers
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/30Determination of transform parameters for the alignment of images, i.e. image registration
    • G06T7/33Determination of transform parameters for the alignment of images, i.e. image registration using feature-based methods
    • G06T7/337Determination of transform parameters for the alignment of images, i.e. image registration using feature-based methods involving reference images or patches
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/40Extraction of image or video features
    • G06V10/44Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
    • G06V10/443Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components by matching or filtering
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • G06V20/60Type of objects
    • G06V20/62Text, e.g. of license plates, overlay texts or captions on TV images
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10Character recognition
    • G06V30/14Image acquisition
    • G06V30/148Segmentation of character regions
    • G06V30/153Segmentation of character regions using recognition of characters or words
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20081Training; Learning
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20084Artificial neural networks [ANN]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/30Subject of image; Context of image processing
    • G06T2207/30176Document

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Data Mining & Analysis (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Multimedia (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • Biophysics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Mathematical Physics (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Quality & Reliability (AREA)
  • Character Input (AREA)

Abstract

The embodiment of the invention provides an image recognition method and device, electronic equipment and a storage medium, wherein the method comprises the following steps: preprocessing a target document image to obtain a preprocessed image to be segmented; carrying out semantic segmentation on the image to be segmented through a semantic segmentation model to obtain a plurality of segmented character images; recognizing the segmented character images based on a character recognition model to obtain corresponding recognized characters; and obtaining a document recognition result based on the positions of the recognized characters in the document image, so that the characters in the target document image can be recognized automatically, solving the problems of heavy manual-entry workload, low efficiency and high error rate.

Description

Image recognition method and device, electronic equipment and storage medium
Technical Field
The present invention relates to the field of neural networks, and in particular, to a method and apparatus for image recognition, an electronic device, and a storage medium.
Background
In recent years, with the steady advance of railway informatization, system coverage has become ever wider and the accumulated data volume ever larger. The inspection data in railway service equipment maintenance record books is of great significance for equipment maintenance data analysis.
Because the manually kept inspection record books record inspection contents in multiple table forms, Chinese characters, numbers and other contents are scattered across different areas of the tables; part of the record books have suffered damage and aging over long periods of use, and the handwriting of some field inspection personnel is scrawled and non-standard, all of which greatly increases the difficulty of extracting the record book information.
Manual inspection record books generally use forms of complex format, and given the complexity and difficulty of such forms, there is as yet no reliable technique capable of quickly and accurately extracting the required information from the record books.
Disclosure of Invention
The embodiment of the invention provides an image recognition method and device, electronic equipment and a storage medium, which are used for solving the defect that required information cannot be extracted from a record book rapidly and accurately in the prior art.
The embodiment of the invention provides an image identification method, which comprises the following steps:
preprocessing a target document image to obtain a preprocessed image to be segmented;
carrying out semantic segmentation on the image to be segmented through a semantic segmentation model to obtain a plurality of segmented character images;
identifying the segmented character image based on a character identification model to obtain a corresponding identification character;
based on the positions of the respective recognition characters in the document image, a document recognition result is obtained.
According to the image recognition method of one embodiment of the invention, preprocessing is carried out on the target document image to obtain a preprocessed image to be segmented, and the method comprises the following steps:
repairing: extracting image texture characteristics of a target document image, carrying out defect classification on the target document image through a classification model, and carrying out defect repair on the target document image according to classification;
registering: registering the repair image and the form template by adopting a registration model to obtain a registration image;
enhancement: and extracting the balance degree and contrast characteristic of the registration image, and processing the registration image by utilizing the enhancement model to obtain the image to be segmented.
According to a method of image recognition of an embodiment of the present invention, extracting a balance feature and a contrast feature of a registered image, and processing the registered image using an enhancement model, includes:
and extracting the balance degree characteristic and the contrast characteristic of the registration image, and processing the registration image by using the enhancement model under the condition that the balance degree is smaller than a first threshold value or the contrast is smaller than a second threshold value.
According to a method of image recognition of an embodiment of the present invention, the registration model includes a plurality of B-spline basis functions;
registering the repair image and the form template by adopting a registration model to obtain a registration image, wherein the registering comprises the following steps:
extracting interesting feature points of the repair image;
and processing based on the interesting feature points, the table template and the B-spline basis function to obtain the registration image.
According to the image recognition method of one embodiment of the present invention, semantic segmentation is performed on the image to be segmented through a semantic segmentation model to obtain a plurality of segmented character images, including:
determining the position and arrangement relation of each cell in a table and cutting the cells when the image to be segmented comprises the table;
carrying out semantic segmentation on each cell through a semantic segmentation model to obtain a plurality of character region images;
and carrying out contour recognition on the character area images to obtain a plurality of segmentation character images in each character area image.
According to a method of image recognition of an embodiment of the present invention, before the recognition of the segmented character image based on the character recognition model, the method further includes:
training the character recognition model based on a pre-stored text data set to obtain a trained character recognition model.
According to a method of image recognition according to an embodiment of the present invention, before preprocessing a document image, the method further includes:
obtaining an initial document image by scanning;
extracting texture features of the initial document image, and performing defect detection through the texture features of the initial document image;
and extracting the outline of the initial document image with the defects to obtain the target document image.
The embodiment of the invention also provides an image recognition device, which comprises:
the preprocessing module is used for preprocessing the target document image to obtain a preprocessed image to be segmented;
the semantic segmentation module is used for carrying out semantic segmentation on the image to be segmented through a semantic segmentation model to obtain a plurality of segmented character images;
the character recognition module is used for recognizing the segmented character image based on a character recognition model to obtain a corresponding recognition character;
and the document generation module is used for obtaining a document recognition result based on the positions of the recognition characters in the document image.
The embodiment of the invention also provides electronic equipment, which comprises a memory, a processor and a computer program stored on the memory and capable of running on the processor, wherein the processor realizes the steps of the image identification method according to any one of the above when executing the program.
Embodiments of the present invention also provide a non-transitory computer readable storage medium having stored thereon a computer program which, when executed by a processor, performs the steps of a method of image recognition as described in any of the above.
According to the image recognition method and device provided by the embodiment of the invention, the target document image is preprocessed to obtain the preprocessed image to be segmented, the contrast between characters and the image is enhanced, and then the image to be segmented is subjected to semantic segmentation through a semantic segmentation model to obtain a plurality of segmented character images; identifying the segmented character image based on the character identification model to obtain a corresponding identification character; based on the positions of the identification characters in the document image, a document identification result is obtained, so that the characters in the target document image can be automatically identified, and the problems of high manual input workload, low efficiency and high error rate are solved.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions of the prior art, the following description will briefly explain the drawings used in the embodiments or the description of the prior art, and it is obvious that the drawings in the following description are some embodiments of the present invention, and other drawings can be obtained according to these drawings without inventive effort for a person skilled in the art.
FIG. 1 is a flowchart of a method for image recognition according to an embodiment of the present invention;
FIG. 2 is a flow chart of another method for image recognition according to an embodiment of the present invention;
FIG. 3 is a schematic diagram of a filter provided by an embodiment of the present invention;
FIG. 4 is a schematic diagram of a model structure for image recognition according to an embodiment of the present invention;
fig. 5 is a schematic structural diagram of an apparatus for image recognition according to an embodiment of the present invention;
fig. 6 is a schematic structural diagram of an electronic device according to an embodiment of the present invention.
Detailed Description
For the purpose of making the objects, technical solutions and advantages of the embodiments of the present invention more apparent, the technical solutions of the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present invention, and it is apparent that the described embodiments are some embodiments of the present invention, but not all embodiments of the present invention. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
The terminology used in the one or more embodiments of the invention is for the purpose of describing particular embodiments only and is not intended to be limiting of the one or more embodiments of the invention. As used in one or more embodiments of the invention and the appended claims, the singular forms "a," "an," and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It should also be understood that the term "and/or" as used in one or more embodiments of the present invention refers to and encompasses any or all possible combinations of one or more of the associated listed items.
It should be understood that, although the terms first, second, etc. may be used in one or more embodiments of the invention to describe various information, this information should not be limited by these terms. These terms are only used to distinguish one type of information from another. For example, a first may also be referred to as a second, and similarly, a second may also be referred to as a first, without departing from the scope of one or more embodiments of the invention. The word "if" as used herein may be interpreted as "when", "upon", or "in response to a determination", depending on the context.
In the embodiments of the present invention, a method and apparatus for image recognition, an electronic device, and a non-transitory computer readable storage medium are provided, and detailed descriptions are given in the following embodiments.
First, terms related to the method of the embodiment of the invention will be explained.
Classification and Regression Tree (CART) model: a decision tree uses a tree-like structure to represent class partitioning. Constructing the tree can be seen as a variable (attribute) selection process: the internal nodes indicate which variables (attributes) the tree selects as splits, each leaf node carries the label of one class, and the top-most layer of the tree is the root node. When the dependent variable of the data set is a continuous value, the tree algorithm is a regression tree, and the mean of the observations at a leaf node can be used as the predicted value; when the dependent variable of the data set is a discrete value, the tree algorithm is a classification tree, which handles classification problems well. It should be noted that the algorithm builds a binary tree, i.e. each non-leaf node can only extend two branches, so when a non-leaf node splits on a multi-level (more than two levels) discrete variable, that variable may be used multiple times.
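For illustration only, a CART-style defect classifier of the kind described above could be prototyped with scikit-learn's DecisionTreeClassifier, which implements the CART algorithm; the feature dimensions, defect labels and data below are placeholders and are not taken from the patent.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

# Placeholder data: 15-dimensional texture-feature vectors (assumed) and
# defect labels, e.g. 0 = missing print, 1 = scratch, 2 = ink spot.
rng = np.random.default_rng(0)
X = rng.random((300, 15))
y = rng.integers(0, 3, size=300)

# DecisionTreeClassifier implements CART: binary splits, Gini impurity by default.
cart = DecisionTreeClassifier(criterion="gini", max_depth=5, random_state=0)
cart.fit(X, y)

print(cart.predict(X[:5]))  # predicted defect classes for the first five samples
```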
Free-Form Deformation (FFD) model: the FFD algorithm mainly comprises two steps: embed the object model in a control lattice (frame); when the control point positions change, the lattice "pulls" the model, thereby producing the deformation. Specifically, the method comprises the following steps: 1) construct a local coordinate system STU and compute the local coordinates (s, t, u) corresponding to each vertex of the model; the local coordinates (s, t, u) remain fixed regardless of changes in the world coordinates of the control points; 2) move the control points, and recompute the world coordinates of each model vertex from the vertex local coordinates (s, t, u), the control-point world coordinates and Bernstein polynomials.
Gabor filter: the basic idea of the Gabor transform is to divide the signal into many small time intervals and analyse each interval with a Fourier transform in order to determine the frequencies present in the signal. The processing method is to apply a sliding window to f(t) and then carry out the Fourier transform. A two-dimensional Gabor filter formed from Gabor functions achieves optimal localization in the spatial domain and the frequency domain simultaneously, and can therefore describe well the local structure information corresponding to spatial frequency (scale), spatial position and orientation selectivity.
The embodiment of the invention discloses a method for identifying images, which is shown in fig. 1 and comprises the following steps of 101-104:
101. Preprocessing the target document image to obtain a preprocessed image to be segmented.
The target document image may be obtained in various manners, for example, photographing or scanning a paper document to generate a corresponding document image.
After the document image is obtained, not every document image needs to be processed by the whole method: for documents with clear, easily recognized handwriting, the subsequent image segmentation step is performed directly; for documents with poor handwriting, irregular writing, or handwriting blurred with age, the image to be segmented is obtained by preprocessing first, and the subsequent segmentation step is then performed.
102. And carrying out semantic segmentation on the image to be segmented through a semantic segmentation model to obtain a plurality of segmented character images.
In this embodiment, the semantic segmentation model may be a convolutional neural network (Convolutional Neural Networks, CNN) model or the like.
In this embodiment, semantic segmentation is performed on each region through the CNN model to obtain a binarized region image containing only the character regions (numerals, Chinese characters, letters and punctuation), eliminating interference from background and noise; accurate character region positions are then obtained through contour search, and a plurality of segmented character images are produced.
It should be noted that the semantic segmentation model yields an image containing the characters, not the characters themselves. The segmented characters are obtained further by eliminating interference and by contour search. For example, for the character string "2020" to be segmented, a region image containing "2020" is obtained through the semantic segmentation model; this region image may be a rectangular image. Background and noise interference are then removed from the region image and a contour search is performed, yielding four single-character images corresponding to "2", "0", "2" and "0" respectively, as in the sketch below.
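As a rough illustration of the contour-search step (not the patent's own implementation), an OpenCV sketch might look as follows; the file name and the noise-area threshold are assumptions.

```python
import cv2

# "region_2020.png" is a hypothetical binarized region image produced by the
# semantic segmentation model (characters in white on a black background).
region = cv2.imread("region_2020.png", cv2.IMREAD_GRAYSCALE)
_, binary = cv2.threshold(region, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)

# Contour search: each external contour is treated as one character candidate.
contours, _ = cv2.findContours(binary, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)

char_images = []
for cnt in sorted(contours, key=lambda c: cv2.boundingRect(c)[0]):  # left to right
    x, y, w, h = cv2.boundingRect(cnt)
    if w * h < 20:  # drop tiny noise blobs (threshold is an assumption)
        continue
    char_images.append(binary[y:y + h, x:x + w])

# For the string "2020" this would yield four single-character images.
```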
103. And identifying the segmented character image based on a character identification model to obtain a corresponding identification character.
In this embodiment, a character recognition model needs to be built in combination with the corresponding data set.
For example, for the recognition of the railway service equipment maintenance record book image, a railway special handwritten text data set is required to be formed by combining a railway common dictionary so as to train a character recognition model.
Similarly, for image recognition of financial record books, a finance-specific handwritten text data set needs to be formed in combination with a common financial dictionary to train the character recognition model.
104. Based on the positions of the respective recognition characters in the document image, a document recognition result is obtained.
Specifically, after each recognition character is determined, each recognition character is placed at a position in the corresponding document image, so that a document recognition result can be obtained.
Taking a table document as an example, after the characters have been recognized, step 104 places each recognized character into its corresponding cell based on the cell to which it belongs, finally obtaining the document recognition result; a toy sketch of this placement follows.
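The sketch below assumes each recognized character carries its cell coordinates (row, column) and its horizontal offset within the cell; the names and data are illustrative only and are not taken from the patent.

```python
from collections import defaultdict

# Each entry: (row, col, x_offset_in_cell, recognized_char) -- hypothetical data.
recognized = [(0, 1, 12, "1"), (0, 1, 30, "9"), (0, 1, 47, "9"), (0, 1, 63, "5"),
              (1, 0, 10, "OK")]

cells = defaultdict(list)
for row, col, x, ch in recognized:
    cells[(row, col)].append((x, ch))

# Sort the characters inside each cell by horizontal position and join them.
table = {pos: "".join(ch for _, ch in sorted(chars)) for pos, chars in cells.items()}
print(table)  # {(0, 1): '1995', (1, 0): 'OK'}
```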
According to the image recognition method provided by the embodiment of the invention, the target document image is preprocessed to obtain the preprocessed image to be segmented, the contrast between characters and the image is enhanced, and then the image to be segmented is subjected to semantic segmentation through a semantic segmentation model to obtain a plurality of segmented character images; identifying the segmented character image based on the character identification model to obtain a corresponding identification character; based on the positions of the identification characters in the document image, a document identification result is obtained, so that the characters in the target document image can be automatically identified, and the problems of high manual input workload, low efficiency and high error rate are solved.
The embodiment of the invention discloses a method for image recognition, which is used for schematically describing the method of the embodiment, and referring to fig. 2, the method comprises the following steps:
201. an initial document image is obtained by scanning.
In this embodiment, the initial document images are generated, still unstructured, by batch scanning of the manually kept inspection record books.
202. And extracting texture features of the initial document image, and detecting defects through the texture features of the initial document image.
Defects such as missing print, scratches and ink spots in the document image are detected using image texture features, and the contour is extracted with a local dynamic threshold segmentation method to obtain the defect image.
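The patent does not spell out the local dynamic threshold segmentation; the sketch below stands in for it with OpenCV's adaptive threshold, and the file name, block size, offset and minimum contour area are assumptions.

```python
import cv2

img = cv2.imread("scanned_page.png", cv2.IMREAD_GRAYSCALE)  # hypothetical path

# Local (adaptive) threshold as a stand-in for the local dynamic threshold step.
mask = cv2.adaptiveThreshold(img, 255, cv2.ADAPTIVE_THRESH_MEAN_C,
                             cv2.THRESH_BINARY_INV, 31, 10)

# Contours of the thresholded regions give candidate defect outlines.
contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
defect_boxes = [cv2.boundingRect(c) for c in contours if cv2.contourArea(c) > 50]
```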
203. And extracting the outline of the initial document image with the defects to obtain the target document image.
After the target document image is obtained, not all target document images need to be preprocessed: document images with clear, easily recognized handwriting need no preprocessing and go directly to the subsequent segmentation step, while document images with blurred, hard-to-recognize handwriting need to be preprocessed.
The following steps 204 to 206 are the three preprocessing stages, namely repair (with defect classification), registration and enhancement, which together preprocess the target document image.
204. Extracting image texture characteristics of the target document image, carrying out defect classification on the target document image through a classification model, and carrying out defect repair on the target document image according to classification.
In this step, image texture features of the target document image are extracted by a Gabor filter.
Filter templates of different scales and orientations are generated by the following formulas (1) to (3):

g(x, y; λ, θ, ψ, σ) = exp(-(x'^2 + y'^2)/(2σ^2)) · exp(i(2πx'/λ + ψ))    (1)

g_real(x, y; λ, θ, ψ, σ) = exp(-(x'^2 + y'^2)/(2σ^2)) · cos(2πx'/λ + ψ)    (2)

g_imag(x, y; λ, θ, ψ, σ) = exp(-(x'^2 + y'^2)/(2σ^2)) · sin(2πx'/λ + ψ)    (3)

where x' = x·cos θ + y·sin θ and y' = -x·sin θ + y·cos θ; λ is the wavelength of the sinusoidal function, which can be understood as the scale; θ is the orientation of the Gabor kernel; ψ is the phase offset; σ is the standard deviation of the Gaussian function. By varying λ and θ, 15 filters in total (three scales by five orientations) are obtained, as shown in fig. 3.
In order to extract the texture features of the image, in this embodiment each filter in the filter bank is convolved with the target document image to obtain the response map corresponding to that filter. The index of the filter giving the maximum response at each pixel is then computed, and a normalized histogram of these indices is built within each local image block as the texture feature of that block; a sketch follows.
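A minimal sketch of this max-response histogram descriptor, using OpenCV's getGaborKernel; the kernel size, the particular wavelengths and orientations (3 × 5 = 15 filters), the 16 × 16 block size and the file name are assumptions, not values taken from the patent.

```python
import cv2
import numpy as np

img = cv2.imread("record_page.png", cv2.IMREAD_GRAYSCALE).astype(np.float32)  # hypothetical

# 15 filters: 3 wavelengths (scales) x 5 orientations.
kernels = [cv2.getGaborKernel((21, 21), sigma=4.0, theta=t, lambd=l, gamma=0.5, psi=0)
           for l in (4.0, 8.0, 16.0)
           for t in np.arange(0, np.pi, np.pi / 5)]

# Convolve each filter with the image and record, per pixel, which filter wins.
responses = np.stack([cv2.filter2D(img, cv2.CV_32F, k) for k in kernels])
argmax_map = responses.argmax(axis=0)

def block_histogram(block, n_filters=15):
    hist = np.bincount(block.ravel(), minlength=n_filters).astype(np.float32)
    return hist / hist.sum()  # normalized histogram = texture feature of the block

feature = block_histogram(argmax_map[:16, :16])  # feature for one 16x16 local block
```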
In this embodiment, the Gabor filter is used to enhance a large number of handwritten characters in the image, improve the images of the highlight region and the dark region, and enhance the contrast between the characters and the paper background.
In this step, the CART model is taken as an example of the classification model for schematic description. The CART model is a binary tree model; the different defects are classified and judged through the CART model, and the defects are repaired according to their class in the subsequent step.
205. And registering the repair image and the form template by adopting a registration model to obtain a registration image.
To address problems such as paper skew and misalignment that may arise during scanning, the different scanned images are rotated, translated and scaled to the same scale and position, which improves the accuracy of the subsequent recognition step.
In this step, an FFD model is taken as an example of the registration model.
Specifically, the FFD model includes a plurality of B-spline basis functions, and step 205 includes: extracting interesting feature points of the repair image; and processing based on the interesting feature points, the form template and the B-spline basis function to obtain a registration image.
The point feature is a common feature contained in scanned images, and the feature points of interest are searched for with a region-based similarity strategy. Let p_i be a feature point to be extracted in the reference image R, let w_r be an m × n window centered on p_i, and let w_f be a window of size u × v at the corresponding position in the target image. When w_f and w_r attain the maximum under some similarity measure function, the center q_i of the window w_f is the feature point of interest corresponding to p_i.
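One way to realize this region-based similarity search is normalized cross-correlation template matching, sketched below; the window sizes, search radius and file names are assumptions rather than parameters given in the patent.

```python
import cv2

ref = cv2.imread("form_template.png", cv2.IMREAD_GRAYSCALE)   # reference image R (hypothetical)
tgt = cv2.imread("repaired_scan.png", cv2.IMREAD_GRAYSCALE)   # target (repaired) image

def match_point(p, half=15, search=60):
    """Find the feature point of interest q_i corresponding to p_i = (y, x)."""
    y, x = p
    w_r = ref[y - half:y + half, x - half:x + half]             # window around p_i
    region = tgt[y - search:y + search, x - search:x + search]  # search area for w_f
    score = cv2.matchTemplate(region, w_r, cv2.TM_CCOEFF_NORMED)
    _, _, _, max_loc = cv2.minMaxLoc(score)                     # best similarity position
    return (y - search + max_loc[1] + half,                     # center of the best w_f
            x - search + max_loc[0] + half)

q_i = match_point((200, 300))
```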
A non-rigid registration algorithm based on the FFD model is then applied using the extracted feature points of interest. Let Ω = {(x, y) | 0 ≤ x < X, 0 ≤ y < Y} denote the image to be registered, and let Φ denote an n_x × n_y uniform control-point grid overlaid on Ω, in which the (i, j)-th control point is denoted Φ_{i,j} and δ_x, δ_y denote the grid spacing in the X-axis and Y-axis directions respectively. The FFD model is given as a tensor product of one-dimensional cubic (degree-3) B-spline basis functions, defined as formula (4):

T_loc(x, y) = Σ_{l=0..3} Σ_{m=0..3} B_l(u) B_m(v) Φ_{i+l, j+m}    (4)

where i = ⌊x/δ_x⌋ - 1, j = ⌊y/δ_y⌋ - 1, u = x/δ_x - ⌊x/δ_x⌋, v = y/δ_y - ⌊y/δ_y⌋, ⌊·⌋ denotes the rounding-down operation, and the B-spline basis functions are: B_0(u) = (1 - u)^3 / 6, B_1(u) = (3u^3 - 6u^2 + 4) / 6, B_2(u) = (-3u^3 + 3u^2 + 3u + 1) / 6, B_3(u) = u^3 / 6, with 0 ≤ u < 1.
Let Φ_0, Φ_1, Φ_2, ... Φ_K denote K+1 layers of control grids, where the control-point density is incremented from layer k to layer k+1; the FFD model can then be written as a combination of the sub-models of the layers, as shown in formula (5):

T(x, y) = Σ_{k=0..K} T_loc^(k)(x, y)    (5)
next, obtaining the deformation function T of each layer by solving control vertexes of each layer of grids loc (x, y) repeating this process for all layers and obtaining the final deformation function by resampling. And finally, the mutual information is used as a similarity measurement function, and the registration of the document images can be scanned better by using a gradient optimization strategy.
206. And extracting the balance degree characteristic and the contrast characteristic of the registration image, and processing the registration image by utilizing the enhancement model to obtain the image to be segmented.
Specifically, step 206 includes: and extracting the balance degree characteristic and the contrast degree characteristic of the registered image, and executing the step of processing the registered image by using the enhancement model under the condition that the balance degree is smaller than a first threshold value or the contrast degree is smaller than a second threshold value so as to improve the visual effect of the image.
The balance feature is a gray-level statistical feature and is characterized with a histogram. The gray-level histogram is a function of gray level: it counts, for every gray value, how many pixels in the digital image take that value, and thus reflects the frequency with which each gray level appears in the image. The uniformity of the gray-histogram distribution reflects important statistical information about the image, and the feature is defined by formula (6):
where F_his is the uniformity of the gray-level histogram distribution; M is the pixel bit width, typically 11 in medical images; N is the number of pixels in the super-pixel block; c is a pixel (gray) value occurring in the super-pixel block and v is the corresponding number of pixels.
For the contrast feature, the Weber contrast feature is employed.
Weber's law (the law of sensory thresholds) states that, under the same kind of stimulus, the dynamic range of the stimulus perceived by a person is proportional to the intensity of the standard stimulus. Applied to human visual stimuli, the Weber contrast is defined as C = (I - I_b) / I_b, where I is the brightness of the object and I_b is the overall brightness of the background.
In practical applications it is often difficult to point out the object and the background separately, so this embodiment writes the feature as the following formula (7):
where F_web is the Weber contrast feature; the mean term is the gray average value of the super-pixel block; I_c(p) is each pixel value in the super-pixel block; N is the number of pixels in the current window; p is the pixel position, i.e. the combination of x and y; and B refers to the current window (image block), indicating that the pixels p are taken from within the current window.
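Since formulas (6) and (7) are not reproduced in this text, the sketch below uses plausible stand-ins (a squared-histogram uniformity measure and a mean-referenced Weber contrast) together with the two-threshold check of step 206; all definitions and threshold values here are assumptions.

```python
import numpy as np

def histogram_uniformity(block, levels=256):
    """Stand-in for F_his: sum of squared normalized histogram bins (assumed form)."""
    hist = np.bincount(block.ravel(), minlength=levels) / block.size
    return float((hist ** 2).sum())

def weber_contrast(block):
    """Stand-in for F_web: mean |I(p) - mean| / mean, the mean acting as background I_b."""
    mean = block.mean()
    return float(np.abs(block - mean).mean() / (mean + 1e-6))

block = np.random.randint(0, 256, (32, 32), dtype=np.uint8)   # one super-pixel block
F_his, F_web = histogram_uniformity(block), weber_contrast(block)

T1, T2 = 0.05, 0.2                                            # illustrative thresholds only
needs_enhancement = (F_his < T1) or (F_web < T2)              # first / second threshold check
```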
207. And carrying out semantic segmentation on the image to be segmented through a semantic segmentation model to obtain a plurality of segmented character images.
Specifically, step 207 includes the following steps S271 to S273:
s271, in the case that the image to be segmented comprises a table, determining the position and the arrangement relation of each cell in the table, and segmenting the cells.
And S272, performing semantic segmentation on each cell through a semantic segmentation model to obtain a plurality of character region images.
In this embodiment, the character area image may be a cell image including the character.
S273, carrying out contour recognition on the character area images to obtain a plurality of segmentation character images in each character area image.
Note that a segmented character image is not the same as a character region image: a character region image contains a plurality of segmented character images. Taking the character string "1995" as an example, the character region image is the cell region image containing "1995", while the segmented character images are the images corresponding to the four characters "1", "9", "9" and "5" respectively.
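The patent does not specify how the cell positions in S271 are found; a common approach, sketched below as an assumption, extracts the long horizontal and vertical ruling lines with morphological opening and then takes the bounding boxes of the enclosed cells.

```python
import cv2

img = cv2.imread("form_page.png", cv2.IMREAD_GRAYSCALE)       # hypothetical scan
binary = cv2.adaptiveThreshold(img, 255, cv2.ADAPTIVE_THRESH_MEAN_C,
                               cv2.THRESH_BINARY_INV, 31, 10)

# Keep only long horizontal / vertical strokes (the table ruling lines).
h_lines = cv2.morphologyEx(binary, cv2.MORPH_OPEN,
                           cv2.getStructuringElement(cv2.MORPH_RECT, (40, 1)))
v_lines = cv2.morphologyEx(binary, cv2.MORPH_OPEN,
                           cv2.getStructuringElement(cv2.MORPH_RECT, (1, 40)))
grid = cv2.bitwise_or(h_lines, v_lines)

# Cells are the connected regions enclosed by the grid lines.
contours, _ = cv2.findContours(cv2.bitwise_not(grid), cv2.RETR_LIST,
                               cv2.CHAIN_APPROX_SIMPLE)
cells = [cv2.boundingRect(c) for c in contours if cv2.contourArea(c) > 500]
cells.sort(key=lambda b: (b[1], b[0]))                         # row-major (x, y, w, h) boxes
```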
208. And identifying the segmented character image based on a character identification model to obtain a corresponding identification character.
In this embodiment, the character recognition model may be a CNN model.
Before the segmented character image is identified based on the character recognition model, training the character recognition model based on a pre-stored character data set is needed to obtain a trained character recognition model.
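A minimal PyTorch sketch of such a CNN character recognizer; the layer sizes, the 32 × 32 grayscale input, the class count and the (commented) training loop over a `train_loader` built from the pre-stored text data set are all assumptions, since the patent does not fix an architecture.

```python
import torch
import torch.nn as nn

class CharCNN(nn.Module):
    """Small CNN for single-character classification (illustrative architecture)."""
    def __init__(self, num_classes):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),   # 32 -> 16
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),  # 16 -> 8
        )
        self.classifier = nn.Linear(32 * 8 * 8, num_classes)

    def forward(self, x):
        return self.classifier(self.features(x).flatten(1))

model = CharCNN(num_classes=4000)       # e.g. digits, letters and dictionary characters
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

# for images, labels in train_loader:   # train_loader: the pre-stored text data set
#     optimizer.zero_grad()
#     loss = loss_fn(model(images), labels)
#     loss.backward()
#     optimizer.step()
```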
209. Based on the positions of the respective recognition characters in the document image, a document recognition result is obtained.
Taking the case where the original target document image is a table image as an example, the finally obtained document recognition result is a table in which the image of each cell is replaced by the characters recognized from it.
According to the image recognition method provided by the embodiment of the invention, the target document image is preprocessed to obtain the preprocessed image to be segmented, the contrast between characters and the image is enhanced, and then the image to be segmented is subjected to semantic segmentation through a semantic segmentation model to obtain a plurality of segmented character images; identifying the segmented character image based on the character identification model to obtain a corresponding identification character; based on the positions of the identification characters in the document image, a document identification result is obtained, so that the characters in the target document image can be automatically identified, and the problems of high manual input workload, low efficiency and high error rate are solved.
Taking a railway service equipment maintenance record book as an example, an embodiment of the invention is schematically described with reference to fig. 4. An end-to-end model is constructed in the embodiment of fig. 4, comprising: a CART model, an FFD model, a Gabor filter model, a semantic segmentation model, and a character recognition model.
The method comprises the following steps:
1) And scanning the maintenance record book of the railway working equipment to obtain an initial document image.
2) And extracting texture features of the initial document image, and detecting defects through the texture features of the initial document image.
If no defect exists, the following step 7) is executed.
3) And extracting the outline of the defective initial document image to obtain a target document image.
4) Extracting image texture features of the railway service equipment maintenance record book image through a Gabor filter, carrying out defect classification on the railway service equipment maintenance record book image through a CART classification model, and carrying out defect repair on the target document image according to classification.
Wherein, the railway service equipment maintenance record book comprises a plurality of tables. Depending on the defect, various categories may be classified, such as underprint, scratch, mottle, etc.
The enhancement of the images of the maintenance record book of the railway service equipment by the Gabor filter model can improve the images of the highlight area and the dark area and enhance the contrast ratio of characters and paper background.
5) And registering the repair image and the form template by adopting the FFD model to obtain a registered image.
The purpose of the registration is, among other things: aiming at the problems of paper deflection, dislocation and the like possibly generated in the scanning process, different scanned images are rotated, translated and scaled to the same scale and position, so that the accuracy of the subsequent recognition step is enhanced.
6) And extracting the balance degree and contrast characteristic of the registration image, and processing the registration image by utilizing the enhancement model to obtain the image to be segmented.
7) And carrying out semantic segmentation on the image to be segmented through a semantic segmentation model to obtain a plurality of segmented character images.
8) And identifying the segmented character image based on a character identification model to obtain a corresponding identification character.
And forming a handwriting character data set special for the railway based on a pre-stored railway common dictionary, and training the character recognition model to obtain a trained character recognition model.
9) Based on the positions of the respective recognition characters in the document image, a document recognition result is obtained as shown in fig. 4.
The image recognition apparatus provided in the embodiment of the present invention will be described below, and the image recognition apparatus described below and the image recognition method described above may be referred to correspondingly.
The embodiment of the invention discloses an image recognition device, as shown in fig. 5, comprising:
the preprocessing module 501 is configured to preprocess the target document image to obtain a preprocessed image to be segmented;
the semantic segmentation module 502 is configured to perform semantic segmentation on the image to be segmented through a semantic segmentation model to obtain a plurality of segmented character images;
a character recognition module 503, configured to recognize the segmented character image based on a character recognition model, to obtain a corresponding recognition character;
a document generating module 504 for obtaining a document recognition result based on the positions of the respective recognition characters in the document image.
Optionally, the preprocessing module 501 includes:
the repairing unit is used for extracting the image texture characteristics of the target document image, carrying out defect classification on the target document image through the classification model, and carrying out defect repairing on the target document image according to the classification;
the registration unit is used for registering the repair image and the form template by adopting the registration model to obtain a registration image;
and the enhancement unit is used for extracting the balance degree and contrast characteristic of the registration image, and processing the registration image by utilizing the enhancement model to obtain the image to be segmented.
Optionally, the enhancing unit is specifically configured to: and extracting the balance degree characteristic and the contrast characteristic of the registration image, and processing the registration image by utilizing the enhancement model under the condition that the balance degree is smaller than a first threshold value or the contrast is smaller than a second threshold value to obtain the image to be segmented.
Optionally, the registration model comprises a plurality of B-spline basis functions;
the registration unit is specifically configured to: extracting interesting feature points of the repair image; and processing based on the interesting feature points, the table template and the B-spline basis function to obtain the registration image.
Optionally, the semantic segmentation module 502 is specifically configured to:
determining the position and arrangement relation of each cell in a table and cutting the cells when the image to be segmented comprises the table;
carrying out semantic segmentation on each cell through a semantic segmentation model to obtain a plurality of character region images;
and carrying out contour recognition on the character area images to obtain a plurality of segmentation character images in each character area image.
Optionally, the apparatus further comprises: and the training module is used for training the character recognition model based on a pre-stored character data set to obtain a trained character recognition model.
Optionally, the apparatus further comprises: the defect detection module is used for obtaining an initial document image through scanning; extracting texture features of the initial document image, and performing defect detection through the texture features of the initial document image; and extracting the outline of the initial document image with the defects to obtain the target document image.
According to the image recognition device provided by the embodiment of the invention, the target document image is preprocessed to obtain the preprocessed image to be segmented, the contrast between characters and the image is enhanced, and then the image to be segmented is subjected to semantic segmentation through a semantic segmentation model to obtain a plurality of segmented character images; identifying the segmented character image based on the character identification model to obtain a corresponding identification character; based on the positions of the identification characters in the document image, a document identification result is obtained, so that the characters in the target document image can be automatically identified, and the problems of high manual input workload, low efficiency and high error rate are solved.
Fig. 6 illustrates a physical schematic diagram of an electronic device, as shown in fig. 6, which may include: processor 610, communication interface (Communications Interface) 620, memory 630, and communication bus 640, wherein processor 610, communication interface 620, and memory 630 communicate with each other via communication bus 640. The processor 610 may invoke logic instructions in the memory 630 to perform methods of image recognition, including:
preprocessing a target document image to obtain a preprocessed image to be segmented;
carrying out semantic segmentation on the image to be segmented through a semantic segmentation model to obtain a plurality of segmented character images;
identifying the segmented character image based on a character identification model to obtain a corresponding identification character;
based on the positions of the respective recognition characters in the document image, a document recognition result is obtained.
Further, the logic instructions in the memory 630 may be implemented in the form of software functional units and stored in a computer-readable storage medium when sold or used as a stand-alone product. Based on this understanding, the technical solution of the present invention may be embodied essentially or in a part contributing to the prior art or in a part of the technical solution, in the form of a software product stored in a storage medium, comprising several instructions for causing a computer device (which may be a personal computer, a server, a network device, etc.) to perform all or part of the steps of the method according to the embodiments of the present invention. And the aforementioned storage medium includes: a U-disk, a removable hard disk, a Read-Only Memory (ROM), a random access Memory (RAM, random Access Memory), a magnetic disk, or an optical disk, or other various media capable of storing program codes.
In another aspect, embodiments of the present invention also provide a computer program product comprising a computer program stored on a non-transitory computer readable storage medium, the computer program comprising program instructions which, when executed by a computer, enable the computer to perform the method of image recognition provided by the above-described method embodiments, comprising:
preprocessing a target document image to obtain a preprocessed image to be segmented;
carrying out semantic segmentation on the image to be segmented through a semantic segmentation model to obtain a plurality of segmented character images;
identifying the segmented character image based on a character identification model to obtain a corresponding identification character;
based on the positions of the respective recognition characters in the document image, a document recognition result is obtained.
In yet another aspect, embodiments of the present invention further provide a non-transitory computer readable storage medium having stored thereon a computer program which, when executed by a processor, is implemented to perform the method of image recognition provided by the above embodiments, comprising:
preprocessing a target document image to obtain a preprocessed image to be segmented;
carrying out semantic segmentation on the image to be segmented through a semantic segmentation model to obtain a plurality of segmented character images;
identifying the segmented character image based on a character identification model to obtain a corresponding identification character;
based on the positions of the respective recognition characters in the document image, a document recognition result is obtained.
The apparatus embodiments described above are merely illustrative, wherein the elements illustrated as separate elements may or may not be physically separate, and the elements shown as elements may or may not be physical elements, may be located in one place, or may be distributed over a plurality of network elements. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment. Those of ordinary skill in the art will understand and implement the present invention without undue burden.
From the above description of the embodiments, it will be apparent to those skilled in the art that the embodiments may be implemented by means of software plus necessary general hardware platforms, or of course may be implemented by means of hardware. Based on this understanding, the foregoing technical solution may be embodied essentially or in a part contributing to the prior art in the form of a software product, which may be stored in a computer readable storage medium, such as ROM/RAM, a magnetic disk, an optical disk, etc., including several instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) to execute the method described in the respective embodiments or some parts of the embodiments.
Finally, it should be noted that: the above embodiments are only for illustrating the technical solution of the present invention, and are not limiting; although the invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical scheme described in the foregoing embodiments can be modified or some technical features thereof can be replaced by equivalents; such modifications and substitutions do not depart from the spirit and scope of the technical solutions of the embodiments of the present invention.

Claims (9)

1. A method of image recognition, comprising:
preprocessing a target document image to obtain a preprocessed image to be segmented;
carrying out semantic segmentation on the image to be segmented through a semantic segmentation model to obtain a plurality of segmented character images;
identifying the segmented character image based on a character identification model to obtain a corresponding identification character;
obtaining a document recognition result based on the positions of the recognition characters in the document image;
preprocessing the target document image to obtain a preprocessed image to be segmented, wherein the preprocessing comprises the following steps:
repairing: extracting image texture characteristics of a target document image, carrying out defect classification on the target document image through a classification model, and carrying out defect repair on the target document image according to classification;
registering: registering the repair image and the form template by adopting a registration model to obtain a registration image;
enhancement: and extracting the balance degree characteristic and the contrast characteristic of the registration image, and processing the registration image by utilizing the enhancement model to obtain the image to be segmented.
2. The method of image recognition according to claim 1, wherein extracting the balance features and contrast features of the registered image and processing the registered image with the enhancement model comprises:
and extracting the balance degree characteristic and the contrast characteristic of the registration image, and processing the registration image by using the enhancement model under the condition that the balance degree is smaller than a first threshold value or the contrast is smaller than a second threshold value.
3. The method of image recognition according to claim 1, wherein the registration model comprises a plurality of B-spline basis functions;
registering the repair image and the form template by adopting a registration model to obtain a registration image, wherein the registering comprises the following steps:
extracting interesting feature points of the repair image;
and processing based on the interesting feature points, the table template and the B-spline basis function to obtain the registration image.
4. The method of image recognition according to claim 1, wherein semantically segmenting the image to be segmented by a semantic segmentation model to obtain a plurality of segmented character images, comprising:
determining the position and arrangement relation of each cell in a table and cutting the cells when the image to be segmented comprises the table;
carrying out semantic segmentation on each cell through a semantic segmentation model to obtain a plurality of character region images;
and carrying out contour recognition on the character area images to obtain a plurality of segmentation character images in each character area image.
5. The method of image recognition according to claim 1, wherein prior to the recognition of the segmented character image based on a character recognition model, the method further comprises:
training the character recognition model based on a pre-stored text data set to obtain a trained character recognition model.
6. The method of image recognition according to claim 1, wherein prior to preprocessing the document image, the method further comprises:
obtaining an initial document image by scanning;
extracting texture features of the initial document image, and performing defect detection through the texture features of the initial document image;
and extracting the outline of the initial document image with the defects to obtain the target document image.
7. An apparatus for image recognition, comprising:
the preprocessing module is used for preprocessing the target document image to obtain a preprocessed image to be segmented;
the semantic segmentation module is used for carrying out semantic segmentation on the image to be segmented through a semantic segmentation model to obtain a plurality of segmented character images;
the character recognition module is used for recognizing the segmented character image based on a character recognition model to obtain a corresponding recognition character;
a document generation module for obtaining a document recognition result based on the positions of the respective recognition characters in the document image;
the preprocessing module comprises:
the repairing unit is used for extracting the image texture characteristics of the target document image, carrying out defect classification on the target document image through the classification model, and carrying out defect repairing on the target document image according to the classification;
the registration unit is used for registering the repair image and the form template by adopting the registration model to obtain a registration image;
and the enhancement unit is used for extracting the balance degree and contrast characteristic of the registration image, and processing the registration image by utilizing the enhancement model to obtain the image to be segmented.
8. An electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, characterized in that the processor implements the steps of the method of image recognition according to any one of claims 1 to 6 when the program is executed.
9. A non-transitory computer readable storage medium, on which a computer program is stored, characterized in that the computer program, when being executed by a processor, implements the steps of the method of image recognition according to any one of claims 1 to 6.
CN202011108746.6A 2020-10-16 2020-10-16 Image recognition method and device, electronic equipment and storage medium Active CN112200789B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011108746.6A CN112200789B (en) 2020-10-16 2020-10-16 Image recognition method and device, electronic equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011108746.6A CN112200789B (en) 2020-10-16 2020-10-16 Image recognition method and device, electronic equipment and storage medium

Publications (2)

Publication Number Publication Date
CN112200789A CN112200789A (en) 2021-01-08
CN112200789B true CN112200789B (en) 2023-11-21

Family

ID=74010472

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011108746.6A Active CN112200789B (en) 2020-10-16 2020-10-16 Image recognition method and device, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN112200789B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112949514A (en) * 2021-03-09 2021-06-11 广州文石信息科技有限公司 Scanned document information processing method and device, electronic equipment and storage medium
CN113407783A (en) * 2021-05-13 2021-09-17 中车太原机车车辆有限公司 Electric locomotive overhaul record management system

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11195057B2 (en) * 2014-03-18 2021-12-07 Z Advanced Computing, Inc. System and method for extremely efficient image and pattern recognition and artificial intelligence platform
US10482604B2 (en) * 2017-05-05 2019-11-19 Shanghai United Imaging Healthcare Co., Ltd. Systems and methods for image processing

Patent Citations (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104615974A (en) * 2015-01-15 2015-05-13 成都交大光芒科技股份有限公司 Continuous supporting pole number plate image recognizing method based on tracking algorithm
WO2018040118A1 (en) * 2016-08-29 2018-03-08 武汉精测电子集团股份有限公司 Gpu-based tft-lcd mura defect detection method
CN106844767A (en) * 2017-02-23 2017-06-13 中国科学院自动化研究所 Format file block information key registration and the method and device extracted
CN107220965A (en) * 2017-05-05 2017-09-29 上海联影医疗科技有限公司 A kind of image partition method and system
CN108345881A (en) * 2018-02-01 2018-07-31 福州大学 A kind of document quality detection method based on computer vision
CN108399610A (en) * 2018-03-20 2018-08-14 上海应用技术大学 A kind of depth image enhancement method of fusion RGB image information
CN109583449A (en) * 2018-10-29 2019-04-05 深圳市华尊科技股份有限公司 Character identifying method and Related product
CN109460769A (en) * 2018-11-16 2019-03-12 湖南大学 A kind of mobile end system and method based on table character machining and identification
CN109829906A (en) * 2019-01-31 2019-05-31 桂林电子科技大学 It is a kind of based on the workpiece, defect of the field of direction and textural characteristics detection and classification method
CN111723807A (en) * 2019-03-20 2020-09-29 Sap欧洲公司 Recognizing machine-typed and handwritten characters using end-to-end deep learning
CN110390650A (en) * 2019-07-23 2019-10-29 中南大学 OCT image denoising method based on intensive connection and generation confrontation network
CN110796031A (en) * 2019-10-11 2020-02-14 腾讯科技(深圳)有限公司 Table identification method and device based on artificial intelligence and electronic equipment
CN111507214A (en) * 2020-04-07 2020-08-07 中国人民财产保险股份有限公司 Document identification method, device and equipment
CN111625621A (en) * 2020-04-27 2020-09-04 中国铁道科学研究院集团有限公司电子计算技术研究所 Document retrieval method and device, electronic equipment and storage medium

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
Efficient computation offloading for Internet of Vehicles in edge computing-assisted 5G networks; Shaohua Wan et al.; The Journal of Supercomputing; pp. 2518-2547 *
Research on computer vision recognition algorithms based on texture images; Fan Susu; China Master's Theses Full-text Database, Information Science and Technology Series; pp. I138-336 *
Research directions and keywords for computer-aided acceptance of project proposals: 2012 acceptance results and notes for 2013; Ma Huizhu; Song Zhaohui; Ji Fei; Hou Jia; Xiong Xiaoyun; Journal of Electronics & Information Technology (No. 01); full text *

Also Published As

Publication number Publication date
CN112200789A (en) 2021-01-08

Similar Documents

Publication Publication Date Title
CN110298376B (en) Bank bill image classification method based on improved B-CNN
CN109740606B (en) Image identification method and device
CN102054271B (en) Text line detection method and device
CN108197644A (en) A kind of image-recognizing method and device
CN111680690B (en) Character recognition method and device
CN111353491B (en) Text direction determining method, device, equipment and storage medium
CN110490190B (en) Structured image character recognition method and system
CN112200789B (en) Image recognition method and device, electronic equipment and storage medium
CN115239644B (en) Concrete defect identification method, device, computer equipment and storage medium
CN113240623B (en) Pavement disease detection method and device
CN106529407A (en) Vehicle-mounted fingerprint recognition method
CN111507344A (en) Method and device for recognizing characters from image
CN114862861B (en) Lung lobe segmentation method and device based on few-sample learning
Yadav et al. A robust approach for offline English character recognition
CN111241897B (en) System and implementation method for digitizing industrial inspection sheets by inferring visual relationships
CN111738979A (en) Automatic certificate image quality inspection method and system
CN116912865A (en) Form image recognition method, device, equipment and medium
CN113516193B (en) Image processing-based red date defect identification and classification method and device
CN115937095A (en) Printing defect detection method and system integrating image processing algorithm and deep learning
CN112766082B (en) Chinese text handwriting identification method and device based on macro-micro characteristics and storage medium
CN114241463A (en) Signature verification method and device, computer equipment and storage medium
CN114677552A (en) Fingerprint detail database labeling method and system for deep learning
Singh et al. Performance analysis of thinning algorithms for offline-handwritten Devanagari words
CN114463767A (en) Credit card identification method, device, computer equipment and storage medium
Araújo et al. Segmenting and recognizing license plate characters

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant