CN112200789A - Image identification method and device, electronic equipment and storage medium - Google Patents

Image identification method and device, electronic equipment and storage medium

Info

Publication number
CN112200789A
CN112200789A (application CN202011108746.6A)
Authority
CN
China
Prior art keywords
image
character
segmented
model
recognition
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202011108746.6A
Other languages
Chinese (zh)
Other versions
CN112200789B (en)
Inventor
程智博
赵正阳
栾中
吴艳华
刘军
邵赛
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
China Academy of Railway Sciences Corp Ltd CARS
Institute of Computing Technologies of CARS
Beijing Jingwei Information Technology Co Ltd
Original Assignee
China Academy of Railway Sciences Corp Ltd CARS
Institute of Computing Technologies of CARS
Beijing Jingwei Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by China Academy of Railway Sciences Corp Ltd CARS, Institute of Computing Technologies of CARS, Beijing Jingwei Information Technology Co Ltd filed Critical China Academy of Railway Sciences Corp Ltd CARS
Priority to CN202011108746.6A priority Critical patent/CN112200789B/en
Publication of CN112200789A publication Critical patent/CN112200789A/en
Application granted granted Critical
Publication of CN112200789B publication Critical patent/CN112200789B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 7/00 Image analysis
    • G06T 7/0002 Inspection of images, e.g. flaw detection
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/24 Classification techniques
    • G06F 18/243 Classification techniques relating to the number of classes
    • G06F 18/24323 Tree-organised classifiers
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 7/00 Image analysis
    • G06T 7/30 Determination of transform parameters for the alignment of images, i.e. image registration
    • G06T 7/33 Determination of transform parameters for the alignment of images, i.e. image registration using feature-based methods
    • G06T 7/337 Determination of transform parameters for the alignment of images, i.e. image registration using feature-based methods involving reference images or patches
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 Arrangements for image or video recognition or understanding
    • G06V 10/40 Extraction of image or video features
    • G06V 10/44 Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
    • G06V 10/443 Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components by matching or filtering
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 20/00 Scenes; Scene-specific elements
    • G06V 20/60 Type of objects
    • G06V 20/62 Text, e.g. of license plates, overlay texts or captions on TV images
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V 30/10 Character recognition
    • G06V 30/14 Image acquisition
    • G06V 30/148 Segmentation of character regions
    • G06V 30/153 Segmentation of character regions using recognition of characters or words
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 2207/00 Indexing scheme for image analysis or image enhancement
    • G06T 2207/20 Special algorithmic details
    • G06T 2207/20081 Training; Learning
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 2207/00 Indexing scheme for image analysis or image enhancement
    • G06T 2207/20 Special algorithmic details
    • G06T 2207/20084 Artificial neural networks [ANN]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 2207/00 Indexing scheme for image analysis or image enhancement
    • G06T 2207/30 Subject of image; Context of image processing
    • G06T 2207/30176 Document

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Data Mining & Analysis (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Multimedia (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • Biophysics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Mathematical Physics (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Quality & Reliability (AREA)
  • Character Input (AREA)

Abstract

Embodiments of the invention provide an image recognition method and apparatus, an electronic device, and a storage medium. The method comprises: preprocessing a target document image to obtain a preprocessed image to be segmented; performing semantic segmentation on the image to be segmented through a semantic segmentation model to obtain a plurality of segmented character images; recognizing each segmented character image based on a character recognition model to obtain the corresponding recognized character; and obtaining a document recognition result based on the position of each recognized character in the document image. In this way, the characters in the target document image can be recognized automatically, avoiding the heavy workload, low efficiency, and high error rate of manual entry.

Description

Image identification method and device, electronic equipment and storage medium
Technical Field
The present invention relates to the field of neural network technologies, and in particular, to a method and an apparatus for image recognition, an electronic device, and a storage medium.
Background
In recent years, as railway informatization has deepened, system coverage has grown ever wider and the accumulated data volume ever larger. The inspection data in railway engineering equipment overhaul record books is of great significance for analyzing equipment maintenance data.
Manual inspection record books mostly record inspection content in a variety of tables, with Chinese characters, digits, and other content scattered across different regions of the tables. Some record books are old and suffer from damage and aging, and some field inspectors write illegibly or without following writing standards, which greatly increases the difficulty of extracting the information in the record books.
Manual inspection record books generally use tables with complex formats, and no reliable technology exists for quickly and accurately extracting the required information from them given this complexity.
Disclosure of Invention
Embodiments of the present invention provide an image recognition method and apparatus, an electronic device, and a storage medium, so as to overcome the defect in the prior art that the required information cannot be extracted from a record book quickly and accurately.
The embodiment of the invention provides an image identification method, which comprises the following steps:
preprocessing a target document image to obtain a preprocessed image to be segmented;
performing semantic segmentation on the image to be segmented through a semantic segmentation model to obtain a plurality of segmented character images;
identifying the segmented character image based on a character identification model to obtain a corresponding identification character;
based on the positions of the respective recognition characters in the document image, a document recognition result is obtained.
According to the image recognition method of one embodiment of the invention, preprocessing the target document image to obtain the preprocessed image to be segmented comprises:
repairing: extracting image texture features of the target document image, classifying the defects of the target document image through a classification model, and repairing the defects of the target document image according to the classification;
registration: registering the repaired image with the form template using a registration model to obtain a registered image;
enhancement: extracting the equalization feature and the contrast feature of the registered image, and processing the registered image with an enhancement model to obtain the image to be segmented.
According to the image recognition method of one embodiment of the invention, extracting the equalization feature and the contrast feature of the registered image and processing the registered image with an enhancement model comprises:
extracting the equalization feature and the contrast feature of the registered image, and processing the registered image with the enhancement model when the equalization is smaller than a first threshold or the contrast is smaller than a second threshold.
According to the method of image recognition of an embodiment of the invention, the registration model comprises a plurality of B-spline basis functions;
registering the repaired image with the form template using the registration model to obtain the registered image comprises:
extracting feature points of interest from the repaired image;
obtaining the registered image based on the feature points of interest, the form template, and the B-spline basis functions.
According to the image recognition method of one embodiment of the invention, the semantic segmentation is carried out on the image to be segmented through a semantic segmentation model to obtain a plurality of segmented character images, and the method comprises the following steps:
under the condition that the image to be segmented comprises a table, determining the position and arrangement relation of each cell in the table, and segmenting the cells;
performing semantic segmentation on each cell through a semantic segmentation model to obtain a plurality of character region images;
and carrying out contour recognition on the character area images to obtain a plurality of segmentation character images in each character area image.
According to the image recognition method of an embodiment of the present invention, before recognizing the segmented character images based on the character recognition model, the method further comprises:
training the character recognition model on a pre-stored character data set to obtain the trained character recognition model.
According to the image recognition method of one embodiment of the invention, before the document image is preprocessed, the method further comprises:
obtaining an initial document image by scanning;
extracting texture features of the initial document image, and performing defect detection using those texture features;
performing contour extraction on the defective initial document image to obtain the target document image.
The embodiment of the invention also provides an image recognition device, which comprises:
the preprocessing module is used for preprocessing the target document image to obtain a preprocessed image to be segmented;
the semantic segmentation module is used for performing semantic segmentation on the image to be segmented through a semantic segmentation model to obtain a plurality of segmented character images;
the character recognition module is used for recognizing the segmented character image based on a character recognition model to obtain a corresponding recognition character;
and the document generation module is used for obtaining a document identification result based on the position of each identification character in the document image.
An embodiment of the present invention further provides an electronic device, which includes a memory, a processor, and a computer program stored in the memory and executable on the processor, where the processor executes the computer program to implement any of the steps of the image recognition method described above.
Embodiments of the present invention also provide a non-transitory computer readable storage medium, on which a computer program is stored, which when executed by a processor implements the steps of the method for image recognition as described in any one of the above.
In the image recognition method and apparatus provided by the embodiments of the invention, the target document image is preprocessed to obtain a preprocessed image to be segmented, which enhances the contrast between the characters and the image; semantic segmentation is then performed on the image to be segmented through a semantic segmentation model to obtain a plurality of segmented character images; each segmented character image is recognized based on a character recognition model to obtain the corresponding recognized character; and a document recognition result is obtained based on the position of each recognized character in the document image. The characters in the target document image can thus be recognized automatically, avoiding the heavy workload, low efficiency, and high error rate of manual entry.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, and it is obvious that the drawings in the following description are some embodiments of the present invention, and those skilled in the art can also obtain other drawings according to the drawings without creative efforts.
Fig. 1 is a schematic flow chart of a method for image recognition according to an embodiment of the present invention;
FIG. 2 is a schematic flow chart of another image recognition method provided by the embodiment of the invention;
FIG. 3 is a schematic diagram of a filter provided by an embodiment of the invention;
FIG. 4 is a schematic diagram of a model structure for image recognition according to an embodiment of the present invention;
fig. 5 is a schematic structural diagram of an image recognition apparatus according to an embodiment of the present invention;
fig. 6 is a schematic structural diagram of an electronic device according to an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, but not all, embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
The terminology used in the one or more embodiments of the invention is for the purpose of describing particular embodiments only and is not intended to be limiting of the one or more embodiments of the invention. As used in one or more embodiments of the present invention and the appended claims, the singular forms "a," "an," and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It should also be understood that the term "and/or" as used in one or more embodiments of the present invention refers to and encompasses any and all possible combinations of one or more of the associated listed items.
It should be understood that, although the terms first, second, etc. may be used herein to describe various information in one or more embodiments of the present invention, such information should not be limited by these terms. These terms are only used to distinguish one type of information from another. For example, a first aspect may be termed a second aspect, and, similarly, a second aspect may be termed a first aspect, without departing from the scope of one or more embodiments of the present invention. The word "if" as used herein may be interpreted as "upon", "when", or "in response to determining", depending on the context.
In the embodiments of the present invention, a method and an apparatus for image recognition, an electronic device, and a non-transitory computer-readable storage medium are provided, which are described in detail in the following embodiments one by one.
First, the terms involved in the methods of the embodiments of the invention are explained.
Classification and Regression Tree (CART) model: a decision tree represents class partitions using a tree-like structure. Building the tree can be regarded as a process of variable (attribute) selection: internal nodes represent the variables the tree selects as partitions, each leaf node carries a class label, and the top of the tree is the root node. When the dependent variable of the data set is continuous, the algorithm builds a regression tree, and the mean of the observations at a leaf node can be used as the predicted value; when the dependent variable is discrete, the algorithm builds a classification tree, which handles classification problems well. Note that CART builds a binary tree, i.e. each non-leaf node can extend only two branches, so when a non-leaf node splits on a discrete variable with more than two levels, that variable may be used multiple times.
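As a toy illustration of the binary splits just described, a hand-specified tree in the CART spirit might look like the following. The defect attributes, thresholds, and class names are invented for illustration; the patent does not give a concrete tree.

```python
# A toy, hand-specified binary classification tree in the spirit of CART:
# each internal node tests one attribute, each leaf holds a class label.
# The attributes (area, darkness) and thresholds are illustrative only.

def classify_defect(area, darkness):
    """Classify a detected defect region by two simple attributes."""
    if area < 50:                 # small defect regions
        return "ink spot" if darkness > 0.5 else "missed print"
    else:                         # large defect regions
        return "scratch" if darkness > 0.5 else "damage"

print(classify_defect(10, 0.9))   # ink spot
print(classify_defect(200, 0.2))  # damage
```

A learned CART model would choose these splits automatically by minimizing impurity, but the resulting structure is exactly this kind of nested binary test.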
Free-Form Deformation (FFD) model: the FFD algorithm mainly comprises two steps: embed the object model in a control lattice (frame); when the control point positions change, the lattice "pulls" the model, thereby producing the deformation. Specifically: 1) construct a local coordinate system STU and compute the local coordinates (s, t, u) corresponding to each vertex of the model; these local coordinates remain fixed regardless of changes in the world coordinates of the control points; 2) move the control points and recompute the world coordinates of each model vertex from its local coordinates (s, t, u), the world coordinates of the control points, and the Bernstein polynomials.
Gabor filter: the basic idea of the Gabor transform is to divide the signal into many small time intervals, each time interval being analyzed by fourier transform in order to determine the frequency at which the signal is present in that time interval. The processing method is to add a sliding window to f (t) and then perform Fourier transform. The two-dimensional Gabor filter formed by using the Gabor function has a characteristic of obtaining optimal localization in both the spatial domain and the frequency domain, and thus can well describe local structural information corresponding to spatial frequency (scale), spatial position, and directional selectivity.
The embodiment of the invention discloses an image recognition method which, referring to fig. 1, comprises the following steps 101 to 104:
101. Preprocessing the target document image to obtain a preprocessed image to be segmented.
The target document image may be obtained in various ways, for example, by taking a picture or scanning a paper document, so as to generate a corresponding document image.
After the document images are obtained, not all of them need to be preprocessed by this method: for documents whose handwriting is clear and easy to recognize, the subsequent image segmentation step is executed directly; for documents whose handwriting is illegible, non-standard, or blurred with age, the preprocessing step is required to obtain the image to be segmented before the subsequent segmentation is executed.
102. Performing semantic segmentation on the image to be segmented through a semantic segmentation model to obtain a plurality of segmented character images.
In this embodiment, the semantic segmentation model may be, for example, a convolutional neural network (CNN) model.
In this embodiment, each region is semantically segmented by the CNN model to obtain a binarized region image containing only the character regions (digits, Chinese characters, letters, and punctuation marks); interference from background and noise is eliminated, and accurate character region positions and a plurality of segmented character images are obtained by contour search.
It should be noted that the semantic segmentation model yields images containing characters, not the characters themselves; the segmented characters are obtained only after interference is eliminated and contours are searched. For example, for the characters "2020" to be segmented, the semantic segmentation model yields a region image containing "2020", which may be a square image. Background and noise interference are then eliminated and contours are searched in this region image, yielding separate character images for "2", "0", "2", and "0".
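The contour search that splits a binarized region image into per-character images can be sketched with connected-component labeling, under simplifying assumptions (a small binary grid stands in for the region image, and each connected foreground region is taken to be one character):

```python
# Hypothetical sketch: splitting a binarized region image into per-character
# bounding boxes via 4-connected component labeling, a stand-in for the
# contour search described above (the patent does not fix an implementation).

def connected_components(grid):
    """Label 4-connected foreground (1) regions in a binary 2D list."""
    rows, cols = len(grid), len(grid[0])
    labels = [[0] * cols for _ in range(rows)]
    current = 0
    for r in range(rows):
        for c in range(cols):
            if grid[r][c] == 1 and labels[r][c] == 0:
                current += 1
                stack = [(r, c)]
                while stack:
                    y, x = stack.pop()
                    if 0 <= y < rows and 0 <= x < cols \
                            and grid[y][x] == 1 and labels[y][x] == 0:
                        labels[y][x] = current
                        stack.extend([(y + 1, x), (y - 1, x),
                                      (y, x + 1), (y, x - 1)])
    return labels, current

def character_boxes(grid):
    """One bounding box (top, left, bottom, right) per component,
    ordered left to right, i.e. one box per segmented character image."""
    labels, _ = connected_components(grid)
    boxes = {}
    for r, row in enumerate(labels):
        for c, lab in enumerate(row):
            if lab:
                t, l, b, rgt = boxes.get(lab, (r, c, r, c))
                boxes[lab] = (min(t, r), min(l, c), max(b, r), max(rgt, c))
    return sorted(boxes.values(), key=lambda box: box[1])

# Two separate strokes in one region image -> two character boxes.
cell = [
    [1, 1, 0, 0, 1],
    [1, 1, 0, 0, 1],
    [0, 0, 0, 0, 1],
]
print(character_boxes(cell))  # [(0, 0, 1, 1), (0, 4, 2, 4)]
```

Each returned box would then be cropped out as one segmented character image and passed to the character recognition model.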
103. Recognizing the segmented character images based on a character recognition model to obtain the corresponding recognized characters.
In this embodiment, a character recognition model needs to be constructed in combination with a corresponding data set.
For example, for the recognition of the image of the railway engineering equipment overhaul record book, a railway-specific handwritten character data set needs to be formed by combining a railway common dictionary to train a character recognition model.
For another example, for image recognition of a financial data book, a financial-specific handwritten character data set needs to be formed in combination with a financial common dictionary to train a character recognition model.
104. Based on the positions of the respective recognition characters in the document image, a document recognition result is obtained.
Specifically, after each recognition character is determined, each recognition character is placed at a position in the corresponding document image, so that a document recognition result can be obtained.
Taking a table document as an example, in step 104, after the characters are recognized, each recognized character is placed into the cell it corresponds to, and the document recognition result is finally obtained.
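Step 104 for a table document can be sketched as follows. The tuple layout (row, col, x_position, char) and the function name are assumptions for illustration; the patent only states that characters are placed by position:

```python
# Minimal sketch of step 104 for a table document: each recognized character
# carries the cell (row, col) it was segmented from, and characters in the
# same cell are joined in reading (left-to-right) order by x position.

from collections import defaultdict

def assemble_document(recognized):
    """recognized: list of (row, col, x_position, char) tuples."""
    cells = defaultdict(list)
    for row, col, x, ch in recognized:
        cells[(row, col)].append((x, ch))
    return {
        cell: "".join(ch for _, ch in sorted(chars))
        for cell, chars in cells.items()
    }

# Characters "2", "0", "2", "0" recognized in cell (0, 1) of the form:
result = assemble_document([
    (0, 1, 20, "0"), (0, 1, 10, "2"), (0, 1, 40, "0"), (0, 1, 30, "2"),
    (1, 0, 5, "OK"),
])
print(result)  # {(0, 1): '2020', (1, 0): 'OK'}
```

Sorting within each cell by x position restores the reading order even when the character images were recognized out of order.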
The image recognition method provided by the embodiments of the invention preprocesses the target document image to obtain a preprocessed image to be segmented, enhancing the contrast between the characters and the image; it then performs semantic segmentation on the image to be segmented through a semantic segmentation model to obtain a plurality of segmented character images, recognizes each segmented character image based on a character recognition model to obtain the corresponding recognized character, and obtains a document recognition result based on the position of each recognized character in the document image. The characters in the target document image can thus be recognized automatically, avoiding the heavy workload, low efficiency, and high error rate of manual entry.
The embodiment of the invention discloses an image recognition method that schematically illustrates the method of the previous embodiment; referring to fig. 2, the method comprises the following steps:
201. an initial document image is obtained by scanning.
In this embodiment, the manual inspection record books are batch-scanned to generate unstructured initial document images.
202. Extracting the texture features of the initial document image, and performing defect detection using those texture features.
Defects of the document image such as missed print, scratches, and ink spots are detected using image texture features, and contours are extracted with a local dynamic threshold segmentation method to obtain the defect image.
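The local dynamic threshold segmentation mentioned here can be sketched as comparing each pixel against the mean of its local window minus a small offset, so that dark defect regions survive uneven illumination. The window size, offset, and mean-based rule are implementation assumptions, not values from the patent:

```python
# Hedged sketch of local dynamic (adaptive) threshold segmentation: a pixel
# becomes foreground (1) when it is darker than its local neighborhood mean
# by more than `offset`. Parameters are illustrative assumptions.

def local_threshold(img, win=1, offset=2):
    """Binarize a 2D grayscale list: 1 where pixel < local_mean - offset."""
    rows, cols = len(img), len(img[0])
    out = [[0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            ys = range(max(0, r - win), min(rows, r + win + 1))
            xs = range(max(0, c - win), min(cols, c + win + 1))
            vals = [img[y][x] for y in ys for x in xs]
            mean = sum(vals) / len(vals)
            out[r][c] = 1 if img[r][c] < mean - offset else 0
    return out

# A bright page with a single dark ink spot in the middle:
page = [[200, 200, 200], [200, 100, 200], [200, 200, 200]]
print(local_threshold(page))  # [[0, 0, 0], [0, 1, 0], [0, 0, 0]]
```

Because the threshold is computed per window rather than globally, the same rule works on pages whose background brightness varies from region to region.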
203. Performing contour extraction on the defective initial document image to obtain the target document image.
After the target document image is obtained, not every target document image needs to be preprocessed: document images with clear, easily recognizable handwriting proceed directly to the subsequent segmentation step, while document images with blurred, hard-to-recognize handwriting require preprocessing.
The following steps 204-206 are the three stages of preprocessing, namely repair (via defect classification), registration, and enhancement, which together implement the preprocessing of the target document image.
204. Extracting image texture features of the target document image, classifying the defects of the target document image through a classification model, and repairing the defects of the target document image according to the classification.
In this step, the image texture features of the target document image are extracted through a Gabor filter.
Filter templates at different scales and in different directions are generated by the following formulas (1) to (3):

g(x, y; λ, θ, ψ, σ) = exp(-(x′² + y′²)/(2σ²)) · exp(i(2πx′/λ + ψ))  (1)

g_real(x, y; λ, θ, ψ, σ) = exp(-(x′² + y′²)/(2σ²)) · cos(2πx′/λ + ψ)  (2)

g_imag(x, y; λ, θ, ψ, σ) = exp(-(x′² + y′²)/(2σ²)) · sin(2πx′/λ + ψ)  (3)

where x′ = x cos θ + y sin θ and y′ = -x sin θ + y cos θ; λ is the wavelength of the sinusoid, which can be understood as the scale; θ is the direction of the Gabor kernel function; ψ is the phase offset; and σ is the standard deviation of the Gaussian function. By varying λ and θ, a total of 15 filters at three scales and in five directions are obtained, as shown in fig. 3.
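A stdlib-only sketch of the filter bank follows, assuming the standard real-part Gabor kernel with aspect ratio fixed to 1; the concrete λ and θ values are illustrative assumptions, chosen only to show how three scales times five directions yield 15 filters:

```python
# Sketch of the real-part Gabor kernel and the 15-filter bank obtained by
# varying the scale (lambda, three values) and direction (theta, five values).
# The specific lambda/theta values below are assumptions for illustration.

import math

def gabor_real(x, y, lam, theta, psi=0.0, sigma=2.0):
    """Real part of the Gabor kernel at (x, y), aspect ratio 1."""
    xp = x * math.cos(theta) + y * math.sin(theta)
    yp = -x * math.sin(theta) + y * math.cos(theta)
    return math.exp(-(xp ** 2 + yp ** 2) / (2 * sigma ** 2)) * \
        math.cos(2 * math.pi * xp / lam + psi)

def gabor_bank():
    lambdas = [4.0, 8.0, 16.0]                    # three scales
    thetas = [k * math.pi / 5 for k in range(5)]  # five directions
    return [(lam, th) for lam in lambdas for th in thetas]

bank = gabor_bank()
print(len(bank))                              # 15 filters
print(round(gabor_real(0, 0, 4.0, 0.0), 4))   # kernel peak at the origin: 1.0
```

Convolving the image with each of the 15 kernels gives one response map per filter, which is the input to the texture-feature histogram described next.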
To extract the texture features of the image, this embodiment convolves each filter in the filter bank with the target document image to obtain the response map corresponding to that filter, computes for each pixel the index of the filter yielding the maximum response, and then builds a normalized histogram within each local image block as the texture feature of that block.
In this embodiment, the Gabor filters are used to enhance the large amount of handwritten characters in the image, improve the highlight and dark regions, and strengthen the contrast between the characters and the paper background.
In this step, the classification model is illustrated with a CART model. The CART model is a binary tree model; the different defects are classified by the CART model, and the repair in the subsequent steps is performed according to the classification.
205. Registering the repaired image with the form template using a registration model to obtain a registered image.
To address problems such as paper skew and misalignment that may arise during scanning, the different scanned images are rotated, translated, and scaled to the same scale and position, which improves the accuracy of the subsequent recognition steps.
In this step, the FFD model is taken as an example of the registration model.
Specifically, the FFD model comprises a plurality of B-spline basis functions, and step 205 comprises: extracting feature points of interest from the repaired image; and obtaining the registered image based on the feature points of interest, the form template, and the B-spline basis functions.
Point features are common features in scanned images, and the feature points of interest are searched for with a region-similarity-based strategy. Let p_i be a feature point to be extracted in the reference image R, w_r be an m×n window centered at p_i, and w_f be the window of size u×v at the corresponding position in the target image. When w_f and w_r attain the maximum value under some similarity measure function, the center q_i of the window w_f is the feature point of interest corresponding to p_i.
A non-rigid registration algorithm based on the FFD model is then applied to the extracted feature points of interest. Let Ω = {(x, y) | 0 ≤ x < X, 0 ≤ y < Y} denote the image to be registered and Φ denote an n_x × n_y grid of uniform control points overlaid on Ω, where the (i, j)-th control point is denoted Φ_{i,j} and δ_x, δ_y denote the grid spacing in the X-axis and Y-axis directions, respectively. The FFD model is given as the tensor product of one-dimensional cubic B-spline basis functions, defined as the following formula (4):

T_loc(x, y) = Σ_{l=0}^{3} Σ_{m=0}^{3} B_l(u) B_m(v) Φ_{i+l, j+m}  (4)

where i = ⌊x/δ_x⌋ - 1, j = ⌊y/δ_y⌋ - 1, u = x/δ_x - ⌊x/δ_x⌋, v = y/δ_y - ⌊y/δ_y⌋ (⌊·⌋ denotes rounding down), and the B-spline basis functions are:

B_0(u) = (1 - u)³ / 6
B_1(u) = (3u³ - 6u² + 4) / 6
B_2(u) = (-3u³ + 3u² + 3u + 1) / 6
B_3(u) = u³ / 6, with 0 ≤ u < 1.
let phi0,Φ1,Φ2,...ΦKRepresenting the K +1 th layer control grid, assuming that the control fixed point is incremented once from the K to the K +1 th layer, the FFD model will be written as a combination of multi-layer sub-models in the form of equation (5) below:
Figure BDA0002727862840000106
then, obtaining the deformation function T of each layer by solving the control vertex of each layer of gridsloc(x, y), this process is repeated for all levels and the final deformation function is obtained by resampling. And finally, the mutual information is used as a similarity measurement function, and the registration of the document image can be well scanned by using a gradient optimization strategy.
206. Extract the equalization degree feature and the contrast feature of the registered image, and process the registered image with an enhancement model to obtain the image to be segmented.

Specifically, step 206 includes: extracting the equalization degree feature and the contrast feature of the registered image, and performing the step of processing the registered image with the enhancement model when the equalization degree is less than a first threshold or the contrast is less than a second threshold, so as to improve the visual effect of the image.
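The gating of step 206 amounts to a simple predicate; the threshold values here are purely illustrative:

```python
def needs_enhancement(equalization, contrast,
                      first_threshold=0.5, second_threshold=0.1):
    """Return True when the registered image should be run through the
    enhancement model (step 206); threshold values are illustrative."""
    return equalization < first_threshold or contrast < second_threshold
```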
The equalization degree feature is a gray-scale statistical feature represented by a histogram. The gray histogram is a function of gray level: for each gray value it counts the number of pixels in the digital image taking that value, and thus reflects how frequently each gray level occurs in the image. The degree of uniformity of the gray-histogram distribution reflects important statistical information about the image, and the feature is defined by equation (6):
F_his = −Σ_{c=0}^{2^m − 1} h_c · log h_c    (6)

wherein F_his is the degree of uniformity of the gray-histogram distribution; m is the pixel bit width, which is usually 11 in medical images; n is the number of pixels in the super-pixel block; and h_c = v / n is the normalized histogram of a channel in the super-pixel block, where c is a pixel value and v is the number of pixels taking that value.
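One plausible reading of the equalization degree feature is the normalized entropy of the gray histogram; this interpretation, and the 8-bit default, are assumptions of the sketch:

```python
import numpy as np

def histogram_uniformity(block, bit_width=8):
    """Normalized entropy of a block's gray histogram: 1.0 for a
    perfectly uniform histogram, 0.0 for a constant block."""
    levels = 2 ** bit_width
    v, _ = np.histogram(block, bins=levels, range=(0, levels))
    h = v / v.sum()               # h_c = v / n
    nz = h > 0                    # skip empty bins (log(0))
    entropy = -(h[nz] * np.log(h[nz])).sum()
    return entropy / np.log(levels)
```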
For the contrast feature, the Weber contrast feature is employed.

Weber's law (the law of the sensory threshold) states that, for the same kind of stimulus, the dynamic range of stimuli a person can perceive is proportional to the intensity of the standard stimulus. Applied to human visual stimuli, the Weber contrast is defined as:

C = (I − I_b) / I_b

wherein I is the brightness of the object and I_b is the overall brightness of the background.
In practical applications it is often difficult to distinguish the object from the background, so this embodiment writes this feature as the following formula (7):

F_web = (1 / N) Σ_{p ∈ B} |I_c(p) − Ī| / Ī    (7)

wherein F_web is the Weber contrast feature; Ī is the mean gray level of the super-pixel block; I_c(p) is each pixel value in the super-pixel block; N is the number of pixels in the current window; p is a pixel position, i.e., a combination of x and y; and B denotes the current window (image block), meaning that the pixel p lies within the current window.
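Assuming the practical Weber feature is the mean relative deviation of each pixel from the block's mean gray level (an assumption of this sketch), it can be computed as:

```python
import numpy as np

def weber_contrast_feature(block):
    """Mean relative deviation |I_c(p) - mean| / mean over a block,
    a sketch of the window-based Weber contrast feature."""
    mean = block.mean()
    if mean == 0:
        return 0.0                      # flat black block: no contrast
    dev = np.abs(block - mean) / mean   # per-pixel Weber-style contrast
    return float(dev.mean())
```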
207. And performing semantic segmentation on the image to be segmented through a semantic segmentation model to obtain a plurality of segmented character images.
Specifically, step 207 includes the following steps S271 to S273:
S271. When the image to be segmented includes a table, determine the position and arrangement relation of each cell in the table, and segment the cells.
And S272, performing semantic segmentation on each cell through a semantic segmentation model to obtain a plurality of character region images.
In this embodiment, the character area image may be a cell image including the character.
And S273, performing contour recognition on the character area images to obtain a plurality of segmentation character images in each character area image.
Note that a segmented character image is not the same as a character region image: one character region image contains a plurality of segmented character images. Taking the character string "1995" as an example, the character region image is the cell region image containing "1995", while the segmented character images are the four images corresponding to the individual characters "1", "9", "9", and "5", respectively.
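As a minimal stand-in for the contour-recognition step (real contour analysis is more involved), a binary cell image can be split into per-character images at vertical-projection gaps:

```python
import numpy as np

def split_characters(cell, thresh=0):
    """Split a binary cell image (text pixels nonzero) into per-character
    images wherever a column contains no ink."""
    ink_per_col = (cell > thresh).sum(axis=0)
    images, start = [], None
    for x, ink in enumerate(ink_per_col):
        if ink > 0 and start is None:         # entering a character
            start = x
        elif ink == 0 and start is not None:  # leaving a character
            images.append(cell[:, start:x])
            start = None
    if start is not None:                     # character touches right edge
        images.append(cell[:, start:])
    return images
```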
208. And identifying the segmented character image based on a character identification model to obtain a corresponding identification character.
In this embodiment, the character recognition model may be a CNN model.
Before the segmented character image is recognized based on the character recognition model, the character recognition model is trained based on a pre-stored character data set to obtain a trained character recognition model.
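The embodiment does not detail the CNN architecture; a minimal forward pass (one convolution layer, ReLU, max pooling, one fully connected layer — all sizes assumed) might look like:

```python
import numpy as np

def conv2d(img, kernel):
    """'Valid' 2-D cross-correlation of a single-channel image."""
    kh, kw = kernel.shape
    h, w = img.shape
    out = np.zeros((h - kh + 1, w - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = (img[i:i + kh, j:j + kw] * kernel).sum()
    return out

def max_pool(x, k=2):
    """Non-overlapping k x k max pooling (trailing rows/cols dropped)."""
    h, w = x.shape[0] // k * k, x.shape[1] // k * k
    return x[:h, :w].reshape(h // k, k, w // k, k).max(axis=(1, 3))

def cnn_logits(img, kernels, W, b):
    """Forward pass: conv + ReLU per kernel, pool, flatten, dense layer."""
    maps = [np.maximum(conv2d(img, k), 0.0) for k in kernels]
    feats = np.concatenate([max_pool(m).ravel() for m in maps])
    return feats @ W + b   # class logits, one per character class
```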
209. Based on the positions of the respective recognition characters in the document image, a document recognition result is obtained.
Taking the case where the original target document image is a form image as an example, the finally obtained document recognition result is the form image with the recognized characters filled into the corresponding cells.
In the image recognition method provided by this embodiment of the invention, the target document image is preprocessed to obtain a preprocessed image to be segmented, enhancing the contrast between the characters and the image; the image to be segmented is then semantically segmented by a semantic segmentation model to obtain a plurality of segmented character images; the segmented character images are recognized by a character recognition model to obtain the corresponding recognized characters; and the document recognition result is obtained based on the position of each recognized character in the document image. Characters in the target document image can thus be recognized automatically, solving the problems of heavy workload, low efficiency, and high error rate of manual entry.
An embodiment of the invention is schematically described below by taking a railway engineering equipment overhaul record book as an example; refer to fig. 4. In the embodiment of fig. 4, an end-to-end model is constructed, comprising: a CART model, an FFD model, a Gabor filter model, a semantic segmentation model, and a character recognition model.
The method comprises the following steps:
1) and scanning the railway engineering equipment overhaul record book to obtain an initial document image.
2) And extracting the texture features of the initial document image, and detecting defects through the texture features of the initial document image.
If no defect is present, the method proceeds directly to the subsequent step 7).
3) And carrying out contour extraction on the defective initial document image to obtain a target document image.
4) And extracting image texture characteristics of the railway engineering equipment overhaul record book image through a Gabor filter, classifying defects of the railway engineering equipment overhaul record book image through a CART classification model, and repairing the defects of the target document image according to the classification.
The railway engineering equipment overhaul record book comprises a plurality of tables. The defects may be classified into various categories such as missing print, scratches, patches, and the like.
Enhancing the railway engineering equipment overhaul record book image through the Gabor filter model improves highlighted regions and dark regions and enhances the contrast between the characters and the paper background.
5) And registering the repaired image and the form template by adopting the FFD model to obtain a registered image.
The purpose of the registration is as follows: to address problems such as paper deflection and misalignment that may arise during scanning, different scanned images are rotated, translated, and scaled to the same scale and position, which improves the accuracy of the subsequent recognition steps.
6) And extracting the balance and contrast characteristics of the registered image, and processing the registered image by using an enhanced model to obtain the image to be segmented.
7) And performing semantic segmentation on the image to be segmented through a semantic segmentation model to obtain a plurality of segmented character images.
8) And identifying the segmented character image based on a character identification model to obtain a corresponding identification character.
A dedicated railway handwritten character data set is formed based on a pre-stored dictionary of common railway terms, and the character recognition model is trained on it to obtain the trained character recognition model.
9) Based on the positions of the respective recognition characters in the document image, a document recognition result is obtained, as shown in fig. 4.
The following describes an apparatus for image recognition provided by an embodiment of the present invention, and the apparatus for image recognition described below and the method for image recognition described above may be referred to correspondingly.
The embodiment of the invention discloses an image recognition device, as shown in fig. 5, comprising:
the preprocessing module 501 is configured to preprocess the target document image to obtain a preprocessed image to be segmented;
a semantic segmentation module 502, configured to perform semantic segmentation on the image to be segmented through a semantic segmentation model to obtain a plurality of segmented character images;
a character recognition module 503, configured to recognize the segmented character image based on a character recognition model to obtain a corresponding recognition character;
and a document generating module 504, configured to obtain a document identification result based on a position of each identification character in the document image.
Optionally, the preprocessing module 501 includes:
the repairing unit is used for extracting image texture characteristics of the target document image, classifying defects of the target document image through the classification model and repairing the defects of the target document image according to the classification;
the registration unit is used for registering the repaired image and the form template by adopting a registration model to obtain a registration image;
and the enhancement unit is used for extracting the balance and contrast characteristics of the registered image and processing the registered image by using an enhancement model to obtain the image to be segmented.
Optionally, the enhancing unit is specifically configured to: and extracting the balance degree characteristic and the contrast degree characteristic of the registered image, and processing the registered image by using an enhanced model under the condition that the balance degree is smaller than a first threshold value or the contrast degree is smaller than a second threshold value to obtain the image to be segmented.
Optionally, the registration model comprises a plurality of B-spline basis functions;
the registration unit is specifically configured to: extracting interesting characteristic points of the restored image; and processing based on the interesting feature points, the table template and the B spline basis function to obtain the registration image.
Optionally, the semantic segmentation module 502 is specifically configured to:
under the condition that the image to be segmented comprises a table, determining the position and arrangement relation of each cell in the table, and segmenting the cells;
performing semantic segmentation on each cell through a semantic segmentation model to obtain a plurality of character region images;
and carrying out contour recognition on the character area images to obtain a plurality of segmentation character images in each character area image.
Optionally, the apparatus further comprises: and the training module is used for training the character recognition model based on a pre-stored character data set to obtain a trained character recognition model.
Optionally, the apparatus further comprises: the defect detection module is used for obtaining an initial document image through scanning; extracting texture features of the initial document image, and performing defect detection through the texture features of the initial document image; and carrying out contour extraction on the defective initial document image to obtain the target document image.
The image recognition device provided by this embodiment of the invention preprocesses the target document image to obtain a preprocessed image to be segmented, enhancing the contrast between the characters and the image; performs semantic segmentation on the image to be segmented through a semantic segmentation model to obtain a plurality of segmented character images; recognizes the segmented character images based on the character recognition model to obtain the corresponding recognized characters; and obtains the document recognition result based on the position of each recognized character in the document image. Characters in the target document image can thus be recognized automatically, solving the problems of heavy workload, low efficiency, and high error rate of manual entry.
Fig. 6 illustrates a physical structure diagram of an electronic device, which may include, as shown in fig. 6: a processor (processor)610, a communication Interface (Communications Interface)620, a memory (memory)630 and a communication bus 640, wherein the processor 610, the communication Interface 620 and the memory 630 communicate with each other via the communication bus 640. The processor 610 may invoke logic instructions in the memory 630 to perform a method of image recognition, comprising:
preprocessing a target document image to obtain a preprocessed image to be segmented;
performing semantic segmentation on the image to be segmented through a semantic segmentation model to obtain a plurality of segmented character images;
identifying the segmented character image based on a character identification model to obtain a corresponding identification character;
based on the positions of the respective recognition characters in the document image, a document recognition result is obtained.
In addition, the logic instructions in the memory 630 may be implemented in the form of software functional units and, when sold or used as an independent product, may be stored in a computer-readable storage medium. Based on such understanding, the technical solution of the present invention may be embodied in the form of a software product, which is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present invention. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, or an optical disk.
In another aspect, an embodiment of the present invention further provides a computer program product, where the computer program product includes a computer program stored on a non-transitory computer-readable storage medium, the computer program includes program instructions, and when the program instructions are executed by a computer, the computer can execute the method for image recognition provided by the above-mentioned method embodiments, including:
preprocessing a target document image to obtain a preprocessed image to be segmented;
performing semantic segmentation on the image to be segmented through a semantic segmentation model to obtain a plurality of segmented character images;
identifying the segmented character image based on a character identification model to obtain a corresponding identification character;
based on the positions of the respective recognition characters in the document image, a document recognition result is obtained.
In still another aspect, an embodiment of the present invention further provides a non-transitory computer-readable storage medium, on which a computer program is stored, where the computer program is implemented by a processor to perform the method for image recognition provided by the foregoing embodiments, and the method includes:
preprocessing a target document image to obtain a preprocessed image to be segmented;
performing semantic segmentation on the image to be segmented through a semantic segmentation model to obtain a plurality of segmented character images;
identifying the segmented character image based on a character identification model to obtain a corresponding identification character;
based on the positions of the respective recognition characters in the document image, a document recognition result is obtained.
The above-described embodiments of the apparatus are merely illustrative, and the units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the present embodiment. One of ordinary skill in the art can understand and implement it without inventive effort.
Through the above description of the embodiments, those skilled in the art will clearly understand that each embodiment can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware. With this understanding in mind, the above-described technical solutions may be embodied in the form of a software product, which can be stored in a computer-readable storage medium such as ROM/RAM, magnetic disk, optical disk, etc., and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) to execute the methods described in the embodiments or some parts of the embodiments.
Finally, it should be noted that: the above examples are only intended to illustrate the technical solution of the present invention, but not to limit it; although the present invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions of the embodiments of the present invention.

Claims (10)

1. A method of image recognition, comprising:
preprocessing a target document image to obtain a preprocessed image to be segmented;
performing semantic segmentation on the image to be segmented through a semantic segmentation model to obtain a plurality of segmented character images;
identifying the segmented character image based on a character identification model to obtain a corresponding identification character;
based on the positions of the respective recognition characters in the document image, a document recognition result is obtained.
2. The image recognition method of claim 1, wherein preprocessing the target document image to obtain a preprocessed image to be segmented comprises:
repairing: extracting image texture features of the target document image, classifying defects of the target document image through a classification model, and repairing the defects of the target document image according to the classification;
and (3) registration: adopting a registration model to register the repaired image and the form template to obtain a registration image;
enhancing: and extracting the balance characteristic and the contrast characteristic of the registered image, and processing the registered image by using an enhanced model to obtain the image to be segmented.
3. The method of image recognition according to claim 2, wherein extracting the balance feature and the contrast feature of the registered image and processing the registered image with the enhanced model comprises:
and extracting the balance characteristic and the contrast characteristic of the registered image, and processing the registered image by using an enhanced model under the condition that the balance is smaller than a first threshold or the contrast is smaller than a second threshold.
4. The method of image recognition according to claim 2, wherein the registration model comprises a plurality of B-spline basis functions;
adopting a registration model to register the repaired image and the form template to obtain a registration image, wherein the registration image comprises:
extracting interesting characteristic points of the restored image;
and processing based on the interesting feature points, the table template and the B spline basis function to obtain the registration image.
5. The image recognition method of claim 1, wherein performing semantic segmentation on the image to be segmented by a semantic segmentation model to obtain a plurality of segmented character images comprises:
under the condition that the image to be segmented comprises a table, determining the position and arrangement relation of each cell in the table, and segmenting the cells;
performing semantic segmentation on each cell through a semantic segmentation model to obtain a plurality of character region images;
and carrying out contour recognition on the character area images to obtain a plurality of segmentation character images in each character area image.
6. The method of image recognition according to claim 1, wherein prior to recognizing the segmented character image based on a character recognition model, the method further comprises:
and training the character recognition model based on a pre-stored character data set to obtain the trained character recognition model.
7. The method of image recognition according to claim 1, wherein prior to pre-processing the document image, the method further comprises:
obtaining an initial document image through scanning;
extracting texture features of the initial document image, and performing defect detection through the texture features of the initial document image;
and carrying out contour extraction on the defective initial document image to obtain the target document image.
8. An apparatus for image recognition, comprising:
the preprocessing module is used for preprocessing the target document image to obtain a preprocessed image to be segmented;
the semantic segmentation module is used for performing semantic segmentation on the image to be segmented through a semantic segmentation model to obtain a plurality of segmented character images;
the character recognition module is used for recognizing the segmented character image based on a character recognition model to obtain a corresponding recognition character;
and the document generation module is used for obtaining a document identification result based on the position of each identification character in the document image.
9. An electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, characterized in that the steps of the method of image recognition according to any of claims 1 to 7 are implemented when the program is executed by the processor.
10. A non-transitory computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, carries out the steps of the method of image recognition according to any one of claims 1 to 7.
CN202011108746.6A 2020-10-16 2020-10-16 Image recognition method and device, electronic equipment and storage medium Active CN112200789B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011108746.6A CN112200789B (en) 2020-10-16 2020-10-16 Image recognition method and device, electronic equipment and storage medium

Publications (2)

Publication Number Publication Date
CN112200789A true CN112200789A (en) 2021-01-08
CN112200789B CN112200789B (en) 2023-11-21




Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant