CN113920520A - Image text recognition method, system, storage medium and electronic equipment
- Publication number
- CN113920520A CN202111076489.7A
- Authority
- CN
- China
- Prior art keywords
- image
- character
- font
- identification
- current
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/22—Matching criteria, e.g. proximity measures
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N20/00—Machine learning
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Evolutionary Computation (AREA)
- General Engineering & Computer Science (AREA)
- Software Systems (AREA)
- Computer Vision & Pattern Recognition (AREA)
- General Physics & Mathematics (AREA)
- Artificial Intelligence (AREA)
- Physics & Mathematics (AREA)
- Evolutionary Biology (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Life Sciences & Earth Sciences (AREA)
- Medical Informatics (AREA)
- Bioinformatics & Computational Biology (AREA)
- Computing Systems (AREA)
- Mathematical Physics (AREA)
- Character Discrimination (AREA)
Abstract
The invention relates to the technical field of image recognition, and in particular to an image text recognition method, system, storage medium and electronic device. The image text recognition method comprises: acquiring an image to be recognized; correcting the image to be recognized to obtain a corrected image; performing font identification on the text information in the corrected image to obtain an identification result, and determining from that result the character library corresponding to the font used by the current text information; and continuously recognizing the text information in the corrected image according to the character library corresponding to the current font to obtain a text recognition document. Because the font is determined first, the text to be recognized is compared only with the characters in the character library corresponding to that font, so the characters in the corrected image are recognized one by one with high accuracy; multithreaded processing can also be used, which effectively improves recognition efficiency.
Description
Technical Field
The invention relates to the technical field of image recognition, in particular to an image text recognition method, an image text recognition system, a storage medium and electronic equipment.
Background
Image recognition is a technique in which a computer processes, analyzes and understands images in order to recognize targets and objects in various patterns; it is a practical application of deep learning algorithms. At present, image recognition technology is generally divided into face recognition and commodity recognition. Face recognition is mainly applied to security inspection, identity verification and mobile payment; commodity recognition is mainly applied to the circulation of goods, in particular unmanned retail such as unmanned shelves and intelligent retail cabinets. The traditional image recognition process has four steps: image acquisition → image preprocessing → feature extraction → image recognition.
As image recognition technology has developed, it has also been applied to recognizing text in images. For example, many current communication and translation applications recognize the characters in an image by taking a picture, correcting the image, and recognizing the characters through feature extraction.
When applied to character recognition in images, prior-art image recognition technology is prone to inaccurate recognition; in particular, accuracy is low for characters with very few or very many strokes. As a result, when the prior art is used for continuous character recognition, the resulting document tends to contain a large number of errors that must be corrected manually, and the overall time is almost the same as for purely manual recognition.
Disclosure of Invention
The invention provides an image text recognition method, an image text recognition system, a storage medium and electronic equipment, overcomes the defects of the prior art, and can effectively solve the problems of low recognition accuracy and easy error in continuous character recognition in the conventional image recognition mode.
One of the technical schemes of the invention is realized by the following measures: an image text recognition method comprising:
acquiring an image to be identified, wherein the image to be identified comprises character information;
carrying out correction processing on an image to be recognized to obtain a corrected image, wherein the character information in the corrected image is horizontally arranged;
carrying out font identification on the character information in the corrected image to obtain an identification result, and determining a character library corresponding to the font adopted by the current character information according to the identification result;
and continuously identifying the character information in the corrected image according to the character library corresponding to the current font to obtain a character identification document.
The following is further optimization or/and improvement of the technical scheme of the invention:
Correcting the image to be recognized to obtain the corrected image includes:
carrying out contrast adjustment on an image to be identified to obtain a high-contrast image;
carrying out pixel identification on the high-contrast image to obtain a pixel identification result;
cutting the high-contrast image into a character image and a non-character image according to the pixel identification result, wherein the character image is rectangular, and the non-character image is a part of the high-contrast image without the character image;
and continuously splicing all the character images to obtain horizontally arranged corrected images.
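The contrast adjustment and pixel identification steps above can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the image is assumed to be grayscale and held as a list of rows, the text is assumed to be dark on a light background, contrast is adjusted with a simple linear min-max stretch, and the threshold of 128 is arbitrary. The names `stretch_contrast` and `classify_pixels` are hypothetical.

```python
from typing import List

Gray = List[List[int]]  # rows of 0-255 grayscale values (assumed representation)


def stretch_contrast(image: Gray) -> Gray:
    """Linearly rescale pixels so the darkest becomes 0 and the brightest 255."""
    lo = min(min(row) for row in image)
    hi = max(max(row) for row in image)
    if hi == lo:  # flat image: nothing to stretch
        return [row[:] for row in image]
    return [[(p - lo) * 255 // (hi - lo) for p in row] for row in image]


def classify_pixels(image: Gray, threshold: int = 128) -> List[List[bool]]:
    """Mark a pixel True when it is dark enough to count as text (assumption)."""
    return [[p < threshold for p in row] for row in image]


# A tiny low-contrast "page": grey text pixels on a slightly lighter background.
page = [
    [200, 200, 200, 200],
    [200, 120, 120, 200],
    [200, 200, 200, 200],
]
high_contrast = stretch_contrast(page)
mask = classify_pixels(high_contrast)
```

The boolean mask plays the role of the pixel identification result: regions where it is True are candidates for character images, and everything else is background.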
The above font recognition of the character information in the corrected image includes:
intercepting a sampling image in the corrected image, wherein the sampling image only comprises one character;
performing grayscale processing on the sampled image, and tracing the edges of the character contained in the sampled image to obtain an edge-traced character image;
identifying font characteristics from the edge-traced character image, wherein the font characteristics comprise the outline of strokes;
and determining the font adopted in the current text information according to the font characteristics.
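The grayscale processing and edge tracing of a sampled character can be illustrated with a short sketch. It assumes an RGB image stored as nested lists, uses the common ITU-R BT.601 luminance weights for the grayscale conversion, and treats a text pixel as an edge pixel when at least one of its four neighbours is not text. The source does not specify any of these choices, and the function names are hypothetical.

```python
from typing import List, Tuple

RGB = Tuple[int, int, int]


def to_gray(image: List[List[RGB]]) -> List[List[int]]:
    """Convert RGB pixels to luminance with integer BT.601 weights."""
    return [[(299 * r + 587 * g + 114 * b) // 1000 for (r, g, b) in row]
            for row in image]


def trace_edges(gray: List[List[int]], threshold: int = 128) -> List[List[int]]:
    """Return 1 for text pixels on the boundary of a stroke, 0 elsewhere."""
    h, w = len(gray), len(gray[0])
    ink = [[p < threshold for p in row] for row in gray]

    def is_ink(y: int, x: int) -> bool:
        return 0 <= y < h and 0 <= x < w and ink[y][x]

    neighbours = ((-1, 0), (1, 0), (0, -1), (0, 1))
    outline = []
    for y in range(h):
        row = []
        for x in range(w):
            interior = all(is_ink(y + dy, x + dx) for dy, dx in neighbours)
            row.append(1 if ink[y][x] and not interior else 0)
        outline.append(row)
    return outline


# A 3x3 black square centred on a 5x5 white sample image.
WHITE, BLACK = (255, 255, 255), (0, 0, 0)
sample = [[BLACK if 1 <= y <= 3 and 1 <= x <= 3 else WHITE for x in range(5)]
          for y in range(5)]
outline = trace_edges(to_gray(sample))
```

Only the ring of eight boundary pixels survives; the centre pixel is interior and is dropped, which is the kind of stroke outline the font characteristics are then read from.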
The step of continuously identifying the character information in the corrected image according to the character library corresponding to the current font to obtain the character identification document comprises the following steps:
continuously intercepting each character image of the corrected image;
comparing each character image with each character in the corresponding font library to obtain a plurality of identification character codes, wherein each character code is used for representing one character;
and generating a character recognition document according to the character codes.
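One simple way to intercept each character from the strip-shaped corrected image is to split it at blank columns. The sketch below assumes a binary strip (1 for text, 0 for background) and assumes that characters are separated by at least one blank column. Real Chinese characters with left-right structure can contain internal blank columns, so a practical system would need an extra merging rule that this sketch omits. The names are hypothetical.

```python
from typing import List, Tuple

Binary = List[List[int]]  # 1 = text pixel, 0 = background (assumed representation)


def split_characters(strip: Binary) -> List[Tuple[int, int]]:
    """Return (left, right) column ranges, right exclusive, between blank columns."""
    width = len(strip[0])
    column_has_ink = [any(row[x] for row in strip) for x in range(width)]
    spans, start = [], None
    # The trailing False acts as a sentinel that closes a span at the right edge.
    for x, ink in enumerate(column_has_ink + [False]):
        if ink and start is None:
            start = x
        elif not ink and start is not None:
            spans.append((start, x))
            start = None
    return spans


def crop(strip: Binary, span: Tuple[int, int]) -> Binary:
    """Cut one character image out of the strip."""
    left, right = span
    return [row[left:right] for row in strip]


strip = [
    [1, 1, 0, 1, 0, 0, 1, 1],
    [1, 0, 0, 1, 0, 0, 0, 1],
]
spans = split_characters(strip)
last_char = crop(strip, spans[-1])
```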
The characters in the font library are arranged according to the usage frequency of Chinese characters; and/or, in the step of comparing each character image with each character in the corresponding font library, if no character matching the current character image can be found in the font library, a specific character code indicating a recognition error is generated.
Another technical scheme of the invention is realized by the following measures: an image text recognition system comprising:
the image acquisition unit is used for acquiring an image to be identified, wherein the image to be identified comprises character information;
the correction processing unit is used for correcting the image to be recognized to obtain a corrected image, wherein the character information in the corrected image is horizontally arranged;
the font identification unit is used for carrying out font identification on the character information in the corrected image to obtain an identification result and determining a character library corresponding to the font adopted by the current character information according to the identification result;
and the continuous identification unit is used for continuously identifying the character information in the corrected image according to the character library corresponding to the current font to obtain a character identification document.
The following is further optimization or/and improvement of the technical scheme of the invention:
the correction processing unit includes:
the contrast adjusting module is used for adjusting the contrast of the image to be identified to obtain a high-contrast image;
the pixel identification module is used for carrying out pixel identification on the high-contrast image to obtain a pixel identification result;
the image cutting module cuts the high-contrast image into a character image and a non-character image according to the pixel identification result, wherein the character image is rectangular, and the part of the high-contrast image, from which the character image is removed, is the non-character image;
and the image splicing module is used for continuously splicing all character images to obtain horizontally arranged correction images.
The font recognition unit includes:
the sample intercepting module intercepts a sampling image in the corrected image, and the sampling image only comprises one character;
the gray processing module is used for carrying out gray processing on the sampled image and tracing the edges of characters contained in the sampled image to obtain a traced character image;
the feature recognition module is used for identifying font characteristics from the edge-traced character image, wherein the font characteristics comprise the outline of strokes;
and the font confirmation module is used for confirming the font adopted in the current text information according to the font characteristics.
The invention has the following advantages: because the font is determined first, the text to be recognized is compared only with the characters in the character library corresponding to that font, so the characters in the corrected image are recognized one by one with high accuracy; multithreaded processing can also be used, which effectively improves recognition efficiency.
Drawings
Fig. 1 is a main flow chart of an image text recognition method according to an embodiment of the present invention.
Fig. 2 is a flowchart illustrating a correction process performed on an image to be recognized according to an embodiment of the present invention.
Fig. 3 is a flowchart illustrating font identification of text information in a corrected image according to an embodiment of the present invention.
Fig. 4 is a flowchart illustrating a process of continuously recognizing text information in a corrected image according to a character library corresponding to a current font according to an embodiment of the present invention.
Fig. 5 is a structural diagram of an image text recognition system according to an embodiment of the present invention.
Detailed Description
The present invention is not limited by the following examples, and specific embodiments may be determined according to the technical solutions and practical situations of the present invention.
The invention is further described with reference to the following examples and figures:
Embodiment 1: as shown in Fig. 1, an embodiment of the present invention discloses an image text recognition method, including:
S100, acquiring an image to be recognized, wherein the image to be recognized contains text information;
In this step, the image to be recognized is acquired first. The image contains text information, which should be clear and of high resolution. The image is captured with a camera or a similar tool, and the captured picture should have suitable contrast and exposure so that it can be recognized easily;
S200, correcting the image to be recognized to obtain a corrected image;
In this step, because shooting angles and shooting techniques differ, the images to be recognized also differ, and the text in an image is often not parallel to the image edges. The image therefore needs to be corrected. Correction adjusts the text in the image by rotation and deformation so that the text as a whole is horizontal, and removes content unrelated to the text from the resulting corrected image, which improves the recognition success rate;
S300, performing font identification on the text information in the corrected image to obtain an identification result, and determining the character library corresponding to the font used by the current text information according to the identification result;
In this step, the font used by the current text must be determined first. Stroke styles differ between fonts, so the characters in the text are split to obtain their stroke information, and the font used by the text in the corrected image is determined from that stroke information. Once the font is known, character recognition can proceed quickly, which greatly improves recognition efficiency;
S400, continuously recognizing the text information in the corrected image according to the character library corresponding to the current font to obtain a text recognition document, and storing the text recognition document;
The font of the current text was determined in step S300, so in this step the characters in the text are compared with the font library corresponding to that font, and the characters in the corrected image are recognized one by one. Multiple threads can be used as needed to improve recognition efficiency. After recognition is complete, the corresponding text recognition document is generated and stored.
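The remark that multiple threads may be used in step S400 can be illustrated with Python's standard `concurrent.futures` module. The sketch assumes that each character image can be recognized independently of the others. The tiny `LIBRARY`, the `recognize_one` stand-in and the `"ERR"` marker are hypothetical placeholders for the comparison against the font library.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List, Tuple

Glyph = Tuple[Tuple[int, ...], ...]  # immutable binary character image

# Hypothetical font library: character image -> character code.
LIBRARY: Dict[Glyph, str] = {
    ((1, 1), (1, 1)): "W00001",
    ((1, 0), (0, 1)): "W00002",
}


def recognize_one(glyph: Glyph) -> str:
    """Stand-in for the per-character comparison; exact match only."""
    return LIBRARY.get(glyph, "ERR")


def recognize_all(glyphs: List[Glyph], workers: int = 4) -> List[str]:
    """Recognize characters concurrently; pool.map keeps the input order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(recognize_one, glyphs))


codes = recognize_all([
    ((1, 0), (0, 1)),
    ((1, 1), (1, 1)),
    ((0, 0), (0, 0)),
])
```

Because `map` returns results in input order, the character codes stay in reading order even though the comparisons run concurrently. In standard CPython builds, threads do not speed up CPU-bound pure-Python work, so a real implementation would likely rely on native image routines or processes instead.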
Embodiment 2: as shown in Fig. 2, as a preferred embodiment of the present invention, correcting the image to be recognized to obtain a corrected image includes:
S201, adjusting the contrast of the image to be recognized to obtain a high-contrast image;
In this step, the acquired image is processed first, specifically by adjusting its contrast. Uneven lighting during shooting tends to leave shadows on the image; after the contrast is adjusted, the characters on the image are clearer, which helps subsequent recognition;
S202, performing pixel identification on the high-contrast image to obtain a pixel identification result;
In this step, after the contrast adjustment, the color of the region containing text in the high-contrast image differs from the color of the background, so pixel identification can determine which regions of the high-contrast image contain text and which are background;
S203, cutting the high-contrast image into character images and non-character images according to the pixel identification result;
In this step, for a printed document the color difference between background and characters is usually large. Where the difference is small, it can be increased by grayscale processing and picture adjustment to ease identification, so that the high-contrast image can be cut into character images and non-character images according to the pixel identification result;
S204, continuously splicing all character images to obtain a horizontally arranged corrected image;
Step S203 yields a plurality of character images. In this step, the character images together should cover all the text in the current area, and a blank margin should be kept around the text so that each character can be extracted by contrast during subsequent recognition. Because printed text is laid out line by line, the character images are spliced together so that the text can be recognized continuously, producing a strip-shaped corrected image that is recognized from one end to the other.
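Steps S203 and S204 can be sketched as cutting the page into rectangular line bands and joining them left to right. The sketch assumes a binary image (1 for text, 0 for background) in which lines are separated by at least one blank row, and keeps a blank margin above and below each band as the embodiment recommends. The function names and the one-pixel margin are hypothetical.

```python
from typing import List, Tuple

Binary = List[List[int]]  # 1 = text pixel, 0 = background (assumed representation)


def find_text_bands(image: Binary) -> List[Tuple[int, int]]:
    """Return (top, bottom) row ranges, bottom exclusive, of rows containing text."""
    bands, start = [], None
    for y, row in enumerate(image):
        has_text = any(row)
        if has_text and start is None:
            start = y
        elif not has_text and start is not None:
            bands.append((start, y))
            start = None
    if start is not None:
        bands.append((start, len(image)))
    return bands


def stitch_bands(image: Binary, bands: List[Tuple[int, int]], margin: int = 1) -> Binary:
    """Pad every band to a common height and join the bands into one strip."""
    if not bands:
        return []
    width = len(image[0])
    height = max(bottom - top for top, bottom in bands) + 2 * margin
    blank = [0] * width
    strip: Binary = [[] for _ in range(height)]
    for top, bottom in bands:
        rows = image[top:bottom]
        padded = [blank] * margin + rows + [blank] * (height - margin - len(rows))
        for y in range(height):
            strip[y].extend(padded[y])
    return strip


page = [
    [0, 0, 0],
    [1, 1, 0],  # first line of text
    [0, 0, 0],
    [0, 1, 1],  # second line of text
    [0, 0, 0],
]
bands = find_text_bands(page)
strip = stitch_bands(page, bands)
```

The two text lines end up side by side in a single strip, so recognition can run from one end to the other without jumping between lines.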
Embodiment 3: as shown in Fig. 3, as a preferred embodiment of the present invention, performing font identification on the text information in the corrected image includes:
S301, intercepting a sampling image from the corrected image, wherein the sampling image contains only one character;
In this step, the corrected image produced by the correction processing is a long strip. For font identification, any one character in the corrected image is intercepted, or the first character is intercepted directly; the intercepted sampling image must contain only one character;
S302, performing grayscale processing on the sampling image and tracing the edges of the character it contains to obtain an edge-traced character image;
In this step, because a blank margin is kept around the text in the corrected image, the character appears in the middle of the intercepted sampling image. Grayscale processing removes the colors from the sampling image so that the character is clearer, and the character is then edge-traced to obtain its outline;
S303, identifying font characteristics from the edge-traced character image;
In this step, font characteristics are obtained from the traced outline. The font characteristics are specific structural characteristics of strokes, such as the overall proportion and the width of the stroke "|";
S304, determining the font used in the current text information according to the font characteristics;
In this step, the font characteristics are compared with the font characteristics of each font in the font library. When the font characteristics of the character in the sampling image match those of a font in the font library, the current text uses that font, and the font used in the current text information is thereby determined.
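As a hedged illustration of steps S303 and S304, the sketch below reduces the font characteristics to a single number, the mean horizontal stroke width relative to the glyph width, and picks the reference font whose value is closest. The source only says that the characteristics include stroke proportion and width, so this particular feature, the tolerance, and the reference values for the made-up fonts `thin_font` and `bold_font` are all assumptions.

```python
from typing import Dict, List, Optional


def stroke_width_ratio(glyph: List[List[int]]) -> float:
    """Mean length of horizontal ink runs divided by the glyph width."""
    runs = []
    for row in glyph:
        length = 0
        for p in row + [0]:  # trailing 0 closes a run that reaches the edge
            if p:
                length += 1
            elif length:
                runs.append(length)
                length = 0
    if not runs:
        return 0.0
    return sum(runs) / len(runs) / len(glyph[0])


def match_font(feature: float, references: Dict[str, float],
               tolerance: float = 0.05) -> Optional[str]:
    """Return the font whose reference feature is nearest, if close enough."""
    name = min(references, key=lambda n: abs(references[n] - feature))
    return name if abs(references[name] - feature) <= tolerance else None


# A vertical stroke two pixels wide in a ten-pixel-wide sample.
sample = [[0, 0, 0, 0, 1, 1, 0, 0, 0, 0] for _ in range(4)]
references = {"thin_font": 0.1, "bold_font": 0.2}  # hypothetical reference values
ratio = stroke_width_ratio(sample)
font = match_font(ratio, references)
```

Returning `None` when no reference is within tolerance corresponds to the case where the sampled character matches no font in the library; the source does not say how that case is handled.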
Embodiment 4: as shown in Fig. 4, as a preferred embodiment of the present invention, continuously recognizing the text information in the corrected image according to the character library corresponding to the current font to obtain a text recognition document specifically includes:
S401, continuously intercepting each character image of the corrected image;
In this step, each character must be separated individually so that the characters can be recognized in sequence. During continuous recognition, the region occupied by each character in the corrected image is therefore intercepted for comparison;
S402, comparing each character image with each character in the corresponding font library to obtain a plurality of recognized character codes, wherein each character code represents one character;
In this step, the character images are extracted one by one, and each is compared in turn with the characters in the font library. When the current character image coincides with a character in the font library after being scaled up or down, the content of the current character is determined. Because the number of characters is finite, each character is given its own character code: for example, the code of the Chinese character for "one" is "W00001", the code of the character for "two" is "W00002", and so on, so that every Chinese character has a unique code. Recognition therefore produces a character code that represents only the character and carries no font information;
S403, generating a text recognition document according to the character codes;
In this step, after recognition is complete, the corresponding text recognition document is generated from the character codes in sequence.
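The comparison in S402 ("coincides after being scaled up or down") and the document generation in S403 can be sketched as follows. The sketch assumes binary character images, nearest-neighbour scaling to the template size, and a match when at least 90% of pixels agree. The threshold, the bar-shaped templates and the `CODE_TABLE` that maps codes back to characters are illustrative assumptions; only the code format "W00001" comes from the text.

```python
from typing import Dict, List, Optional

Glyph = List[List[int]]  # 1 = text pixel, 0 = background


def resize(glyph: Glyph, height: int, width: int) -> Glyph:
    """Nearest-neighbour rescale of a binary glyph to height x width."""
    h, w = len(glyph), len(glyph[0])
    return [[glyph[y * h // height][x * w // width] for x in range(width)]
            for y in range(height)]


def overlap(a: Glyph, b: Glyph) -> float:
    """Fraction of pixels on which two equally sized glyphs agree."""
    same = sum(pa == pb for ra, rb in zip(a, b) for pa, pb in zip(ra, rb))
    return same / (len(a) * len(a[0]))


def match_code(glyph: Glyph, library: Dict[str, Glyph],
               min_overlap: float = 0.9) -> Optional[str]:
    """Scan the library in order and return the first code that matches."""
    for code, template in library.items():
        scaled = resize(glyph, len(template), len(template[0]))
        if overlap(scaled, template) >= min_overlap:
            return code
    return None


def build_document(codes: List[Optional[str]], code_table: Dict[str, str]) -> str:
    """Turn the code sequence into text; unknown codes become '?'."""
    return "".join(code_table.get(code, "?") for code in codes)


# Hypothetical 4x4 templates (simple bars, not real glyph shapes).
LIBRARY = {
    "W00002": [[1, 1, 1, 1], [0, 0, 0, 0], [0, 0, 0, 0], [1, 1, 1, 1]],
    "W00001": [[0, 0, 0, 0], [1, 1, 1, 1], [0, 0, 0, 0], [0, 0, 0, 0]],
}
CODE_TABLE = {"W00001": "一", "W00002": "二"}

# The same bar as "W00001", captured at twice the size.
big_one = [[1] * 8 if y in (2, 3) else [0] * 8 for y in range(8)]
codes = [match_code(big_one, LIBRARY)]
text = build_document(codes, CODE_TABLE)
```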
In this embodiment, a character that cannot be recognized is taken to be occluded or unclearly printed. An error code can be set for such a character so that the user can correct it later. In addition, to improve recognition efficiency, the characters in the font library should be arranged from most to least frequently used, which reduces the number of comparisons.
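The effect of frequency ordering and of the error code can be shown with a small sketch. It assumes a linear scan with exact matching, hypothetical usage counts, and a made-up error marker `W_ERR`; the source specifies neither the value of the error code nor where the frequency data comes from.

```python
from typing import Dict, Tuple

Glyph = Tuple[Tuple[int, ...], ...]
ERROR_CODE = "W_ERR"  # hypothetical marker for an unrecognized character


def order_by_frequency(library: Dict[str, Glyph],
                       frequency: Dict[str, int]) -> Dict[str, Glyph]:
    """Return the library with the most frequently used characters first."""
    ranked = sorted(library, key=lambda code: frequency.get(code, 0), reverse=True)
    return {code: library[code] for code in ranked}


def lookup(glyph: Glyph, ordered: Dict[str, Glyph]) -> Tuple[str, int]:
    """Linear scan; returns the code and the number of comparisons made."""
    comparisons = 0
    for code, template in ordered.items():
        comparisons += 1
        if template == glyph:
            return code, comparisons
    return ERROR_CODE, comparisons


rare, common, smudge = ((1, 0),), ((1, 1),), ((0, 0),)
library = {"W00003": rare, "W00001": common}
ordered = order_by_frequency(library, {"W00001": 900, "W00003": 2})
hit = lookup(common, ordered)
miss = lookup(smudge, ordered)
```

With the common character moved to the front, it is found after one comparison instead of two, and the smudged character yields the error marker after the whole library has been scanned.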
Embodiment 5: as shown in Fig. 5, an embodiment of the present invention provides an image text recognition system, including:
The image acquisition unit 100 is configured to acquire an image to be recognized, wherein the image to be recognized contains text information.
In the system, the image acquisition unit 100 first acquires the image to be recognized. The image contains text information, which should be clear and of high resolution. The image is captured with a camera or a similar tool, and the captured picture should have suitable contrast and exposure so that it can be recognized easily.
The correction processing unit 200 is configured to correct the image to be recognized to obtain a corrected image, wherein the text information in the corrected image is horizontally arranged.
In the system, the correction processing unit 200 corrects the image to be recognized. Because shooting angles and shooting techniques differ, the images to be recognized also differ, and the text in an image is often not parallel to the image edges. Correction adjusts the text in the image by rotation and deformation so that the text as a whole is horizontal, and removes content unrelated to the text from the resulting corrected image, which improves the recognition success rate.
The font identification unit 300 is configured to perform font identification on the text information in the corrected image to obtain an identification result, and to determine, according to the identification result, the character library corresponding to the font used by the current text information.
In the system, the font identification unit 300 first determines the font used by the current text. Stroke styles differ between fonts, so the characters in the text are split to obtain their stroke information, and the font used by the text in the corrected image is determined from that stroke information. Once the font is known, character recognition can proceed quickly, which greatly improves recognition efficiency.
The continuous identification unit 400 is configured to continuously recognize the text information in the corrected image according to the character library corresponding to the current font, obtain a text recognition document, and store the text recognition document.
In the system, the continuous identification unit 400 compares the characters in the current text with the font library corresponding to the current font, so that the characters in the corrected image are recognized one by one. Multiple threads can be used as needed to improve recognition efficiency. After recognition is complete, the corresponding text recognition document is generated and stored.
The correction processing unit 200 includes:
The contrast adjusting module 201 is configured to adjust the contrast of the image to be recognized to obtain a high-contrast image.
In the system, the contrast adjusting module 201 adjusts the contrast of the acquired image. Uneven lighting during shooting tends to leave shadows on the image; after the contrast is adjusted, the characters on the image are clearer, which helps subsequent recognition.
The pixel identification module 202 is configured to perform pixel identification on the high-contrast image to obtain a pixel identification result.
In the system, after the contrast adjusting module 201 has adjusted the contrast, the color of the region containing text in the high-contrast image differs from the color of the background, so the pixel identification module 202 can determine by pixel identification which regions of the high-contrast image contain text and which are background.
The image cropping module 203 is configured to cut the high-contrast image into character images and non-character images according to the pixel identification result.
In the system, where the color difference between text and background is small, the image cropping module 203 increases it by grayscale processing and picture adjustment to ease identification, and then cuts the high-contrast image into character images and non-character images according to the pixel identification result.
The image splicing module 204 is configured to continuously splice all the character images to obtain a horizontally arranged corrected image.
In the system, the image splicing module 204 splices the character images into a strip-shaped corrected image, which is recognized from one end to the other.
The font identification unit 300 includes:
The sample intercepting module 301 is configured to intercept a sampling image from the corrected image, wherein the sampling image contains only one character.
In the system, the sample intercepting module 301 intercepts any one character in the corrected image, or intercepts the first character directly; the intercepted sampling image must contain only one character.
The grayscale processing module 302 is configured to perform grayscale processing on the sampling image and trace the edges of the character it contains to obtain an edge-traced character image.
In the system, the grayscale processing module 302 performs grayscale processing on the sampling image to remove its colors so that the character is clearer, and then traces the edges of the character to obtain its outline.
The feature recognition module 303 is configured to identify font characteristics from the edge-traced character image.
The font confirming module 304 is configured to determine the font used in the current text information according to the font characteristics.
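The way the four units hand data to one another can be sketched as a small composition. This is only an illustration of the data flow described above: each unit is modelled as a plain callable, and the lambda stand-ins and the class and field names are hypothetical, not part of the source.

```python
from dataclasses import dataclass
from typing import Callable, List

Image = List[List[int]]


@dataclass
class ImageTextRecognitionSystem:
    acquire: Callable[[], Image]                  # image acquisition unit 100
    correct: Callable[[Image], Image]             # correction processing unit 200
    identify_font: Callable[[Image], str]         # font identification unit 300
    recognize: Callable[[Image, str], List[str]]  # continuous identification unit 400

    def run(self) -> List[str]:
        """Pass the image through the four units in order."""
        image = self.acquire()
        corrected = self.correct(image)
        font = self.identify_font(corrected)
        return self.recognize(corrected, font)


# Trivial stand-ins so that the data flow can be executed end to end.
system = ImageTextRecognitionSystem(
    acquire=lambda: [[0, 1], [1, 0]],
    correct=lambda image: image,
    identify_font=lambda image: "demo_font",
    recognize=lambda image, font: [f"{font}:W00001"],
)
result = system.run()
```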
Embodiment 6 is a storage medium having stored thereon a computer program readable by a computer, the computer program being arranged to execute the image text recognition method when the computer program is run.
The storage medium may include, but is not limited to, a USB flash drive, a read-only memory, a removable hard disk, a magnetic disk, an optical disk, and other media capable of storing computer programs.
The storage medium may also include non-volatile and/or volatile memory. Non-volatile memory can include read-only memory (ROM), Programmable ROM (PROM), Electrically Programmable ROM (EPROM), Electrically Erasable Programmable ROM (EEPROM), or flash memory. Volatile memory can include Random Access Memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in a variety of forms such as Static RAM (SRAM), Dynamic RAM (DRAM), Synchronous DRAM (SDRAM), Double Data Rate SDRAM (DDR SDRAM), Enhanced SDRAM (ESDRAM), Synchronous Link DRAM (SLDRAM), Rambus Direct RAM (RDRAM), Direct Rambus Dynamic RAM (DRDRAM), and Rambus Dynamic RAM (RDRAM).
Embodiment 7 is an electronic device including a processor and a memory, where the memory stores a computer program, and the computer program is loaded and executed by the processor to implement the image text recognition method.
The electronic device further includes a transmission device and an input/output device, both of which are connected to the processor.
The above technical features constitute the best embodiment of the present invention, which has strong adaptability and the best implementation effect; non-essential technical features can be added or removed according to actual needs to meet the requirements of different situations.
Claims (10)
1. An image text recognition method, comprising:
acquiring an image to be identified, wherein the image to be identified comprises character information;
carrying out correction processing on an image to be recognized to obtain a corrected image, wherein the character information in the corrected image is horizontally arranged;
carrying out font identification on the character information in the corrected image to obtain an identification result, and determining a character library corresponding to the font adopted by the current character information according to the identification result;
and continuously identifying the character information in the corrected image according to the character library corresponding to the current font to obtain a character identification document.
2. The image text recognition method according to claim 1, wherein the correcting the image to be recognized to obtain a corrected image comprises:
carrying out contrast adjustment on an image to be identified to obtain a high-contrast image;
carrying out pixel identification on the high-contrast image to obtain a pixel identification result;
cutting the high-contrast image into a character image and a non-character image according to the pixel identification result, wherein the character image is rectangular, and the non-character image is a part of the high-contrast image without the character image;
and continuously splicing all the character images to obtain horizontally arranged corrected images.
3. The image text recognition method according to claim 1 or 2, wherein the performing font recognition on the text information in the rectified image includes:
intercepting a sampling image in the corrected image, wherein the sampling image only comprises one character;
carrying out gray level processing on the sampled image, and drawing the edges of characters contained in the sampled image to obtain an edge-drawing character image;
identifying and obtaining font characteristics according to the stroked character images, wherein the font characteristics comprise outline of strokes;
and determining the font adopted in the current text information according to the font characteristics.
4. The image text recognition method according to any one of claims 1 to 3, wherein the step of continuously recognizing the text information in the corrected image according to the text library corresponding to the current font to obtain the text recognition document comprises:
continuously intercepting each character image of the corrected image;
comparing each character image with each character in the corresponding font library to obtain a plurality of identification character codes, wherein each character code is used for representing one character;
and generating a character recognition document according to the character codes.
5. The image text recognition method of claim 4, wherein the characters in the font library are arranged in order of the frequency of use of the Chinese characters; and/or in the step of comparing each character image with each character in the corresponding font library, if no character matching the current character image can be found in the font library, a specific character code representing an identification error is generated.
6. An image text recognition system, comprising:
the image acquisition unit is used for acquiring an image to be identified, wherein the image to be identified comprises character information;
the correction processing unit is used for correcting the image to be recognized to obtain a corrected image, wherein the character information in the corrected image is horizontally arranged;
the font identification unit is used for carrying out font identification on the character information in the corrected image to obtain an identification result and determining a character library corresponding to the font adopted by the current character information according to the identification result;
and the continuous identification unit is used for continuously identifying the character information in the corrected image according to the character library corresponding to the current font to obtain a character identification document.
7. The image text recognition system of claim 6, wherein the rectification processing unit comprises:
the contrast adjusting module is used for adjusting the contrast of the image to be identified to obtain a high-contrast image;
the pixel identification module is used for carrying out pixel identification on the high-contrast image to obtain a pixel identification result;
the image cutting module cuts the high-contrast image into a character image and a non-character image according to the pixel identification result, wherein the character image is rectangular, and the part of the high-contrast image, from which the character image is removed, is the non-character image;
and the image splicing module is used for continuously splicing all character images to obtain horizontally arranged correction images.
8. The image text recognition system according to claim 6 or 7, wherein the font recognition unit includes:
the sample intercepting module intercepts a sampling image in the corrected image, and the sampling image only comprises one character;
the gray processing module is used for carrying out gray processing on the sampled image and tracing the edges of characters contained in the sampled image to obtain a traced character image;
the feature recognition module is used for recognizing and obtaining font characteristics according to the stroked character images, wherein the font characteristics comprise outline of strokes;
and the font confirmation module is used for confirming the font adopted in the current text information according to the font characteristics.
9. A storage medium, on which a computer program readable by a computer is stored, the computer program being arranged to perform the image text recognition method according to any one of claims 1 to 5 when executed.
10. An electronic device, comprising a processor and a memory, wherein a computer program is stored in the memory, and wherein the computer program is loaded and executed by the processor to implement the image text recognition method according to any one of claims 1 to 5.
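The comparison step of claims 4 and 5 can be illustrated with a minimal sketch. It assumes binary glyph bitmaps of equal size, a font library given as a list of (character code, glyph) pairs already sorted by usage frequency, a pixel-agreement similarity measure, and the Unicode replacement character U+FFFD as the error code. All of these, including the names `similarity`, `recognize_character`, and `recognize_strip`, are hypothetical choices and are not specified by the claims.

```python
from typing import List, Sequence, Tuple

Bitmap = Tuple[Tuple[int, ...], ...]  # binary glyph image, 1 = ink
ERROR_CODE = "\ufffd"  # assumed character code marking an identification error


def similarity(a: Bitmap, b: Bitmap) -> float:
    """Fraction of pixels that agree; 0.0 when the two shapes differ."""
    if len(a) != len(b) or any(len(ra) != len(rb) for ra, rb in zip(a, b)):
        return 0.0
    total = sum(len(row) for row in a)
    same = sum(pa == pb for ra, rb in zip(a, b) for pa, pb in zip(ra, rb))
    return same / total if total else 0.0


def recognize_character(image: Bitmap,
                        library: Sequence[Tuple[str, Bitmap]],
                        min_similarity: float = 0.9) -> str:
    """Return the code of the first library glyph that matches the image.

    Because the library is ordered by usage frequency, common characters
    are found after few comparisons and the loop exits early.
    """
    for code, glyph in library:
        if similarity(image, glyph) >= min_similarity:
            return code
    return ERROR_CODE  # no match: emit the error code


def recognize_strip(images: Sequence[Bitmap],
                    library: Sequence[Tuple[str, Bitmap]],
                    min_similarity: float = 0.9) -> str:
    """Recognize consecutive character images and join the character codes."""
    return "".join(recognize_character(img, library, min_similarity)
                   for img in images)
```

Returning a visible error code instead of skipping unmatched images keeps the character positions in the output document aligned with the source image, so a reviewer can locate each failure.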
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202111076489.7A CN113920520A (en) | 2021-09-14 | 2021-09-14 | Image text recognition method, system, storage medium and electronic equipment |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202111076489.7A CN113920520A (en) | 2021-09-14 | 2021-09-14 | Image text recognition method, system, storage medium and electronic equipment |
Publications (1)
Publication Number | Publication Date |
---|---|
CN113920520A (en) | 2022-01-11 |
Family
ID=79234773
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202111076489.7A Pending CN113920520A (en) | 2021-09-14 | 2021-09-14 | Image text recognition method, system, storage medium and electronic equipment |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN113920520A (en) |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN114511862A (en) * | 2022-02-17 | 2022-05-17 | 北京百度网讯科技有限公司 | Form identification method and device and electronic equipment |
CN117037184A (en) * | 2023-10-10 | 2023-11-10 | 深圳牛图科技有限公司 | OCR fuzzy recognition system and method based on cloud matching |
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN114511862A (en) * | 2022-02-17 | 2022-05-17 | 北京百度网讯科技有限公司 | Form identification method and device and electronic equipment |
CN114511862B (en) * | 2022-02-17 | 2023-11-10 | 北京百度网讯科技有限公司 | Form identification method and device and electronic equipment |
CN117037184A (en) * | 2023-10-10 | 2023-11-10 | 深圳牛图科技有限公司 | OCR fuzzy recognition system and method based on cloud matching |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN109492643B (en) | Certificate identification method and device based on OCR, computer equipment and storage medium | |
CN110046529B (en) | Two-dimensional code identification method, device and equipment | |
CN112183038A (en) | Form identification and typing method, computer equipment and computer readable storage medium | |
CN109426814B (en) | Method, system and equipment for positioning and identifying specific plate of invoice picture | |
CN110781877B (en) | Image recognition method, device and storage medium | |
CN112966537B (en) | Form identification method and system based on two-dimensional code positioning | |
CN101908136A (en) | Table identifying and processing method and system | |
CN113920520A (en) | Image text recognition method, system, storage medium and electronic equipment | |
US11151402B2 (en) | Method of character recognition in written document | |
CN111033563A (en) | Image analysis method and system for immunochromatography detection | |
CN111046644A (en) | Answer sheet template generation method, identification method, device and storage medium | |
CN115984859B (en) | Image character recognition method, device and storage medium | |
CN105678301B (en) | method, system and device for automatically identifying and segmenting text image | |
CN112580499A (en) | Text recognition method, device, equipment and storage medium | |
CN109741273A (en) | A kind of mobile phone photograph low-quality images automatically process and methods of marking | |
CN112861861A (en) | Method and device for identifying nixie tube text and electronic equipment | |
EP2816504A1 (en) | Character-extraction method and character-recognition device and program using said method | |
CN111008635A (en) | OCR-based multi-bill automatic identification method and system | |
CN112836682B (en) | Method, device, computer equipment and storage medium for identifying object in video | |
RU2597163C2 (en) | Comparing documents using reliable source | |
CN109635798B (en) | Information extraction method and device | |
US9152876B1 (en) | Methods and systems for efficient handwritten character segmentation | |
Bhaskar et al. | Implementing optical character recognition on the android operating system for business cards | |
CN113537229B (en) | Bill image generation method, device, computer equipment and storage medium | |
CN111401365A (en) | OCR image automatic generation method and device |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||