CN112052849A - Method, device and equipment for judging file image direction in OCR (optical character recognition) - Google Patents
Method, device and equipment for judging file image direction in OCR (optical character recognition)
- Publication number
- CN112052849A (application CN202010869885.4A)
- Authority
- CN
- China
- Prior art keywords
- image
- character recognition
- text
- ocr
- recognition result
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Links
- 238000012015 optical character recognition Methods 0.000 title claims abstract description 56
- 238000000034 method Methods 0.000 title claims abstract description 53
- 230000011218 segmentation Effects 0.000 claims abstract description 67
- 238000003860 storage Methods 0.000 claims abstract description 16
- 230000015654 memory Effects 0.000 claims description 31
- 238000004422 calculation algorithm Methods 0.000 claims description 28
- 238000004590 computer program Methods 0.000 claims description 22
- 238000004140 cleaning Methods 0.000 claims description 7
- 230000004807 localization Effects 0.000 claims 2
- 238000010586 diagram Methods 0.000 description 12
- 230000006870 function Effects 0.000 description 8
- 238000004891 communication Methods 0.000 description 7
- 238000013527 convolutional neural network Methods 0.000 description 7
- 230000007246 mechanism Effects 0.000 description 5
- 230000008569 process Effects 0.000 description 5
- 230000003287 optical effect Effects 0.000 description 4
- 238000013528 artificial neural network Methods 0.000 description 3
- 238000005516 engineering process Methods 0.000 description 3
- 230000006403 short-term memory Effects 0.000 description 3
- 238000005406 washing Methods 0.000 description 3
- 239000000463 material Substances 0.000 description 2
- 238000012986 modification Methods 0.000 description 2
- 230000004048 modification Effects 0.000 description 2
- 238000012549 training Methods 0.000 description 2
- 238000013459 approach Methods 0.000 description 1
- 230000002457 bidirectional effect Effects 0.000 description 1
- 230000005540 biological transmission Effects 0.000 description 1
- 238000004364 calculation method Methods 0.000 description 1
- 230000008859 change Effects 0.000 description 1
- 230000008034 disappearance Effects 0.000 description 1
- 230000002349 favourable effect Effects 0.000 description 1
- 230000006872 improvement Effects 0.000 description 1
- 230000014759 maintenance of location Effects 0.000 description 1
- 238000004519 manufacturing process Methods 0.000 description 1
- 239000011159 matrix material Substances 0.000 description 1
- 230000005055 memory storage Effects 0.000 description 1
- 230000001537 neural effect Effects 0.000 description 1
- 238000012545 processing Methods 0.000 description 1
- 230000000750 progressive effect Effects 0.000 description 1
- 230000000306 recurrent effect Effects 0.000 description 1
- 230000011514 reflex Effects 0.000 description 1
- 230000003068 static effect Effects 0.000 description 1
- 238000013519 translation Methods 0.000 description 1
Images
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V30/00—Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
- G06V30/10—Character recognition
- G06V30/14—Image acquisition
- G06V30/146—Aligning or centring of the image pick-up or image-field
- G06V30/1475—Inclination or skew detection or correction of characters or of image to be recognised
- G06V30/1478—Inclination or skew detection or correction of characters or of image to be recognised of characters or characters lines
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/279—Recognition of textual entities
- G06F40/289—Phrasal analysis, e.g. finite state techniques or chunking
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/044—Recurrent networks, e.g. Hopfield networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/20—Image preprocessing
- G06V10/22—Image preprocessing by selection of a specific region containing or referencing a pattern; Locating or processing of specific regions to guide the detection or recognition
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V30/00—Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
- G06V30/10—Character recognition
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- General Health & Medical Sciences (AREA)
- Health & Medical Sciences (AREA)
- Artificial Intelligence (AREA)
- Computational Linguistics (AREA)
- General Engineering & Computer Science (AREA)
- Biomedical Technology (AREA)
- Evolutionary Computation (AREA)
- Molecular Biology (AREA)
- Computing Systems (AREA)
- Data Mining & Analysis (AREA)
- Biophysics (AREA)
- Mathematical Physics (AREA)
- Software Systems (AREA)
- Life Sciences & Earth Sciences (AREA)
- Multimedia (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Character Input (AREA)
Abstract
The embodiments of this specification provide a method, an apparatus, a device, and a storage medium for determining the direction of a document image in OCR (optical character recognition). The method includes: acquiring a target image; rotating the target image by a plurality of specified rotation angles and obtaining a character recognition result of the target image at each specified rotation angle; performing word segmentation on each character recognition result to obtain a corresponding plurality of word segmentation sets; and determining the word segmentation set containing the most words among the plurality of word segmentation sets, and taking the image direction corresponding to that set as the forward direction of the target image. The embodiments of this specification can improve the efficiency and the range of application of determining the document image direction in OCR.
Description
Technical Field
The present disclosure relates to the field of Optical Character Recognition (OCR), and in particular, to a method, an apparatus, a device, and a storage medium for determining a direction of a document image in OCR.
Background
OCR recognition refers to the process of taking an image as input and extracting the text it contains. However, when performing OCR recognition on a document image (for example, a contract image or a certificate image), the input image is often not in the forward direction because of how the image was scanned or photographed, so the document image direction must be determined in OCR before rectification can be performed.
Currently, methods for determining the direction of a document image in OCR are mainly manual or automatic. In the manual method, a person identifies the forward direction of the document image, the image is rectified accordingly, and OCR recognition is then performed on that basis. This approach is obviously time-consuming, labor-intensive, and costly. The conventional automatic method is generally based on template matching: a template is given in advance, OCR recognition is performed on the document image at each angle, each recognition result is matched against the keywords in the template, and the angle with the largest number of matches is taken as the forward direction of the document image. However, template matching only works for document images covered by the templates; it fails for document images in a new format and cannot meet general requirements.
Disclosure of Invention
An object of the embodiments of the present specification is to provide a method, an apparatus, a device, and a storage medium for determining a document image direction in OCR, so as to improve efficiency and an application range for determining the document image direction in OCR.
In order to achieve the above object, in one aspect, an embodiment of the present specification provides a method for determining a document image direction in OCR, including:
acquiring a target image;
rotating the target image according to a plurality of specified rotation angles, and respectively obtaining a character recognition result of the target image at each specified rotation angle;
performing word segmentation on the character recognition result to correspondingly obtain a plurality of word segmentation sets;
and determining the word segmentation set containing the most words among the plurality of word segmentation sets, and taking the image direction corresponding to that word segmentation set as the forward direction of the target image.
In an embodiment of this specification, the acquiring a text recognition result of the target image at each specified rotation angle includes:
for each specified rotation angle of the plurality of specified rotation angles:
positioning characters of the target image under the specified rotation angle by using a preset text positioning algorithm to obtain a plurality of file image slices of the target image under the specified rotation angle; each of the document image slices contains a character;
and performing character recognition on the file image slices by using a preset text recognition algorithm to obtain a character recognition result of the target image at the specified rotation angle.
In an embodiment of the present specification, the method further includes:
and before word segmentation is carried out on the character recognition result, carrying out text cleaning on the character recognition result.
In an embodiment of this specification, the text cleaning on the text recognition result includes:
splicing all characters in the same character recognition result into a line of text;
and eliminating the non-Chinese characters in the line of text.
In one embodiment of the present specification, the text location algorithm comprises a CTPN text location algorithm.
In an embodiment of the present specification, the text recognition algorithm includes a CRNN text recognition algorithm.
In an embodiment of the present specification, the eliminating non-chinese characters in the line of text includes:
and eliminating all characters with the character bit length less than 2 in the line of text.
In an embodiment of the present specification, the non-Chinese characters include: numbers, letters, and punctuation marks.
In an embodiment of this specification, the word segmentation on the text recognition result includes:
acquiring a character recognition result;
performing jieba word segmentation on the character recognition result to obtain a jieba segmentation result of the character recognition result;
and removing single characters from the jieba segmentation result, thereby forming a word segmentation set of the character recognition result.
In an embodiment of the present specification, the plurality of specified rotation angles includes:
0 degrees counterclockwise, 90 degrees counterclockwise, 180 degrees counterclockwise, and 270 degrees counterclockwise.
In an embodiment of the present specification, the plurality of specified rotation angles includes:
clockwise rotation of 0 degrees, clockwise rotation of 90 degrees, clockwise rotation of 180 degrees, and clockwise rotation of 270 degrees.
In an embodiment of the present specification, the target image includes any one or more of:
a contract image;
a certificate image;
a ticket image.
On the other hand, an embodiment of the present specification further provides an apparatus for determining a document image direction in OCR, including:
the image acquisition module is used for acquiring a target image;
the character recognition module is used for rotating the target image according to a plurality of specified rotation angles and respectively acquiring a character recognition result of the target image at each specified rotation angle;
the text word segmentation module is used for segmenting words of the character recognition result and correspondingly obtaining a plurality of word segmentation sets;
and the direction determining module is used for determining the word segmentation set containing the most words among the plurality of word segmentation sets, and taking the image direction corresponding to that word segmentation set as the forward direction of the target image.
In another aspect, the embodiments of the present specification further provide a computer device, which includes a memory, a processor, and a computer program stored on the memory; when the computer program is executed by the processor, it executes the instructions of the above method.
In another aspect, the present specification further provides a computer storage medium, on which a computer program is stored, where the computer program, when executed by a processor, implements the method for determining the document image orientation in OCR.
As can be seen from the technical solutions provided in the embodiments of the present specification, character recognition results of the target image can be obtained at different specified rotation angles; each character recognition result is then segmented into words, yielding a corresponding plurality of word segmentation sets. The word segmentation set containing the most words is determined among these sets, and the character recognition result corresponding to that set is taken as the character recognition result of the target image in the forward direction. The text recognized from a forward document image reads most smoothly, and "most smoothly" can be understood as yielding the largest number of recognized words. The embodiments of the present specification therefore realize automatic recognition of the forward direction of a document image in a way that avoids the template limitation of the prior art, thereby improving the efficiency and the range of application of determining the document image direction in OCR.
Drawings
In order to more clearly illustrate the embodiments of the present specification or the technical solutions in the prior art, the drawings needed to be used in the description of the embodiments or the prior art will be briefly introduced below, it is obvious that the drawings in the following description are only some embodiments described in the present specification, and for those skilled in the art, other drawings can be obtained according to the drawings without any creative effort. In the drawings:
FIG. 1 is a flowchart illustrating a method for determining a document image orientation in OCR according to an embodiment of the present disclosure;
FIG. 2 is a schematic diagram illustrating an embodiment of determining a document image direction in OCR;
FIG. 3a is a block diagram of an apparatus for determining an image orientation of a document in OCR according to some embodiments provided herein;
FIG. 3b is a block diagram illustrating an apparatus for determining an image orientation of a document in OCR according to another embodiment of the present disclosure;
fig. 4 is a block diagram of an electronic device in an embodiment provided in this specification.
[ description of reference ]
31. An image acquisition module;
32. a character recognition module;
33. a text word segmentation module;
34. a direction determination module;
35. a text cleaning module;
402. a computer device;
404. a processor;
406. a memory;
408. a drive mechanism;
410. an input/output module;
412. an input device;
414. an output device;
416. a presentation device;
418. a graphical user interface;
420. a network interface;
422. a communication link;
424. a communication bus.
Detailed Description
In order to make those skilled in the art better understand the technical solutions in the present specification, the technical solutions in the embodiments of the present specification will be clearly and completely described below with reference to the drawings in the embodiments of the present specification, and it is obvious that the described embodiments are only a part of the embodiments of the present specification, and not all of the embodiments. All other embodiments obtained by a person skilled in the art based on the embodiments in the present specification without any inventive step should fall within the scope of protection of the present specification.
The method for determining the direction of the file image in the OCR of the embodiment of the present specification can be applied to a computer device. Referring to fig. 1, in some embodiments of the present specification, the method for determining the document image direction in OCR may include the following steps:
and S101, acquiring a target image.
In the embodiments of the present specification, acquiring the target image may be reading one original image to be processed from a set of images to be processed. Because of different image acquisition modes (vertical shooting, horizontal shooting, and the like) and different front-end acquisition devices (such as a scanner, a single-lens reflex camera, or a smartphone), some of the original images in the set are not in the forward direction. In some exemplary embodiments of the present description, the original image to be processed may be a contract image (e.g., a loan contract or a mortgage contract), a certificate image (e.g., an identity card image or a property ownership certificate image), a ticket image, and the like.
S102, rotating the target image according to a plurality of specified rotation angles, and respectively obtaining character recognition results of the target image at each specified rotation angle.
In some embodiments of the present specification, the acquiring a text recognition result of the target image at each specified rotation angle may include:
for each specified rotation angle of the plurality of specified rotation angles:
1) positioning the characters of the target image under the specified rotation angle by using a preset text positioning algorithm to obtain a plurality of file image slices of the target image under the specified rotation angle; each of the file image slices contains a character.
2) And performing character recognition on the file image slice by using a preset text recognition algorithm to obtain a character recognition result of the target image at the specified rotation angle.
In order to automatically recognize the forward direction of the document images, the currently acquired target image may be rotated according to a plurality of designated rotation angles, and the character recognition result of the target image at each designated rotation angle is acquired. In embodiments of the present description, text positioning may employ any suitable text positioning method.
For example, in an exemplary embodiment, the CTPN text location algorithm (from "Detecting Text in Natural Image with Connectionist Text Proposal Network") may be used to locate text in the document image at each specified rotation angle. The main idea of the CTPN text location algorithm is as follows: each character in a line is first framed, forming multiple document image slices, and all the character frames (i.e., all the document image slices) are then merged to obtain the frame of a line of characters. Framing each character in a row borrows the anchor idea of Faster R-CNN. Specifically, a Convolutional Neural Network (CNN) is used to extract features; a number of anchor boxes (anchors) are generated for each pixel of the feature map, and the anchor overlapping most with the ground-truth box is responsible for prediction. The predicted anchors of each character are then fed into a Long Short-Term Memory network (LSTM) to extract features, which are passed to a fully connected layer, where target confidence classification, horizontal center-point offset regression, and vertical center-point and height regression are performed in turn. After training, the learned offsets are added to the predicted anchor of each character to obtain the corrected predicted anchor of each character, and all the corrected anchors are merged to obtain the prediction box of a line of characters.
Generally, when an original image is not in the forward direction, it is most often tilted by 90°, 180°, or 270°. Thus, in some embodiments of the present description, the plurality of specified rotation angles may include: four specified rotation angles of 0 degrees, 90 degrees, 180 degrees, and 270 degrees of counterclockwise rotation; or four specified rotation angles of 0 degrees, 90 degrees, 180 degrees, and 270 degrees of clockwise rotation. Of course, those skilled in the art will understand that this is only an example; in other embodiments of the present disclosure, the number of rotation angles may be set according to actual needs, and the present disclosure is not limited in this respect.
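Purely as an illustration (not part of the patent text), the rotation step can be sketched in Python; the use of the Pillow library here is an assumption, not something the patent specifies:

```python
# Hedged sketch: generate the four rotated copies of a target image.
# Pillow (PIL) is assumed to be available; Image.rotate() rotates counterclockwise.
from PIL import Image

SPECIFIED_ANGLES = [0, 90, 180, 270]  # counterclockwise, matching the example above

def rotated_copies(path: str) -> dict:
    """Return a dict mapping each specified rotation angle to a rotated copy."""
    target = Image.open(path)  # acquire the target image
    # expand=True enlarges the canvas so nothing is cropped at 90/270 degrees
    return {angle: target.rotate(angle, expand=True) for angle in SPECIFIED_ANGLES}
```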
On the basis of text localization, text recognition can be performed on each located box using a preset text recognition algorithm. For example, in an exemplary embodiment, the CRNN text recognition algorithm (from "An End-to-End Trainable Neural Network for Image-based Sequence Recognition and Its Application to Scene Text Recognition") may be used to recognize the text in each located box, and the corresponding output is a list of recognized character strings.
CRNN is an end-to-end recognition algorithm for text of indefinite length. It generally comprises a three-part network structure: a CNN (Convolutional Neural Network), an RNN (Recurrent Neural Network), and CTC (Connectionist Temporal Classification). For ease of understanding, the CNN, RNN, and CTC parts are briefly described below.
(1) CNN structure
The CNN uses a VGG16 backbone network, which contains 16 hidden layers (13 convolutional layers and 3 fully connected layers); its function is to extract features.
The feature map produced by the CNN has a size of 25 × 1 (ignoring the number of channels), which yields a feature sequence: each column of features corresponds to a rectangular region of the original image, so the sequence can conveniently be fed to the RNN in the next step, with each feature corresponding one-to-one to a region of the input.
(2) Structure of RNN
The RNN operates on the feature sequence x = x1, ..., xT output by the CNN; for each input xt there is an output yt. To prevent gradient vanishing during training, the LSTM (Long Short-Term Memory) unit is used as the RNN cell. When predicting a sequence, both forward and backward information of the sequence contribute to the prediction; thus, in some cases, the RNN may employ a bidirectional RNN network as needed.
(3) CTC transcription layer
CTC converts the RNN output sequence into a character string; the lengths of the converted input and output need not correspond, and the input can be of variable length.
Referring to fig. 2, in some embodiments of the present disclosure, before performing character recognition, the target image may be rotated according to each designated rotation angle to obtain document images at a plurality of designated rotation angles, and then character recognition may be performed on the document images at each designated rotation angle. For example, taking the above counterclockwise rotation of 0 °, 90 °, 180 °, and 270 ° as examples, the obtained target image is the document image rotated counterclockwise by 0 °, and the document images rotated counterclockwise by 90 °, 180 °, and 270 ° can be correspondingly obtained.
In other embodiments of this specification, after the target image is rotated to one specified rotation angle, character recognition may be performed directly on the document image at that angle; the target image is then rotated to the next specified rotation angle and character recognition is performed on the document image at that angle, and so on in turn until character recognition has been completed at all the specified rotation angles. For example, taking the above counterclockwise rotations of 0°, 90°, 180°, and 270° as examples, the acquired target image is the document image rotated counterclockwise by 0°, and character recognition may be performed on it first. After recognition of the document image rotated counterclockwise by 0° is finished, the target image is rotated counterclockwise by 90° and character recognition is performed on the resulting document image, and so on until recognition of the document image rotated counterclockwise by 270° is completed.
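For orientation, the per-angle recognition loop of step S102 might be sketched as follows; ctpn_locate() and crnn_recognize() are hypothetical placeholder functions standing in for whatever CTPN and CRNN implementations are actually used, and are not named in the patent:

```python
# Hedged sketch of step S102: locate and recognize text at every specified rotation angle.
def recognize_at_each_angle(rotated_images: dict) -> dict:
    results = {}
    for angle, image in rotated_images.items():
        slices = ctpn_locate(image)                    # document image slices (hypothetical helper)
        lines = [crnn_recognize(s) for s in slices]    # one recognized string per slice (hypothetical helper)
        results[angle] = lines                         # character recognition result at this angle
    return results
```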
S103, performing word segmentation on the character recognition result, and correspondingly obtaining a plurality of word segmentation sets.
In the embodiments of the present specification, the purpose of segmenting the character recognition results is to identify the forward direction of the target image from the number of words in the resulting word segmentation sets. In some embodiments of the present specification, the word segmentation of a character recognition result may include: acquiring the character recognition result; performing jieba word segmentation on the character recognition result to obtain a jieba segmentation result; and removing single characters from the jieba segmentation result, thereby forming the word segmentation set of the character recognition result. In some exemplary embodiments of the present specification, the word segmentation may specifically adopt the jieba word segmentation algorithm, the yaha word segmentation algorithm, or the Pangu word segmentation algorithm, among others; this specification is not limited in this respect, and a suitable algorithm may be selected as needed. For the character recognition result at each specified rotation angle, word segmentation yields a number of segmented words, which form the word segmentation set of the character recognition result at that specified rotation angle. Thus, when there are N character recognition results, N word segmentation sets can be formed.
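A minimal sketch of this segmentation step, assuming the jieba library mentioned above, could look like the following; it takes the cleaned text produced by the cleaning step described next, and dropping single characters afterwards follows the rule stated in this paragraph:

```python
# Hedged sketch: jieba word segmentation followed by removal of single characters.
import jieba

def segment(cleaned_text: str) -> list:
    words = jieba.lcut(cleaned_text)           # jieba word segmentation
    return [w for w in words if len(w) >= 2]   # keep words only; single characters are removed
```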
Referring to fig. 2, in some embodiments of the present disclosure, in order to facilitate word segmentation and avoid interference with the determination result, text cleaning may be performed on the character recognition result before word segmentation. For example, in an embodiment of the present specification, text cleaning of the character recognition result may include the following steps:
1) All characters in the same character recognition result are spliced into a single line of text; that is, multiple lines of characters arranged in matrix form are concatenated into one line of text.
For example, in an exemplary embodiment, if the document image is an identity card and the corresponding character recognition result is as shown in Table 1 below, then after splicing into a single line of text it may read as in Table 2:
TABLE 1
Name: Wang XX
Gender: Male    Ethnicity: Han
Date of birth: July 1, 1973
Address: XXXXX, Xicheng District, Beijing
Citizen ID number: 1101xxxx xxxx xxxx
TABLE 2
Name Wang XX Gender Male Ethnicity Han Date of birth July 1 1973 Address XXXXX Xicheng District Beijing Citizen ID number 1101xxxx xxxx xxxx
The text positioning method used here performs text-line recognition and can only locate one line of text at a time. Therefore, in the embodiments of the present specification, when a sentence runs across two or more lines, the characters can be re-spliced together to facilitate subsequent word segmentation. In a document recognition result it is common for words to run across two or more lines, so splicing all characters of the same character recognition result into one line of text helps recognize as many words as possible.
2) And eliminating the non-Chinese character in the line of text.
Since non-Chinese characters (such as punctuation marks, underscores, letters, and the like) do not participate in word segmentation, they can be removed before segmentation in order to facilitate subsequent word segmentation and avoid interfering with the determination result. In one embodiment of the present description, since one Chinese character generally occupies two character positions (i.e., has a character bit length of 2) and one non-Chinese character generally occupies one character position (i.e., has a character bit length of 1), all characters in the line of text whose character bit length is less than 2 (i.e., all non-Chinese characters) can be deleted. For example, in an exemplary embodiment, using Table 2 above as an example, after the non-Chinese characters in the line of text are removed, the Chinese characters shown in Table 3 remain.
TABLE 3
Name Wang XX Gender Male Ethnicity Han Date of birth Year Month Day Address Xicheng District Beijing XXXX Citizen ID number
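A small sketch of this text-cleaning step, under the assumption that a Unicode-range check is an acceptable substitute for the character-bit-length rule described above, might read:

```python
# Hedged sketch: splice all recognized lines into one line, then keep Chinese characters only.
import re

def clean(lines: list) -> str:
    text = "".join(lines)  # splice all characters of one recognition result into a single line
    # Keep CJK unified ideographs; numbers, letters and punctuation are thereby removed.
    return "".join(re.findall(r"[\u4e00-\u9fff]", text))
```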
S104, determining the word segmentation set containing the most words among the plurality of word segmentation sets, and taking the image direction corresponding to that word segmentation set as the forward direction of the target image.
The main idea behind determining the document image direction in OCR in the embodiments of the present specification is that the text recognized from a forward document image reads most smoothly, and "most smoothly" can be understood as yielding the largest number of recognized words. By contrast, what is recognized from an upside-down document image or a document image rotated by 90 degrees is usually rare characters, and after word segmentation these remain single characters rather than words, so the number of words is small. It can thus be concluded that the forward direction of the document image is the direction in which the number of recognized words is the largest.
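Putting the previous sketches together, step S104 can be illustrated as follows (again only a sketch under the assumptions stated earlier, not the patent's implementation): the forward direction is simply the angle whose word segmentation set is largest.

```python
# Hedged sketch of step S104: choose the rotation angle yielding the most words.
def forward_angle(results: dict) -> int:
    word_counts = {angle: len(segment(clean(lines))) for angle, lines in results.items()}
    return max(word_counts, key=word_counts.get)  # angle with the most words = forward direction
```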
For example, in an exemplary embodiment, taking the above counterclockwise rotations of 0°, 90°, 180°, and 270° as examples, assume there are four document images P1 to P4 to be processed; the number of words in the word segmentation set of each document image at each counterclockwise rotation may be as shown in Table 4 below.
TABLE 4
As can be seen from Table 4, the word segmentation set of document image P1 contains the most words when it is rotated counterclockwise by 90°; that of document image P2 contains the most words at a counterclockwise rotation of 0°; that of document image P3 contains the most words at a counterclockwise rotation of 270°; and that of document image P4 contains the most words at a counterclockwise rotation of 90°.
Therefore, the corresponding direction after the document image P1 has been rotated counterclockwise by 90 ° may be taken as the forward direction of the document image P1; the corresponding direction after rotating the file image P2 counterclockwise by 0 ° may be taken as the forward direction of the file image P2; the corresponding direction after rotating the file image P3 counterclockwise 270 ° may be taken as the forward direction of the file image P3; and the corresponding direction after rotating the document image P4 counterclockwise by 90 deg. may be taken as the forward direction of the document image P4.
Therefore, in the embodiments of the present specification, character recognition results of the target image can be obtained at different specified rotation angles; each character recognition result is then segmented into words, yielding a corresponding plurality of word segmentation sets. The word segmentation set containing the most words is determined among these sets, and the character recognition result corresponding to that set is taken as the character recognition result of the target image in the forward direction. Since the text recognized from a forward document image reads most smoothly, and "most smoothly" can be understood as yielding the largest number of recognized words, the embodiments of the present specification realize automatic recognition of the forward direction of a document image in a way that avoids the template limitation of the prior art, thereby improving the efficiency and the range of application of determining the document image direction in OCR.
Corresponding to the method, the embodiment of the present specification further provides a device for determining the document image direction in OCR, referring to fig. 3 a. In some embodiments of the present specification, the means for determining the document image direction in the OCR may include:
an image acquisition module 31, which may be used to acquire a target image;
the character recognition module 32 may be configured to rotate the target image according to a plurality of specified rotation angles, and respectively obtain a character recognition result of the target image at each specified rotation angle;
the text word segmentation module 33 may be configured to perform word segmentation on the character recognition result, and correspondingly obtain a plurality of word segmentation sets;
the direction determining module 34 may be configured to determine the word segmentation set containing the most words among the plurality of word segmentation sets, and to take the image direction corresponding to that word segmentation set as the forward direction of the target image.
In the apparatus of the embodiments of the present specification, the character recognition module can obtain character recognition results of the target image at different specified rotation angles, and the text word segmentation module can segment each character recognition result to obtain a corresponding plurality of word segmentation sets; the direction determining module can then determine the word segmentation set containing the most words and take the character recognition result corresponding to that set as the character recognition result of the target image in the forward direction. Since the text recognized from a forward document image reads most smoothly, and "most smoothly" can be understood as yielding the largest number of recognized words, the embodiments of the present specification realize automatic recognition of the forward direction of a document image in a way that avoids the template limitation of the prior art, thereby improving the efficiency and the range of application of determining the document image direction in OCR.
In an embodiment of this specification, the acquiring a text recognition result of the target image at each specified rotation angle includes:
for each specified rotation angle of the plurality of specified rotation angles:
positioning characters of the target image under the specified rotation angle by using a preset text positioning algorithm to obtain a plurality of file image slices of the target image under the specified rotation angle; each of the document image slices contains a character;
and performing character recognition on the file image slices by using a preset text recognition algorithm to obtain a character recognition result of the target image at the specified rotation angle.
In some embodiments of the present specification, referring to fig. 3b, the apparatus for determining the document image direction in OCR may further include a text cleaning module 35. The text cleaning module 35 may be configured to perform text cleaning on the character recognition result before the text word segmentation module 33 segments the character recognition result.
In some embodiments of the present specification, the text cleaning of the character recognition result may include:
splicing all characters in the same character recognition result into a line of text;
and eliminating the non-Chinese characters in the line of text.
In some embodiments of the present description, the text location algorithm may comprise a CTPN text location algorithm.
In some embodiments of the present description, the text recognition algorithm may include a CRNN text recognition algorithm.
In some embodiments of the present specification, said rejecting non-chinese characters in the line of text may include:
and eliminating all characters with the character bit length less than 2 in the line of text.
In some embodiments of the present description, the non-Chinese characters include: numbers, letters, and punctuation marks.
In some embodiments of the present specification, the word segmentation on the text recognition result may include:
acquiring a character recognition result;
performing jieba word segmentation on the character recognition result to obtain a jieba segmentation result;
and removing single characters from the jieba segmentation result, thereby forming a word segmentation set of the character recognition result.
In some embodiments of the present description, the plurality of specified rotation angles may include:
0 degrees counterclockwise, 90 degrees counterclockwise, 180 degrees counterclockwise, and 270 degrees counterclockwise.
In some embodiments of the present description, the plurality of specified rotation angles may include:
clockwise rotation of 0 degrees, clockwise rotation of 90 degrees, clockwise rotation of 180 degrees, and clockwise rotation of 270 degrees.
In some embodiments of the present description, the target image may include any one or more of:
a contract image;
a certificate image;
a ticket image.
For convenience of description, the above devices are described as being divided into various units by function, and are described separately. Of course, the functions of the various elements may be implemented in the same one or more software and/or hardware implementations of the present description.
Corresponding to the method, the embodiment of the specification further provides a computer device. As shown in FIG. 4, in some embodiments of the present description, the computer device 402 may include one or more processors 404, such as one or more Central Processing Units (CPUs) or Graphics Processors (GPUs), each of which may implement one or more hardware threads. The computer device 402 may also comprise any memory 406 for storing any kind of information, such as code, settings, data, etc., and in a particular embodiment a computer program running on the memory 406 and on the processor 404, which computer program, when executed by the processor 404, may perform the instructions according to the above-described method. For example, and without limitation, memory 406 may include any one or more of the following in combination: any type of RAM, any type of ROM, flash memory devices, hard disks, optical disks, etc. More generally, any memory may use any technology to store information. Further, any memory may provide volatile or non-volatile retention of information. Further, any memory may represent fixed or removable components of computer device 402. In one case, when the processor 404 executes the associated instructions, which are stored in any memory or combination of memories, the computer device 402 can perform any of the operations of the associated instructions. The computer device 402 also includes one or more drive mechanisms 408, such as a hard disk drive mechanism, an optical disk drive mechanism, etc., for interacting with any memory.
In correspondence with the method described above, embodiments of the present specification also provide a computer storage medium on which a computer program is stored, which when executed by a processor can implement the method for determining the orientation of a document image in OCR described above. Referring to fig. 4, in some embodiments of the present specification, the computer storage medium may be an electronic device, and includes a memory, a processor, and a computer program stored in the memory, and when the computer program is executed by the processor, the computer program may perform the method for determining the image direction of the file in the OCR.
While the process flows described above include operations that occur in a particular order, it should be appreciated that the processes may include more or less operations that are performed sequentially or in parallel (e.g., using parallel processors or a multi-threaded environment).
The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the specification. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processor to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processor, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processor to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processor to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
In a typical configuration, a computing device includes one or more processors (CPUs), input/output interfaces, network interfaces, and memory.
The memory may include forms of volatile memory in a computer readable medium, Random Access Memory (RAM) and/or non-volatile memory, such as Read Only Memory (ROM) or flash memory (flash RAM). Memory is an example of a computer-readable medium.
Computer-readable media, including both permanent and non-permanent, removable and non-removable media, may implement information storage by any method or technology. The information may be computer readable instructions, data structures, modules of a program, or other data. Examples of computer storage media include, but are not limited to, phase change memory (PRAM), Static Random Access Memory (SRAM), Dynamic Random Access Memory (DRAM), other types of Random Access Memory (RAM), Read Only Memory (ROM), Electrically Erasable Programmable Read Only Memory (EEPROM), flash memory or other memory technology, compact disc read only memory (CD-ROM), Digital Versatile Discs (DVD) or other optical storage, magnetic cassettes, magnetic tape or magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information that can be accessed by a computing device. As defined herein, a computer readable medium does not include a transitory computer readable medium such as a modulated data signal and a carrier wave.
As will be appreciated by one skilled in the art, embodiments of the present description may be provided as a method, system, or computer program product. Accordingly, embodiments of the present description may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, embodiments of the present description may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and so forth) having computer-usable program code embodied therein.
The embodiments of this specification may be described in the general context of computer-executable instructions, such as program modules, being executed by a computer. Generally, program modules include routines, programs, objects, components, data structures, etc. that perform particular tasks or implement particular abstract data types. The described embodiments may also be practiced in distributed computing environments where tasks are performed by remote processors that are linked through a communications network. In a distributed computing environment, program modules may be located in both local and remote computer storage media including memory storage devices.
The embodiments in the present specification are described in a progressive manner, and the same and similar parts among the embodiments are referred to each other, and each embodiment focuses on the differences from the other embodiments. In particular, for the system embodiment, since it is substantially similar to the method embodiment, the description is simple, and for the relevant points, reference may be made to the partial description of the method embodiment.
In the description herein, references to the description of the term "one embodiment," "some embodiments," "an example," "a specific example," or "some examples," etc., mean that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of an embodiment of the specification. In this specification, the schematic representations of the terms used above are not necessarily intended to refer to the same embodiment or example. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples. Furthermore, various embodiments or examples and features of different embodiments or examples described in this specification can be combined and combined by one skilled in the art without contradiction.
The above description is only an example of the present application and is not intended to limit the present application. Various modifications and changes may occur to those skilled in the art. Any modification, equivalent replacement, improvement, etc. made within the spirit and principle of the present application should be included in the scope of the claims of the present application.
Claims (15)
1. A method for judging the direction of a file image in OCR (optical character recognition), which is characterized by comprising the following steps:
acquiring a target image;
rotating the target image according to a plurality of specified rotation angles, and respectively obtaining a character recognition result of the target image at each specified rotation angle;
performing word segmentation on the character recognition result to correspondingly obtain a plurality of word segmentation sets;
and determining the word segmentation set containing the most words among the plurality of word segmentation sets, and taking the image direction corresponding to that word segmentation set as the forward direction of the target image.
2. The method for determining the direction of a document image in OCR as recited in claim 1, wherein the acquiring of the character recognition result of the target image at each designated rotation angle comprises:
for each specified rotation angle of the plurality of specified rotation angles:
positioning characters of the target image under the specified rotation angle by using a preset text positioning algorithm to obtain a plurality of file image slices of the target image under the specified rotation angle; each of the document image slices contains a character;
and performing character recognition on the file image slices by using a preset text recognition algorithm to obtain a character recognition result of the target image at the specified rotation angle.
3. The method for determining a direction of a document image in OCR as recited in claim 1, further comprising:
and before word segmentation is carried out on the character recognition result, carrying out text cleaning on the character recognition result.
4. The method for determining the direction of a document image in OCR as recited in claim 3, wherein said text cleaning of the character recognition result comprises:
splicing all characters in the same character recognition result into a line of text;
and eliminating the non-Chinese characters in the line of text.
5. The method for determining a direction of a document image in OCR as recited in claim 2, wherein said text localization algorithm comprises a CTPN text localization algorithm.
6. A method of determining a direction of a document image in OCR as recited in claim 2 wherein said text recognition algorithm comprises the CRNN text recognition algorithm.
7. The method for determining document image orientation in OCR as recited in claim 4, wherein said eliminating non-Chinese characters in the line of text comprises:
and eliminating all characters with the character bit length less than 2 in the line of text.
8. The method of determining a direction of a document image in OCR as recited in claim 4, wherein the non-Chinese characters comprise: numbers, letters, and punctuation marks.
9. The method for determining the direction of a document image in OCR as recited in claim 1, wherein said segmenting said text recognition result comprises:
acquiring a character recognition result;
performing jieba word segmentation on the character recognition result to obtain a jieba segmentation result of the character recognition result;
and removing single characters from the jieba segmentation result, thereby forming a word segmentation set of the character recognition result.
10. The method of determining a direction of a document image in OCR as recited in claim 1, wherein the plurality of designated rotation angles includes:
0 degrees counterclockwise, 90 degrees counterclockwise, 180 degrees counterclockwise, and 270 degrees counterclockwise.
11. The method of determining a direction of a document image in OCR as recited in claim 1, wherein the plurality of designated rotation angles includes:
clockwise rotation of 0 degrees, clockwise rotation of 90 degrees, clockwise rotation of 180 degrees, and clockwise rotation of 270 degrees.
12. A method of determining the orientation of a document image in OCR as recited in claim 1 in which the target image includes any one or more of:
a contract image;
a certificate image;
a ticket image.
13. An apparatus for determining an orientation of an image of a document in an OCR, comprising:
the image acquisition module is used for acquiring a target image;
the character recognition module is used for rotating the target image according to a plurality of specified rotation angles and respectively acquiring a character recognition result of the target image at each specified rotation angle;
the text word segmentation module is used for segmenting words of the character recognition result and correspondingly obtaining a plurality of word segmentation sets;
and the direction determining module is used for determining the word segmentation set containing the most words among the plurality of word segmentation sets, and taking the image direction corresponding to that word segmentation set as the forward direction of the target image.
14. A computer device comprising a memory, a processor, and a computer program stored on the memory, wherein the computer program, when executed by the processor, performs the instructions of the method of any one of claims 1-12.
15. A computer storage medium having a computer program stored thereon, wherein the computer program, when executed by a processor of a computer device, executes instructions of a method according to any one of claims 1-12.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010869885.4A CN112052849A (en) | 2020-08-26 | 2020-08-26 | Method, device and equipment for judging file image direction in OCR (optical character recognition) |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010869885.4A CN112052849A (en) | 2020-08-26 | 2020-08-26 | Method, device and equipment for judging file image direction in OCR (optical character recognition) |
Publications (1)
Publication Number | Publication Date |
---|---|
CN112052849A true CN112052849A (en) | 2020-12-08 |
Family
ID=73600219
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202010869885.4A Pending CN112052849A (en) | 2020-08-26 | 2020-08-26 | Method, device and equipment for judging file image direction in OCR (optical character recognition) |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN112052849A (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112818983A (en) * | 2021-01-22 | 2021-05-18 | 常州友志自动化科技有限公司 | Method for judging character inversion by using picture acquaintance |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JPH11338973A (en) * | 1998-05-22 | 1999-12-10 | Fujitsu Ltd | Method and device for document picture correction |
CN103870799A (en) * | 2012-12-17 | 2014-06-18 | 北京千橡网景科技发展有限公司 | Character direction judging method and device |
JP2018116424A (en) * | 2017-01-17 | 2018-07-26 | 富士ゼロックス株式会社 | Image processing device and program |
CN111353491A (en) * | 2020-03-12 | 2020-06-30 | 中国建设银行股份有限公司 | Character direction determining method, device, equipment and storage medium |
-
2020
- 2020-08-26 CN CN202010869885.4A patent/CN112052849A/en active Pending
Patent Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JPH11338973A (en) * | 1998-05-22 | 1999-12-10 | Fujitsu Ltd | Method and device for document picture correction |
CN103870799A (en) * | 2012-12-17 | 2014-06-18 | 北京千橡网景科技发展有限公司 | Character direction judging method and device |
JP2018116424A (en) * | 2017-01-17 | 2018-07-26 | 富士ゼロックス株式会社 | Image processing device and program |
CN111353491A (en) * | 2020-03-12 | 2020-06-30 | 中国建设银行股份有限公司 | Character direction determining method, device, equipment and storage medium |
Non-Patent Citations (1)
Title |
---|
Zhang Xingquan; Ye Xining: "Text detection algorithm for arbitrary orientations based on rotation variables", Computer Engineering and Design, no. 05, 16 May 2020 (2020-05-16) *
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112818983A (en) * | 2021-01-22 | 2021-05-18 | 常州友志自动化科技有限公司 | Method for judging character inversion by using picture acquaintance |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
EP3591582B1 (en) | Method and system for automatic object annotation using deep network | |
CN110569878B (en) | Photograph background similarity clustering method based on convolutional neural network and computer | |
CN110659647B (en) | Seal image identification method and device, intelligent invoice identification equipment and storage medium | |
CN110827247B (en) | Label identification method and device | |
US9727775B2 (en) | Method and system of curved object recognition using image matching for image processing | |
US9235759B2 (en) | Detecting text using stroke width based text detection | |
CN101908136B (en) | Table identifying and processing method and system | |
CN111583097A (en) | Image processing method, image processing device, electronic equipment and computer readable storage medium | |
CN109766778A (en) | Invoice information input method, device, equipment and storage medium based on OCR technique | |
WO2020097909A1 (en) | Text detection method and apparatus, and storage medium | |
WO2018233055A1 (en) | Method and apparatus for entering policy information, computer device and storage medium | |
Dubská et al. | Real-time precise detection of regular grids and matrix codes | |
US11341605B1 (en) | Document rectification via homography recovery using machine learning | |
US11481683B1 (en) | Machine learning models for direct homography regression for image rectification | |
CN112560861A (en) | Bill processing method, device, equipment and storage medium | |
CN110097059B (en) | Document image binarization method, system and device based on generation countermeasure network | |
CN113496208B (en) | Video scene classification method and device, storage medium and terminal | |
CN112686257A (en) | Storefront character recognition method and system based on OCR | |
CN113158895A (en) | Bill identification method and device, electronic equipment and storage medium | |
EP4369286A1 (en) | Shadow elimination device and method, empty disk recognition device and method | |
CN112052702A (en) | Method and device for identifying two-dimensional code | |
CN112052849A (en) | Method, device and equipment for judging file image direction in OCR (optical character recognition) | |
CN113112567A (en) | Method and device for generating editable flow chart, electronic equipment and storage medium | |
CN111612063A (en) | Image matching method, device and equipment and computer readable storage medium | |
CN108133205B (en) | Method and device for copying text content in image |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
TA01 | Transfer of patent application right | ||
TA01 | Transfer of patent application right |
Effective date of registration: 20220913 Address after: 25 Financial Street, Xicheng District, Beijing 100033 Applicant after: CHINA CONSTRUCTION BANK Corp. Address before: 25 Financial Street, Xicheng District, Beijing 100033 Applicant before: CHINA CONSTRUCTION BANK Corp. Applicant before: Jianxin Financial Science and Technology Co.,Ltd. |