CN112101317B - Page direction identification method, device, equipment and computer readable storage medium - Google Patents

Page direction identification method, device, equipment and computer readable storage medium

Info

Publication number
CN112101317B
Authority
CN
China
Prior art keywords
image block
character
target image
determining
image
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202011282095.2A
Other languages
Chinese (zh)
Other versions
CN112101317A (en)
Inventor
高超
徐国强
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
OneConnect Financial Technology Co Ltd Shanghai
Original Assignee
OneConnect Financial Technology Co Ltd Shanghai
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by OneConnect Financial Technology Co Ltd Shanghai filed Critical OneConnect Financial Technology Co Ltd Shanghai
Priority to CN202011282095.2A
Publication of CN112101317A
Application granted
Publication of CN112101317B
Priority to PCT/CN2021/127179 (WO2022105569A1)
Legal status: Active

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/40 Document-oriented image-based pattern recognition
    • G06V30/41 Analysis of document content
    • G06V30/413 Classification of content, e.g. text, photographs or tables
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10 Character recognition

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Computing Systems (AREA)
  • Biomedical Technology (AREA)
  • General Health & Medical Sciences (AREA)
  • Computational Linguistics (AREA)
  • Biophysics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Molecular Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Health & Medical Sciences (AREA)
  • Evolutionary Biology (AREA)
  • Multimedia (AREA)
  • Image Analysis (AREA)

Abstract

The invention relates to the technical field of image processing, and discloses a page direction identification method, device, equipment and computer readable storage medium, wherein the method comprises the following steps: determining a target image to be subjected to image detection, and dividing the target image according to a preset cutting mode to obtain a plurality of image blocks; training each image block based on a preset convolutional neural network model, and determining, based on the training result, target image blocks that contain characters and the character direction of each target image block; if a plurality of such target image blocks exist among the image blocks, classifying and summarizing the character directions of the target image blocks, determining a target character direction based on the classified and summarized result, and taking the target character direction as the page direction of the target image. The invention improves the accuracy of identifying the page direction of an image.

Description

Page direction identification method, device, equipment and computer readable storage medium
Technical Field
The present invention relates to the field of image processing technologies, and in particular, to a method, an apparatus, a device, and a computer-readable storage medium for identifying a page direction.
Background
OCR (Optical Character Recognition) technology converts printed characters in an image into a text format that a computer can process. It is widely applied in scenarios such as data entry and verification comparison, and has become a key link in the informatization and digitization of various industries of the national economy. OCR mainly solves two problems: detecting the position of characters in a picture and recognizing their content. Because of different acquisition modes (such as photographing and scanning), the page of an image to be recognized may be rotated by 90, 180 or 270 degrees; such an image cannot be fed directly into an OCR system, and its page direction usually needs to be detected and corrected by rotation first. Conventionally, the page direction is estimated by locating the position and direction of text lines with methods such as morphology, line detection and projection, but these methods cannot tell whether the image is reversed by 180 degrees and are easily interfered with by background texture lines outside the page. Although a deep learning model can classify the whole image directly and predict its direction, such a model requires a large amount of training data, is likewise easily interfered with by background textures, and has low robustness.
Disclosure of Invention
The invention mainly aims to provide a page direction identification method, a page direction identification device, page direction identification equipment and a computer readable storage medium, and aims to solve the technical problem of how to improve the accuracy of image page direction identification.
In order to achieve the above object, the present invention provides a page direction identification method, including:
determining a target image to be subjected to image detection, and dividing the target image according to a preset cutting mode to obtain a plurality of image blocks;
training each image block based on a preset convolutional neural network model, and determining, based on the training result, target image blocks that contain characters and the character direction of each target image block;
if a plurality of target image blocks with characters and character directions exist among the image blocks, classifying and summarizing the character directions of the target image blocks, determining a target character direction based on the classified and summarized result, and taking the target character direction as the page direction of the target image.
Optionally, the step of determining whether each of the image blocks has a character and a target image block with a character direction based on the training result of the training includes:
traversing each image block based on the training result of the training, and acquiring the whole area of the traversed image block and the area occupied by the suspected character area in the traversed image block;
calculating a ratio value of the area occupied by the suspected character area to the whole area, and judging whether the ratio value is larger than a preset threshold value or not;
if the ratio value is larger than a preset threshold value, determining that characters exist in the traversed image block, determining the character direction of the characters in the traversed image block according to the training result, and taking the traversed image block as a target image block.
Optionally, the step of determining the character direction of the characters in the traversed image block according to the training result includes:
and determining a label result corresponding to the traversed image block according to the training result, matching the label result with a preset label direction comparison table, and determining the character direction of characters in the traversed image block according to the matching result.
Optionally, before the step of training each image block based on a preset convolutional neural network model, the method includes:
inputting a plurality of initial image blocks in a preset mapping comparison table into an original convolutional neural network model for training so as to obtain text information of each initial image block, and comparing each text information with label information corresponding to each initial image block in the preset mapping comparison table;
and if the comparison fails, determining errors of the text information and the labeled information, and optimizing the original convolutional neural network model according to the errors to obtain a preset convolutional neural network model.
Optionally, the step of classifying and summarizing the character direction of each target image block, and determining the target character direction based on the classified and summarized result includes:
classifying and summarizing the character directions of the target image blocks to obtain a plurality of initial character directions, determining the number of target image blocks corresponding to the initial character directions, and taking the initial character direction with the largest number of target image blocks in each initial character direction as the target character direction.
Optionally, the step of dividing the target image according to a preset clipping manner to obtain a plurality of image blocks includes:
determining an origin in the target image, determining the length and the width of an image block to be divided based on the origin and a preset cutting mode, and dividing the target image according to the length and the width to obtain a plurality of image blocks.
Optionally, the step of determining the length and the width of the image block to be divided based on the origin and a preset clipping manner includes:
acquiring an initial length and an initial width of the target image, and determining the length and the width of the image block to be divided based on the origin, the initial length and the initial width, wherein the length is less than or equal to the initial length, and the width is less than or equal to the initial width.
In addition, to achieve the above object, the present invention further provides a page direction recognition apparatus, including:
the image processing device comprises a dividing module, a processing module and a processing module, wherein the dividing module is used for determining a target image to be subjected to image detection and dividing the target image according to a preset cutting mode to obtain a plurality of image blocks;
the determining module is used for training each image block based on a preset convolutional neural network model, and determining, based on the training result, target image blocks that contain characters and the character direction of each target image block;
and the classifying and summarizing module is used for, if a plurality of target image blocks with characters and character directions exist among the image blocks, classifying and summarizing the character directions of the target image blocks, determining the target character direction based on the classification and summarization result, and taking the target character direction as the page direction of the target image.
In addition, in order to achieve the above object, the present invention also provides a page direction identification device;
the page direction recognition apparatus includes: a memory, a processor, and a computer program stored on the memory and executable on the processor, wherein:
the computer program, when being executed by the processor, realizes the steps of the page direction identification method as described above.
In addition, to achieve the above object, the present invention also provides a computer-readable storage medium;
the computer-readable storage medium has stored thereon a computer program which, when being executed by a processor, carries out the steps of the page direction identification method as described above.
The method comprises the steps of: determining a target image to be subjected to image detection, and dividing the target image according to a preset cutting mode to obtain a plurality of image blocks; training each image block based on a preset convolutional neural network model, and determining, based on the training result, target image blocks that contain characters and the character direction of each target image block; if a plurality of such target image blocks exist among the image blocks, classifying and summarizing the character directions of the target image blocks, determining a target character direction based on the classified and summarized result, and taking the target character direction as the page direction of the target image. Because the target image is divided in a preset cutting mode into a plurality of image blocks, each image block is trained based on the convolutional neural network model to determine the target image blocks, the character directions of the target image blocks are classified and summarized, and the page direction of the target image is determined from the classified and summarized result, the inaccurate estimation of the page direction in the prior art is avoided and the accuracy of image page direction identification is improved.
Drawings
FIG. 1 is a schematic structural diagram of a page direction identification device of a hardware operating environment according to an embodiment of the present invention;
FIG. 2 is a flowchart illustrating a page direction identification method according to a first embodiment of the present invention;
FIG. 3 is a schematic diagram of the functional modules of the page direction recognition apparatus according to the present invention.
The objects, features and advantages of the present invention will be further explained with reference to the accompanying drawings.
Detailed Description
It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.
As shown in fig. 1, fig. 1 is a schematic structural diagram of a page direction identifying device of a hardware operating environment according to an embodiment of the present invention.
As shown in fig. 1, the page direction recognition apparatus may include: a processor 1001 (such as a CPU), a network interface 1004, a user interface 1003, a memory 1005, and a communication bus 1002. The communication bus 1002 is used to enable connection and communication between these components. The user interface 1003 may include a Display screen (Display) and an input unit such as a Keyboard (Keyboard), and optionally may also include a standard wired interface and a wireless interface. The network interface 1004 may optionally include a standard wired interface and a wireless interface (e.g., a WI-FI interface). The memory 1005 may be a high-speed RAM memory or a non-volatile memory (e.g., a magnetic disk memory). The memory 1005 may alternatively be a storage device separate from the processor 1001.
Optionally, the page direction identifying device may further include a camera, a Radio Frequency (RF) circuit, a sensor, an audio circuit, a WiFi module, and the like. Such as light sensors, motion sensors, and other sensors. Specifically, the light sensor may include an ambient light sensor and a proximity sensor, wherein the ambient light sensor may adjust the brightness of the display screen according to the brightness of ambient light. Of course, the page direction recognition device may also be configured with other sensors such as a gyroscope, a barometer, a hygrometer, a thermometer, and an infrared sensor, which are not described herein again.
It will be understood by those skilled in the art that the structure shown in fig. 1 does not constitute a limitation of the page direction identifying apparatus, which may include more or fewer components than those shown, a combination of some components, or a different arrangement of components.
As shown in fig. 1, a memory 1005, which is a kind of computer storage medium, may include therein an operating system, a network communication module, a user interface module, and a page direction identification program.
In the page direction identification device shown in fig. 1, the network interface 1004 is mainly used for connecting to a background server and performing data communication with the background server; the user interface 1003 is mainly used for connecting a client (user side) and performing data communication with the client; and the processor 1001 may be configured to call the page direction identification program stored in the memory 1005 and execute the page direction identification method provided by the embodiment of the present invention.
Referring to fig. 2, the present invention provides a page direction identification method, in an embodiment of the page direction identification method, the page direction identification method includes the following steps:
step S10, determining a target image to be subjected to image detection, and dividing the target image according to a preset cutting mode to obtain a plurality of image blocks;
in this embodiment, a target image to be subjected to image document page detection is cut into a plurality of patches (small blocks), that is, image blocks, and each image block is predicted by using a convolutional neural network model in deep learning to determine whether characters exist in each image block, if so, the direction of the characters is continuously determined to obtain the prediction result of each image block, and then, the prediction results are summarized and fused to obtain the direction of the whole document page in the target image. Therefore, a target image to be subjected to image detection needs to be determined first, and the manner of determining the target image may be to acquire an image input by a user and use the image as the target image, or to acquire an image sent by another terminal and use the image as the target image, or to use an image generated by the terminal itself subjected to image detection as the target image, and the specific manner of acquiring the target image is not limited herein and may be set according to the needs of the user.
After the target image is obtained, it is divided in a preset clipping manner to obtain a plurality of image blocks. It should be noted that, to ensure the continuity of the image blocks, adjacent image blocks must partially overlap when the target image is divided, that is, two adjacent image blocks share a region that is completely identical. The preset clipping manner may be to determine an origin of the target image, for example the upper-left corner, and construct a two-dimensional coordinate system based on the origin, where the x axis and the y axis may be determined based on the edge length and edge width of the target image. The target image is then divided, for example into all regions [i × stride, j × stride, i × stride + size, j × stride + size] of the target image (the four numbers respectively denote the x and y coordinates of the upper-left and lower-right corners of a patch), wherein i and j are positive integers and i × stride + size ≤ width and j × stride + size ≤ height are ensured. In this embodiment, stride = 192 and size = 256 may preferably be set.
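For illustration only, the following Python sketch performs the overlapping division described above with stride = 192 and size = 256; the function name split_into_patches is an assumption of this sketch, and its indices start at 0 so that the upper-left corner of the page is also covered, which is likewise an assumption rather than a prescription of the embodiment:

```python
import numpy as np

def split_into_patches(image: np.ndarray, stride: int = 192, size: int = 256):
    """Cut an H x W image into overlapping size x size patches.

    Adjacent patches overlap by (size - stride) pixels, so every two
    neighbouring patches share an identical region, as required above.
    """
    height, width = image.shape[:2]
    patches = []
    j = 0
    while j * stride + size <= height:        # ensure j * stride + size <= height
        i = 0
        while i * stride + size <= width:     # ensure i * stride + size <= width
            x0, y0 = i * stride, j * stride   # upper-left corner of the patch
            patches.append(image[y0:y0 + size, x0:x0 + size])
            i += 1
        j += 1
    return patches
```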
Step S20, training each image block based on a preset convolutional neural network model, and determining, based on the training result, target image blocks that contain characters and the character direction of each target image block;
after the plurality of image blocks are obtained, each image block can be trained by adopting a preset convolutional neural network model, namely, each image block is combined into a batch and input to the convolutional neural network model for training, so that a target image block with characters in each image block and a character direction corresponding to each target image block are determined according to a training result. That is, each image block is predicted through the convolutional neural network model, whether each image block contains characters or not can be determined according to the prediction result, and if the image block contains the characters, the direction of the characters is determined based on the prediction result. The mode of detecting whether each image block contains characters may be to determine the overall area of each image block through a convolutional neural network, detect the area occupied by the suspected character area in each image block, and then detect the ratio of the area occupied by the suspected character area to the overall area, determine that characters exist in the traversed image block if the ratio of the traversed image block is greater than a preset threshold (any threshold set in advance by a user), and determine that no characters exist in the traversed image block if the ratio is less than or equal to the preset threshold. After the characters exist in the traversed image block, the direction of the characters in the traversed image block can be determined according to the prediction direction of the convolutional neural network model, so that the direction which can be predicted by the convolutional neural network model needs to be determined firstly, namely, the label result carried in the traversed image block is determined according to the training result of the convolutional neural network model, such as 0,1,2,3 and the like, and then the obtained label result is matched with a preset label direction comparison table, wherein the label direction comparison table is provided with the direction corresponding to each label, such as 0 corresponding to 0 degree, 1 corresponding to 90 degrees, 2 corresponding to 180 degrees, 3 corresponding to 270 degrees and the like. And determining the character direction of the characters in the traversed image block according to the matching result, for example, the convolutional neural network model can predict 4 directions, which are respectively represented by 0,1,2, and 3 at 0 degree, 90 degrees, 180 degrees, and 270 degrees, that is, if the tag result output by the convolutional neural network model is 1, the character direction in the image block can be determined to be 90 degrees.
Step S30, if a plurality of target image blocks with characters and character directions exist among the image blocks, classifying and summarizing the character directions of the target image blocks, determining a target character direction based on the classification and summarization result, and taking the target character direction as the page direction of the target image.
When it is determined that a plurality of target image blocks with characters and character directions exist among the image blocks, the character direction corresponding to each target image block is obtained and the character directions are classified and summarized; for example, the target image blocks corresponding to 0 degrees are collected together, as are those corresponding to 90 degrees, 180 degrees and 270 degrees, and the character direction with the most corresponding target image blocks is taken as the target character direction. If the target image blocks corresponding to 90 degrees are the most numerous, 90 degrees is taken as the target character direction, that is, the page direction of the target image. In this proposal, the target image is divided into a plurality of image blocks, each image block is examined to determine whether it contains characters, the character direction is determined when it does, and the character directions are classified and summarized to determine the page direction of the target image.
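The classification and summarization described above amounts to a majority vote. The helper below is only a sketch (the function name and the use of degrees as values are assumptions) operating on the character directions of the target image blocks:

```python
from collections import Counter

def page_direction(block_directions):
    """Majority vote over the character directions (in degrees) of the
    target image blocks, e.g. page_direction([90, 90, 0, 90]) -> 90."""
    if not block_directions:
        return None  # no image block containing characters was found
    direction, _ = Counter(block_directions).most_common(1)[0]
    return direction
```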
In this embodiment, a target image to be subjected to image detection is determined, and the target image is divided according to a preset cutting mode to obtain a plurality of image blocks; each image block is trained based on a preset convolutional neural network model, and the target image blocks that contain characters, together with the character direction of each, are determined based on the training result; if a plurality of such target image blocks exist among the image blocks, the character directions of the target image blocks are classified and summarized, a target character direction is determined based on the classified and summarized result, and the target character direction is taken as the page direction of the target image. Because the target image is divided in a preset cutting mode into a plurality of image blocks, each image block is trained with the convolutional neural network model to determine the target image blocks, the character directions of the target image blocks are classified and summarized, and the page direction of the target image is determined from the classified and summarized result, the inaccurate estimation of the page direction of the target image in the prior art is avoided and the accuracy of image page direction identification is improved.
Further, on the basis of the first embodiment of the present invention, a second embodiment of the page direction identification method of the present invention is provided. This embodiment refines step S20 of the first embodiment; the step of determining, based on the training result, the target image blocks that contain characters and the character direction of each target image block includes:
step a, traversing each image block based on the training result of the training, and acquiring the whole area of the traversed image block and the area occupied by the suspected character area in the traversed image block;
in this embodiment, when each image block is trained through the convolutional neural network model and a training result is obtained, each image block may be traversed according to the training result, and for the traversed image block, the whole area of the traversed image block and the area of a suspected character area in the traversed image block, that is, the area occupied by the suspected character area, are determined.
Step b, calculating a ratio value of the area occupied by the suspected character area to the whole area, and judging whether the ratio value is larger than a preset threshold value or not;
and after the overall area of the traversed image block and the area occupied by the suspected character area are obtained, calculating a ratio value of the area occupied by the suspected character area to the overall area, judging whether the ratio value is larger than a preset threshold value or not, and executing different operations based on different judgment results. The preset threshold may be any threshold set in advance by the user.
And c, if the ratio value is larger than a preset threshold value, determining that characters exist in the traversed image block, determining the character direction of the characters in the traversed image block according to the training result, and taking the traversed image block as a target image block.
When the ratio value is judged to be larger than the preset threshold value, it is determined that characters exist in the traversed image block; if the ratio value is less than or equal to the preset threshold value, it is determined that no characters exist in the traversed image block. If characters exist in the traversed image block, the character direction of those characters can be determined according to the training result, that is, according to the prediction of the convolutional neural network model, and the traversed image block can then be taken as a target image block.
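For illustration, assuming the training result provides a binary suspected-character mask for the traversed image block (an assumption of this sketch, as are the helper name and the threshold value), the ratio test of steps b and c could be computed as follows:

```python
import numpy as np

def has_characters(char_mask: np.ndarray, threshold: float = 0.2) -> bool:
    """True if the suspected-character area occupies a larger share of the
    block than the preset threshold."""
    ratio = float(char_mask.sum()) / char_mask.size   # occupied area / whole area
    return ratio > threshold
```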
In this embodiment, the ratio of the area occupied by the suspected character region in the traversed image block to its whole area is determined according to the training result; when the ratio is greater than the preset threshold, it is determined that characters exist in the traversed image block, the character direction is determined according to the training result, and the traversed image block is taken as a target image block, so that the accuracy of the obtained target image blocks is guaranteed.
Specifically, the step of determining the character direction of the characters in the traversed image block according to the training result includes:
and d, determining a label result corresponding to the traversed image block according to the training result, matching the label result with a preset label direction comparison table, and determining the character direction of characters in the traversed image block according to the matching result.
When determining the character direction of characters in a traversed image block, the directions that the convolutional neural network model can predict need to be determined first; that is, the label result carried in the traversed image block, such as 0, 1, 2 or 3, is determined according to the training result of the convolutional neural network model, and the obtained label result is then matched against a preset label direction comparison table, in which each label corresponds to a direction, for example 0 corresponds to 0 degrees, 1 to 90 degrees, 2 to 180 degrees and 3 to 270 degrees. The character direction of the characters in the traversed image block is determined according to the matching result; for example, if the convolutional neural network model can predict 4 directions, represented by 0, 1, 2 and 3 for 0, 90, 180 and 270 degrees, and the label result output by the model is 1, the character direction in the image block can be determined to be 90 degrees.
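A minimal stand-in for the label direction comparison table, using the 0/90/180/270 mapping of the example above, is shown below; the table contents and the function name are illustrative assumptions only:

```python
# Preset label direction comparison table from the example above.
LABEL_DIRECTION_TABLE = {0: 0, 1: 90, 2: 180, 3: 270}

def label_to_direction(label_result: int) -> int:
    """Match the label result against the table, e.g. 1 -> 90 degrees."""
    if label_result not in LABEL_DIRECTION_TABLE:
        raise ValueError(f"unknown label result: {label_result}")
    return LABEL_DIRECTION_TABLE[label_result]
```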
In this embodiment, the label result corresponding to the traversed image block is determined according to the training result, the label result is matched against the label direction comparison table, and the character direction is determined according to the matching result, so that the accuracy of the obtained character direction is ensured.
Further, before the step of training each image block based on a preset convolutional neural network model, the method includes:
step e, inputting a plurality of initial image blocks in a preset mapping comparison table into an original convolutional neural network model for training so as to obtain text information of each initial image block, and comparing each text information with label information corresponding to each initial image block in the preset mapping comparison table;
before each image block is trained by using a preset convolutional neural network model, a conventional convolutional neural network model, namely an original convolutional neural network model, needs to be obtained, and the preset convolutional neural network model is obtained by training and optimizing the original convolutional neural network model in advance, for example, by using a gradient descent method until the model converges. That is, the labeling information of each initial image block, such as whether there is text content, and the text direction of the text content, may be determined by manually labeling, and the initial image blocks and the labeling information are summarized to obtain the preset mapping comparison table. And after the preset mapping comparison table is obtained, performing model optimization on the original convolutional neural network model according to the preset mapping comparison table to obtain the convolutional neural network model. The method includes the steps of extracting a plurality of initial image blocks in a preset mapping comparison table, inputting each initial image block into an original convolutional neural network model as a batch for training, determining text information of each initial image block according to a training result, namely determining whether each initial image block contains text content according to the training result, and if so, determining the text direction of the text content in each initial image block. And comparing each text information with the label information corresponding to each initial image block in the preset mapping comparison table, that is, comparing the text information (including whether the text information has text content and text direction) of each initial image block with the label information (including whether the text information has text content and text direction) of each initial image block in the preset mapping comparison table.
And f, if the comparison fails, determining errors of the text information and the labeling information, and optimizing the original convolutional neural network model according to the errors to obtain a preset convolutional neural network model.
If they are inconsistent (that is, the comparison fails), namely if the text information and the labeling information of a certain initial image block differ, the error between the text information and the labeling information needs to be determined, and the original convolutional neural network model is optimized according to this error, that is, the model parameters are adjusted. The model is then optimized again in the same way until it converges or the error becomes extremely small, and the convolutional neural network model at that point is taken as the preset convolutional neural network model.
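A minimal sketch of this optimization loop is given below. It assumes a PyTorch model (or its direction head) that outputs label scores for a batch of initial image blocks, annotated labels drawn from the preset mapping comparison table, cross-entropy as the error measure and stochastic gradient descent as the optimizer; all of these concrete choices are assumptions of the sketch rather than details fixed by the embodiment:

```python
import torch
from torch import nn

def optimize_model(model, loader, epochs: int = 10, lr: float = 1e-3):
    """Compare the model output for each initial image block with its
    annotated label and adjust the model parameters by gradient descent."""
    criterion = nn.CrossEntropyLoss()                # error between output and label
    optimizer = torch.optim.SGD(model.parameters(), lr=lr)
    model.train()
    for _ in range(epochs):                          # until converged / error small
        for patches, labels in loader:               # labels from the comparison table
            optimizer.zero_grad()
            logits = model(patches)                  # predicted label scores
            loss = criterion(logits, labels)         # determine the error
            loss.backward()                          # back-propagate the error
            optimizer.step()                         # optimize the original model
    return model
```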
In this embodiment, the original convolutional neural network model is trained according to each initial image block, an error of the original convolutional neural network model is determined when the comparison between the text information of each initial image block and the label information in the preset mapping comparison table fails, and the original convolutional neural network model is optimized according to the error to obtain the preset convolutional neural network model, so that the effectiveness of the obtained preset convolutional neural network model is ensured.
Further, the step of classifying and summarizing the character direction of each target image block, and determining the target character direction based on the classified and summarized result includes:
and step g, classifying and summarizing the character directions of the target image blocks to obtain a plurality of initial character directions, determining the number of the target image blocks corresponding to the initial character directions, and taking the initial character direction with the largest number of the target image blocks in each initial character direction as the target character direction.
In this embodiment, after the character directions of the target image blocks are obtained, the character directions need to be classified and summarized to obtain a plurality of initial character directions, such as 0 degrees, 90 degrees, 180 degrees and 270 degrees. For example, the target image blocks corresponding to 0 degrees are collected together, as are those corresponding to 90 degrees, 180 degrees and 270 degrees, and it is determined which character direction corresponds to the largest number of target image blocks; that character direction is taken as the target character direction. In other words, the number of target image blocks corresponding to each initial character direction is determined, and the initial character direction with the largest number of target image blocks is taken as the target character direction. If the target image blocks corresponding to 90 degrees are the most numerous, 90 degrees can be taken as the target character direction, that is, the page direction of the target image.
In this embodiment, a plurality of initial character directions are obtained by classifying and summarizing the character directions of each target image block, and the initial character direction with the largest number of target image blocks in each initial character direction is taken as the target character direction, so that the accuracy of the obtained target character direction is ensured.
Further, the step of dividing the target image according to a preset clipping manner to obtain a plurality of image blocks includes:
and h, determining an origin in the target image, determining the length and the width of the image block to be divided based on the origin and a preset cutting mode, and dividing the target image according to the length and the width to obtain a plurality of image blocks.
In this embodiment, when dividing the target image according to the preset clipping manner, the origin set in the target image, that is, the origin of the coordinate system, needs to be determined; the position of the origin may be set according to the needs of the user. After the origin is determined, a two-dimensional coordinate system can be created from the initial length and initial width of the target image, with the x axis and y axis constructed along the edges of the target image. After the two-dimensional coordinate system is constructed, the length and width of the image block to be divided and the coordinates of the division starting point need to be determined, and the vertex coordinates of the image block are then determined in the two-dimensional coordinate system from the length, the width and the starting-point coordinates, such as [i × stride, j × stride, i × stride + size, j × stride + size]. The target image is divided according to these vertex coordinates to obtain the divided image block. The target image may be divided multiple times in this way to obtain a plurality of image blocks, and each image block may be obtained in the same manner.
In the embodiment, the length and the width of the image block to be divided are determined according to the origin and the cutting mode in the target image, and the target image is divided based on the length and the width to obtain a plurality of image blocks, so that the effectiveness of the obtained image blocks is guaranteed.
Specifically, the step of determining the length and width of the image block to be divided based on the origin and a preset clipping manner includes:
and k, acquiring the initial length and the initial width of the target image, and determining the length and the width of the image block to be divided based on the origin, the initial length and the initial width, wherein the length is less than or equal to the initial length, and the width is less than or equal to the initial width.
When determining the length and width of the image block to be divided, the length of the target image (the initial length) and the width of the target image (the initial width) are acquired, the range that can be divided is determined from the origin, the initial width and the initial length, and the target image is divided according to a dividing instruction of the user to determine the length and width of the image block to be divided. Each image block needs to satisfy the following conditions: the length of the image block to be divided is less than or equal to the initial length of the target image, the width of the image block to be divided is less than or equal to the initial width of the target image, and an overlapping part exists between adjacent image blocks.
In this embodiment, the length and the width of the image block to be divided are determined according to the initial length, the initial width and the origin of the target image, so that the effectiveness of the acquired image block to be divided is ensured.
In addition, referring to fig. 3, an embodiment of the present invention further provides a page direction identification apparatus, where the page direction identification apparatus includes:
the dividing module A10 is configured to determine a target image to be subjected to image detection, and divide the target image according to a preset clipping manner to obtain a plurality of image blocks;
the determining module A20 is used for training each image block based on a preset convolutional neural network model, and determining whether each image block has characters and target image blocks in the character direction based on the training result of the training;
and a classifying and summarizing module a30, configured to, if a plurality of target image blocks with characters and character directions exist in each image block, classify and summarize the character directions of each target image block, determine a target character direction based on a classifying and summarizing result of the classifying and summarizing, and use the target character direction as a page direction of the target image.
Further, the determining module a20 is further configured to:
traversing each image block based on the training result of the training, and acquiring the whole area of the traversed image block and the area occupied by the suspected character area in the traversed image block;
calculating a ratio value of the area occupied by the suspected character area to the whole area, and judging whether the ratio value is larger than a preset threshold value or not;
if the ratio value is larger than a preset threshold value, determining that characters exist in the traversed image block, determining the character direction of the characters in the traversed image block according to the training result, and taking the traversed image block as a target image block.
Further, the determining module a20 is further configured to:
and determining a label result corresponding to the traversed image block according to the training result, matching the label result with a preset label direction comparison table, and determining the character direction of characters in the traversed image block according to the matching result.
Further, the determining module a20 is further configured to:
inputting a plurality of initial image blocks in a preset mapping comparison table into an original convolutional neural network model for training so as to obtain text information of each initial image block, and comparing each text information with label information corresponding to each initial image block in the preset mapping comparison table;
and if the comparison fails, determining errors of the text information and the labeled information, and optimizing the original convolutional neural network model according to the errors to obtain a preset convolutional neural network model.
Further, the classifying and summarizing module A30 is further configured to:
classifying and summarizing the character directions of the target image blocks to obtain a plurality of initial character directions, determining the number of target image blocks corresponding to the initial character directions, and taking the initial character direction with the largest number of target image blocks in each initial character direction as the target character direction.
Further, the dividing module a10 is further configured to:
determining an origin in the target image, determining the length and the width of an image block to be divided based on the origin and a preset cutting mode, and dividing the target image according to the length and the width to obtain a plurality of image blocks.
Further, the dividing module a10 is further configured to:
acquiring an initial length and an initial width of the target image, and determining the length and the width of the image block to be divided based on the origin, the initial length and the initial width, wherein the length is less than or equal to the initial length, and the width is less than or equal to the initial width.
The steps implemented by each functional module of the page direction identification apparatus may refer to each embodiment of the page direction identification method of the present invention, and are not described herein again.
The present invention also provides a page direction identification device, which includes: the device comprises a memory, a processor and a page direction identification program stored on the memory; the processor is used for executing the page direction identification program to realize the following steps:
determining a target image to be subjected to image detection, and dividing the target image according to a preset cutting mode to obtain a plurality of image blocks;
training each image block based on a preset convolutional neural network model, and determining whether each image block has characters and a target image block in the character direction based on the training result of the training;
if a plurality of target image blocks with characters and character directions exist in each image block, classifying and summarizing the character directions of each target image block, determining the target character direction based on the classified and summarized result, and taking the target character direction as the page direction of the target image.
The present invention also provides a computer-readable storage medium storing one or more programs, which are further executable by one or more processors for implementing the steps of the embodiments of the page direction identifying method described above.
The specific implementation of the computer-readable storage medium of the present invention is substantially the same as the embodiments of the page direction identification method, and is not described herein again.
It should be noted that, in this document, the terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article or system that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to such a process, method, article or system. Without further limitation, an element defined by the phrase "comprising a/an ..." does not exclude the presence of other like elements in the process, method, article or system that comprises the element.
The above-mentioned serial numbers of the embodiments of the present invention are merely for description and do not represent the merits of the embodiments.
Through the above description of the embodiments, those skilled in the art will clearly understand that the method of the above embodiments can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware, but in many cases, the former is a better implementation manner. Based on such understanding, the technical solution of the present invention may be embodied in the form of a software product, which is stored in a storage medium (e.g., ROM/RAM, magnetic disk, optical disk) as described above and includes instructions for enabling a terminal device (e.g., a mobile phone, a computer, a server, an air conditioner, or a network device) to execute the method according to the embodiments of the present invention.
The above description is only a preferred embodiment of the present invention, and not intended to limit the scope of the present invention, and all modifications of equivalent structures and equivalent processes, which are made by using the contents of the present specification and the accompanying drawings, or directly or indirectly applied to other related technical fields, are included in the scope of the present invention.

Claims (9)

1. A page direction identification method is characterized by comprising the following steps:
determining a target image to be subjected to image detection, and dividing the target image according to a preset cutting mode to obtain a plurality of image blocks, wherein partial areas of every two adjacent image blocks in each image block are completely the same;
training each image block based on a preset convolutional neural network model, and determining, based on the training result, target image blocks that contain characters and the character direction of each target image block;
if a plurality of target image blocks with characters and character directions exist in each image block, classifying and summarizing the character directions of each target image block, determining the target character direction based on the classified and summarized result, and taking the target character direction as the page direction of the target image;
wherein the step of determining, based on the training result, the target image blocks that contain characters and the character direction of each target image block comprises:
traversing each image block based on the training result of the training, and acquiring the whole area of the traversed image block and the area occupied by the suspected character area in the traversed image block;
calculating a ratio value of the area occupied by the suspected character area to the whole area, and judging whether the ratio value is larger than a preset threshold value or not;
if the ratio value is larger than a preset threshold value, determining that characters exist in the traversed image block, determining the character direction of the characters in the traversed image block according to the training result, and taking the traversed image block as a target image block.
2. The method for identifying page direction according to claim 1, wherein the step of determining the character direction of the characters in the traversed image block according to the training result comprises:
and determining a label result corresponding to the traversed image block according to the training result, matching the label result with a preset label direction comparison table, and determining the character direction of characters in the traversed image block according to the matching result.
3. The method for identifying page direction according to claim 1, wherein the step of training each image block based on the preset convolutional neural network model comprises:
inputting a plurality of initial image blocks in a preset mapping comparison table into an original convolutional neural network model for training so as to obtain text information of each initial image block, and comparing each text information with label information corresponding to each initial image block in the preset mapping comparison table;
and if the comparison fails, determining errors of the text information and the labeled information, and optimizing the original convolutional neural network model according to the errors to obtain a preset convolutional neural network model.
4. The method for identifying page direction according to claim 1, wherein the step of classifying and summarizing the character direction of each target image block and determining the target character direction based on the classified and summarized result comprises:
classifying and summarizing the character directions of the target image blocks to obtain a plurality of initial character directions, determining the number of target image blocks corresponding to the initial character directions, and taking the initial character direction with the largest number of target image blocks in each initial character direction as the target character direction.
5. The page direction recognition method according to any one of claims 1 to 4, wherein the step of dividing the target image according to a preset clipping manner to obtain a plurality of image blocks comprises:
determining an origin in the target image, determining the length and the width of an image block to be divided based on the origin and a preset cutting mode, and dividing the target image according to the length and the width to obtain a plurality of image blocks.
6. The page direction recognition method of claim 5, wherein the step of determining the length and width of the image block to be divided based on the origin and a preset clipping manner comprises:
acquiring an initial length and an initial width of the target image, and determining the length and the width of the image block to be divided based on the origin, the initial length and the initial width, wherein the length is less than or equal to the initial length, and the width is less than or equal to the initial width.
7. A page direction recognition apparatus, characterized in that the page direction recognition apparatus comprises:
the image detection device comprises a dividing module, a judging module and a judging module, wherein the dividing module is used for determining a target image to be subjected to image detection and dividing the target image according to a preset cutting mode to obtain a plurality of image blocks, and partial areas of every two adjacent image blocks in each image block are completely the same;
the determining module is used for training each image block based on a preset convolutional neural network model and determining whether each image block has characters and target image blocks in the character direction based on the training result of the training;
the classification and collection module is used for classifying and collecting the character direction of each target image block if a plurality of target image blocks with characters and character directions exist in each image block, determining the target character direction based on the classification and collection result of the classification and collection, and taking the target character direction as the page direction of the target image;
the determination module is further configured to traverse each image block based on the training result of the training, and obtain the whole area of the traversed image block and the area occupied by the suspected character area in the traversed image block; calculating a ratio value of the area occupied by the suspected character area to the whole area, and judging whether the ratio value is larger than a preset threshold value or not; if the ratio value is larger than a preset threshold value, determining that characters exist in the traversed image block, determining the character direction of the characters in the traversed image block according to the training result, and taking the traversed image block as a target image block.
8. A page direction identification equipment, characterized in that the page direction identification equipment comprises: a memory, a processor, and a page direction identification program stored on the memory and executable on the processor, wherein the page direction identification program, when executed by the processor, implements the steps of the page direction identification method according to any one of claims 1 to 6.
9. A computer-readable storage medium, characterized in that a page direction identification program is stored on the computer-readable storage medium, and the page direction identification program, when executed by a processor, implements the steps of the page direction identification method according to any one of claims 1 to 6.
CN202011282095.2A 2020-11-17 2020-11-17 Page direction identification method, device, equipment and computer readable storage medium Active CN112101317B (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN202011282095.2A CN112101317B (en) 2020-11-17 2020-11-17 Page direction identification method, device, equipment and computer readable storage medium
PCT/CN2021/127179 WO2022105569A1 (en) 2020-11-17 2021-10-28 Page direction recognition method and apparatus, and device and computer-readable storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011282095.2A CN112101317B (en) 2020-11-17 2020-11-17 Page direction identification method, device, equipment and computer readable storage medium

Publications (2)

Publication Number Publication Date
CN112101317A (en) 2020-12-18
CN112101317B (en) 2021-02-19

Family

ID=73785712

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011282095.2A Active CN112101317B (en) 2020-11-17 2020-11-17 Page direction identification method, device, equipment and computer readable storage medium

Country Status (2)

Country Link
CN (1) CN112101317B (en)
WO (1) WO2022105569A1 (en)

Families Citing this family (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112101317B (en) * 2020-11-17 2021-02-19 深圳壹账通智能科技有限公司 Page direction identification method, device, equipment and computer readable storage medium
CN112766266B (en) * 2021-01-29 2021-12-10 云从科技集团股份有限公司 Text direction correction method, system and device based on staged probability statistics
CN112926564B (en) * 2021-02-25 2024-08-02 中国平安人寿保险股份有限公司 Picture analysis method, system, computer device and computer readable storage medium
CN113780131B (en) * 2021-08-31 2024-04-12 众安在线财产保险股份有限公司 Text image orientation recognition method, text content recognition method, device and equipment
CN114155546B (en) * 2022-02-07 2022-05-20 北京世纪好未来教育科技有限公司 Image correction method and device, electronic equipment and storage medium
CN115346205A (en) * 2022-10-17 2022-11-15 广州简悦信息科技有限公司 Page information identification method and device and electronic equipment
CN116935393A (en) * 2023-07-27 2023-10-24 中科微至科技股份有限公司 Method and system for extracting package surface information based on OCR technology

Citations (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106326854A (en) * 2016-08-19 2017-01-11 掌阅科技股份有限公司 Open fixed-layout document paragraph identification method
CN109845237A (en) * 2016-08-17 2019-06-04 惠普打印机韩国有限公司 Image forming apparatus, scanned image correction method of the image forming apparatus, and non-transitory computer readable recording medium
CN110490198A (en) * 2019-08-12 2019-11-22 上海眼控科技股份有限公司 Text orientation bearing calibration, device, computer equipment and storage medium
CN110942063A (en) * 2019-11-21 2020-03-31 望海康信(北京)科技股份公司 Certificate text information acquisition method and device and electronic equipment
CN111062374A (en) * 2019-12-10 2020-04-24 爱信诺征信有限公司 Identification method, device, system, equipment and readable medium of identity card information
CN111091124A (en) * 2019-12-04 2020-05-01 吉林大学 Spine character recognition method
CN111382740A (en) * 2020-03-13 2020-07-07 深圳前海环融联易信息科技服务有限公司 Text picture analysis method and device, computer equipment and storage medium
CN111507214A (en) * 2020-04-07 2020-08-07 中国人民财产保险股份有限公司 Document identification method, device and equipment
CN111639646A (en) * 2020-05-18 2020-09-08 山东大学 Test paper handwritten English character recognition method and system based on deep learning
CN111753850A (en) * 2020-06-29 2020-10-09 珠海奔图电子有限公司 Document processing method and device, computer equipment and computer readable storage medium
CN111814429A (en) * 2020-07-30 2020-10-23 深圳壹账通智能科技有限公司 Article typesetting method and device, terminal equipment and storage medium

Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101777124A (en) * 2010-01-29 2010-07-14 北京新岸线网络技术有限公司 Method for extracting video text message and device thereof
KR101214772B1 (en) * 2010-02-26 2012-12-21 삼성전자주식회사 Character recognition apparatus and method based on direction of character
US10546326B2 (en) * 2013-09-26 2020-01-28 Mark W. Publicover Providing targeted content based on a user's preferences
CN111144288A (en) * 2019-12-25 2020-05-12 联想(北京)有限公司 Image processing method and device and electronic equipment
CN111353491B (en) * 2020-03-12 2024-04-26 中国建设银行股份有限公司 Text direction determining method, device, equipment and storage medium
CN112101317B (en) * 2020-11-17 2021-02-19 深圳壹账通智能科技有限公司 Page direction identification method, device, equipment and computer readable storage medium

Also Published As

Publication number Publication date
CN112101317A (en) 2020-12-18
WO2022105569A1 (en) 2022-05-27

Similar Documents

Publication Publication Date Title
CN112101317B (en) Page direction identification method, device, equipment and computer readable storage medium
CN109961009B (en) Pedestrian detection method, system, device and storage medium based on deep learning
CN108009543B (en) License plate recognition method and device
US10095925B1 (en) Recognizing text in image data
WO2019169772A1 (en) Picture processing method, electronic apparatus, and storage medium
CN109784181B (en) Picture watermark identification method, device, equipment and computer readable storage medium
CN111325104B (en) Text recognition method, device and storage medium
US20180225542A1 (en) Image information recognition processing method and device, and computer storage medium
CN112070076B (en) Text paragraph structure reduction method, device, equipment and computer storage medium
WO2021184718A1 (en) Card border recognition method, apparatus and device, and computer storage medium
CN108021863B (en) Electronic device, age classification method based on image and storage medium
CN110414649B (en) DM code positioning method, device, terminal and storage medium
CN112767354A (en) Defect detection method, device and equipment based on image segmentation and storage medium
CN112434612A (en) Smoking detection method and device, electronic equipment and computer readable storage medium
CN111080665B (en) Image frame recognition method, device, equipment and computer storage medium
CN115546809A (en) Table structure identification method based on cell constraint and application thereof
CN111582134A (en) Certificate edge detection method, device, equipment and medium
CN113657370B (en) Character recognition method and related equipment thereof
CN113657369B (en) Character recognition method and related equipment thereof
CN112560857B (en) Character area boundary detection method, equipment, storage medium and device
CN114399657A (en) Vehicle detection model training method and device, vehicle detection method and electronic equipment
CN113128604A (en) Page element identification method and device, electronic equipment and storage medium
CN109685069B (en) Image detection method, device and computer readable storage medium
CN115880362B (en) Code region positioning method, device, computer equipment and computer readable storage medium
CN115797955A (en) Table structure identification method based on cell constraint and application thereof

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant