CN112101317B - Page direction identification method, device, equipment and computer readable storage medium - Google Patents

Page direction identification method, device, equipment and computer readable storage medium

Info

Publication number
CN112101317B
Authority
CN
China
Prior art keywords
image block
character
target image
determining
image
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202011282095.2A
Other languages
Chinese (zh)
Other versions
CN112101317A (en)
Inventor
高超
徐国强
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
OneConnect Financial Technology Co Ltd Shanghai
Original Assignee
OneConnect Financial Technology Co Ltd Shanghai
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by OneConnect Financial Technology Co Ltd Shanghai filed Critical OneConnect Financial Technology Co Ltd Shanghai
Priority to CN202011282095.2A
Publication of CN112101317A
Application granted
Publication of CN112101317B
Priority to PCT/CN2021/127179 (WO2022105569A1)
Legal status: Active

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/40 Document-oriented image-based pattern recognition
    • G06V30/41 Analysis of document content
    • G06V30/413 Classification of content, e.g. text, photographs or tables
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10 Character recognition

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Computing Systems (AREA)
  • Biomedical Technology (AREA)
  • General Health & Medical Sciences (AREA)
  • Computational Linguistics (AREA)
  • Biophysics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Molecular Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Health & Medical Sciences (AREA)
  • Evolutionary Biology (AREA)
  • Multimedia (AREA)
  • Image Analysis (AREA)

Abstract

The invention relates to the technical field of image processing, and discloses a page direction identification method, device, equipment and computer readable storage medium, wherein the method comprises the following steps: determining a target image to be subjected to image detection, and dividing the target image according to a preset cutting mode to obtain a plurality of image blocks; training each image block based on a preset convolutional neural network model, and determining, based on the training result, target image blocks that contain characters and the character direction of each target image block; if a plurality of such target image blocks exist among the image blocks, classifying and summarizing the character directions of the target image blocks, determining a target character direction based on the classified and summarized result, and taking the target character direction as the page direction of the target image. The invention improves the accuracy of identifying the page direction of an image.

Description

Page direction identification method, device, equipment and computer readable storage medium
Technical Field
The present invention relates to the field of image processing technologies, and in particular, to a method, an apparatus, a device, and a computer-readable storage medium for identifying a page direction.
Background
OCR (Optical Character Recognition) technology converts printed characters in an image into a text format that a computer can process. It is widely applied in scenarios such as data entry and verification comparison, and has become a key link in the informatization and digitization of various industries of the national economy. OCR mainly solves two problems: detecting the position of characters in a picture and recognizing their content. Because of different acquisition modes (such as photographing and scanning), the page of an image to be recognized may be rotated by 90, 180 or 270 degrees; such an image cannot be fed directly into an OCR system, and its page direction usually needs to be detected and corrected by rotation first. Conventionally, the page direction is estimated by locating the position and direction of text lines with methods such as morphology, line detection and projection, but these methods cannot tell whether the image is reversed by 180 degrees and are easily interfered with by background texture lines outside the page. Although a deep learning model can classify the whole image directly and predict its direction, such a model requires a large amount of training data, is likewise easily interfered with by background textures, and has low robustness.
Disclosure of Invention
The invention mainly aims to provide a page direction identification method, a page direction identification device, page direction identification equipment and a computer readable storage medium, and aims to solve the technical problem of how to improve the accuracy of image page direction identification.
In order to achieve the above object, the present invention provides a page direction identification method, including:
determining a target image to be subjected to image detection, and dividing the target image according to a preset cutting mode to obtain a plurality of image blocks;
training each image block based on a preset convolutional neural network model, and determining, based on the training result, target image blocks that contain characters and the character direction of each target image block;
if a plurality of target image blocks with characters and character directions exist among the image blocks, classifying and summarizing the character directions of the target image blocks, determining a target character direction based on the classified and summarized result, and taking the target character direction as the page direction of the target image.
Optionally, the step of determining whether each of the image blocks has a character and a target image block with a character direction based on the training result of the training includes:
traversing each image block based on the training result of the training, and acquiring the whole area of the traversed image block and the area occupied by the suspected character area in the traversed image block;
calculating a ratio value of the area occupied by the suspected character area to the whole area, and judging whether the ratio value is larger than a preset threshold value or not;
if the ratio value is larger than a preset threshold value, determining that characters exist in the traversed image block, determining the character direction of the characters in the traversed image block according to the training result, and taking the traversed image block as a target image block.
Optionally, the step of determining the character direction of the characters in the traversed image block according to the training result includes:
and determining a label result corresponding to the traversed image block according to the training result, matching the label result with a preset label direction comparison table, and determining the character direction of characters in the traversed image block according to the matching result.
Optionally, before the step of training each image block based on a preset convolutional neural network model, the method includes:
inputting a plurality of initial image blocks in a preset mapping comparison table into an original convolutional neural network model for training so as to obtain text information of each initial image block, and comparing each text information with label information corresponding to each initial image block in the preset mapping comparison table;
and if the comparison fails, determining errors of the text information and the labeled information, and optimizing the original convolutional neural network model according to the errors to obtain a preset convolutional neural network model.
Optionally, the step of classifying and summarizing the character direction of each target image block, and determining the target character direction based on the classified and summarized result includes:
classifying and summarizing the character directions of the target image blocks to obtain a plurality of initial character directions, determining the number of target image blocks corresponding to the initial character directions, and taking the initial character direction with the largest number of target image blocks in each initial character direction as the target character direction.
Optionally, the step of dividing the target image according to a preset clipping manner to obtain a plurality of image blocks includes:
determining an origin in the target image, determining the length and the width of an image block to be divided based on the origin and a preset cutting mode, and dividing the target image according to the length and the width to obtain a plurality of image blocks.
Optionally, the step of determining the length and the width of the image block to be divided based on the origin and a preset clipping manner includes:
acquiring an initial length and an initial width of the target image, and determining the length and the width of the image block to be divided based on the origin, the initial length and the initial width, wherein the length is less than or equal to the initial length, and the width is less than or equal to the initial width.
In addition, to achieve the above object, the present invention further provides a page direction recognition apparatus, including:
the image processing device comprises a dividing module, a processing module and a processing module, wherein the dividing module is used for determining a target image to be subjected to image detection and dividing the target image according to a preset cutting mode to obtain a plurality of image blocks;
the determining module is used for training each image block based on a preset convolutional neural network model, and determining, based on the training result, target image blocks that contain characters and the character direction of each target image block;
and the classifying and summarizing module is used for, if a plurality of target image blocks with characters and character directions exist among the image blocks, classifying and summarizing the character directions of the target image blocks, determining the target character direction based on the classification and summarization result, and taking the target character direction as the page direction of the target image.
In addition, in order to achieve the above object, the present invention also provides a page direction identification device;
the page direction recognition apparatus includes: a memory, a processor, and a computer program stored on the memory and executable on the processor, wherein:
the computer program, when being executed by the processor, realizes the steps of the page direction identification method as described above.
In addition, to achieve the above object, the present invention also provides a computer-readable storage medium;
the computer-readable storage medium has stored thereon a computer program which, when being executed by a processor, carries out the steps of the page direction identification method as described above.
The method comprises the steps of: determining a target image to be subjected to image detection, and dividing the target image according to a preset cutting mode to obtain a plurality of image blocks; training each image block based on a preset convolutional neural network model, and determining, based on the training result, target image blocks that contain characters and the character direction of each target image block; if a plurality of such target image blocks exist among the image blocks, classifying and summarizing the character directions of the target image blocks, determining a target character direction based on the classified and summarized result, and taking the target character direction as the page direction of the target image. Because the target image is divided in a preset cutting mode into a plurality of image blocks, each image block is trained based on the convolutional neural network model to determine the target image blocks, the character directions of the target image blocks are classified and summarized, and the page direction of the target image is determined from the classified and summarized result, the inaccurate estimation of the page direction in the prior art is avoided and the accuracy of image page direction identification is improved.
Drawings
FIG. 1 is a schematic structural diagram of a page direction identification device of a hardware operating environment according to an embodiment of the present invention;
FIG. 2 is a flowchart illustrating a page direction identification method according to a first embodiment of the present invention;
FIG. 3 is a schematic diagram of the functional modules of the page direction recognition apparatus according to the present invention.
The objects, features and advantages of the present invention will be further explained with reference to the accompanying drawings.
Detailed Description
It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.
As shown in fig. 1, fig. 1 is a schematic structural diagram of a page direction identifying device of a hardware operating environment according to an embodiment of the present invention.
As shown in fig. 1, the page direction recognition apparatus may include: a processor 1001 (such as a CPU), a network interface 1004, a user interface 1003, a memory 1005, and a communication bus 1002. The communication bus 1002 is used to enable connection and communication between these components. The user interface 1003 may include a Display screen (Display) and an input unit such as a Keyboard (Keyboard), and optionally may also include a standard wired interface and a wireless interface. The network interface 1004 may optionally include a standard wired interface and a wireless interface (e.g., a WI-FI interface). The memory 1005 may be a high-speed RAM memory or a non-volatile memory (e.g., a magnetic disk memory). The memory 1005 may alternatively be a storage device separate from the processor 1001.
Optionally, the page direction identifying device may further include a camera, a Radio Frequency (RF) circuit, a sensor, an audio circuit, a WiFi module, and the like. Such as light sensors, motion sensors, and other sensors. Specifically, the light sensor may include an ambient light sensor and a proximity sensor, wherein the ambient light sensor may adjust the brightness of the display screen according to the brightness of ambient light. Of course, the page direction recognition device may also be configured with other sensors such as a gyroscope, a barometer, a hygrometer, a thermometer, and an infrared sensor, which are not described herein again.
It will be understood by those skilled in the art that the structure shown in fig. 1 does not constitute a limitation of the page direction identifying apparatus, which may include more or fewer components than those shown, a combination of some components, or a different arrangement of components.
As shown in fig. 1, a memory 1005, which is a kind of computer storage medium, may include therein an operating system, a network communication module, a user interface module, and a page direction identification program.
In the page direction identification device shown in fig. 1, the network interface 1004 is mainly used for connecting to a background server and performing data communication with the background server; the user interface 1003 is mainly used for connecting a client (user side) and performing data communication with the client; and the processor 1001 may be configured to call the page direction identification program stored in the memory 1005 and execute the page direction identification method provided by the embodiment of the present invention.
Referring to fig. 2, the present invention provides a page direction identification method, in an embodiment of the page direction identification method, the page direction identification method includes the following steps:
step S10, determining a target image to be subjected to image detection, and dividing the target image according to a preset cutting mode to obtain a plurality of image blocks;
in this embodiment, a target image to be subjected to image document page detection is cut into a plurality of patches (small blocks), that is, image blocks, and each image block is predicted by using a convolutional neural network model in deep learning to determine whether characters exist in each image block, if so, the direction of the characters is continuously determined to obtain the prediction result of each image block, and then, the prediction results are summarized and fused to obtain the direction of the whole document page in the target image. Therefore, a target image to be subjected to image detection needs to be determined first, and the manner of determining the target image may be to acquire an image input by a user and use the image as the target image, or to acquire an image sent by another terminal and use the image as the target image, or to use an image generated by the terminal itself subjected to image detection as the target image, and the specific manner of acquiring the target image is not limited herein and may be set according to the needs of the user.
After the target image is obtained, it is divided in a preset clipping manner to obtain a plurality of image blocks. It should be noted that, to ensure the continuity of the image blocks, adjacent image blocks must partially overlap when the target image is divided, that is, two adjacent image blocks share a region that is completely identical. The preset clipping manner may be to determine an origin of the target image, for example the upper-left corner, and construct a two-dimensional coordinate system based on the origin, where the x axis and the y axis may be determined based on the edge length and edge width of the target image. The target image is then divided, for example into all regions [i × stride, j × stride, i × stride + size, j × stride + size] of the target image (the four numbers respectively denote the x and y coordinates of the upper-left and lower-right corners of a patch), wherein i and j are positive integers and i × stride + size ≤ width and j × stride + size ≤ height are ensured. In this embodiment, stride = 192 and size = 256 may preferably be set.
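For illustration only, the following Python sketch performs the overlapping division described above with stride = 192 and size = 256; the function name split_into_patches is an assumption of this sketch, and its indices start at 0 so that the upper-left corner of the page is also covered, which is likewise an assumption rather than a prescription of the embodiment:

```python
import numpy as np

def split_into_patches(image: np.ndarray, stride: int = 192, size: int = 256):
    """Cut an H x W image into overlapping size x size patches.

    Adjacent patches overlap by (size - stride) pixels, so every two
    neighbouring patches share an identical region, as required above.
    """
    height, width = image.shape[:2]
    patches = []
    j = 0
    while j * stride + size <= height:        # ensure j * stride + size <= height
        i = 0
        while i * stride + size <= width:     # ensure i * stride + size <= width
            x0, y0 = i * stride, j * stride   # upper-left corner of the patch
            patches.append(image[y0:y0 + size, x0:x0 + size])
            i += 1
        j += 1
    return patches
```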
Step S20, training each image block based on a preset convolutional neural network model, and determining, based on the training result, target image blocks that contain characters and the character direction of each target image block;
after the plurality of image blocks are obtained, each image block can be trained by adopting a preset convolutional neural network model, namely, each image block is combined into a batch and input to the convolutional neural network model for training, so that a target image block with characters in each image block and a character direction corresponding to each target image block are determined according to a training result. That is, each image block is predicted through the convolutional neural network model, whether each image block contains characters or not can be determined according to the prediction result, and if the image block contains the characters, the direction of the characters is determined based on the prediction result. The mode of detecting whether each image block contains characters may be to determine the overall area of each image block through a convolutional neural network, detect the area occupied by the suspected character area in each image block, and then detect the ratio of the area occupied by the suspected character area to the overall area, determine that characters exist in the traversed image block if the ratio of the traversed image block is greater than a preset threshold (any threshold set in advance by a user), and determine that no characters exist in the traversed image block if the ratio is less than or equal to the preset threshold. After the characters exist in the traversed image block, the direction of the characters in the traversed image block can be determined according to the prediction direction of the convolutional neural network model, so that the direction which can be predicted by the convolutional neural network model needs to be determined firstly, namely, the label result carried in the traversed image block is determined according to the training result of the convolutional neural network model, such as 0,1,2,3 and the like, and then the obtained label result is matched with a preset label direction comparison table, wherein the label direction comparison table is provided with the direction corresponding to each label, such as 0 corresponding to 0 degree, 1 corresponding to 90 degrees, 2 corresponding to 180 degrees, 3 corresponding to 270 degrees and the like. And determining the character direction of the characters in the traversed image block according to the matching result, for example, the convolutional neural network model can predict 4 directions, which are respectively represented by 0,1,2, and 3 at 0 degree, 90 degrees, 180 degrees, and 270 degrees, that is, if the tag result output by the convolutional neural network model is 1, the character direction in the image block can be determined to be 90 degrees.
Step S30, if a plurality of target image blocks with characters and character directions exist among the image blocks, classifying and summarizing the character directions of the target image blocks, determining a target character direction based on the classification and summarization result, and taking the target character direction as the page direction of the target image.
When it is determined that a plurality of target image blocks with characters and character directions exist among the image blocks, the character direction corresponding to each target image block is obtained and the character directions are classified and summarized; for example, the target image blocks corresponding to 0 degrees are collected together, as are those corresponding to 90 degrees, 180 degrees and 270 degrees, and the character direction with the most corresponding target image blocks is taken as the target character direction. If the target image blocks corresponding to 90 degrees are the most numerous, 90 degrees is taken as the target character direction, that is, the page direction of the target image. In this proposal, the target image is divided into a plurality of image blocks, each image block is examined to determine whether it contains characters, the character direction is determined when it does, and the character directions are classified and summarized to determine the page direction of the target image.
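The classification and summarization described above amounts to a majority vote. The helper below is only a sketch (the function name and the use of degrees as values are assumptions) operating on the character directions of the target image blocks:

```python
from collections import Counter

def page_direction(block_directions):
    """Majority vote over the character directions (in degrees) of the
    target image blocks, e.g. page_direction([90, 90, 0, 90]) -> 90."""
    if not block_directions:
        return None  # no image block containing characters was found
    direction, _ = Counter(block_directions).most_common(1)[0]
    return direction
```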
In this embodiment, a target image to be subjected to image detection is determined, and the target image is divided according to a preset cutting mode to obtain a plurality of image blocks; each image block is trained based on a preset convolutional neural network model, and the target image blocks that contain characters, together with the character direction of each, are determined based on the training result; if a plurality of such target image blocks exist among the image blocks, the character directions of the target image blocks are classified and summarized, a target character direction is determined based on the classified and summarized result, and the target character direction is taken as the page direction of the target image. Because the target image is divided in a preset cutting mode into a plurality of image blocks, each image block is trained with the convolutional neural network model to determine the target image blocks, the character directions of the target image blocks are classified and summarized, and the page direction of the target image is determined from the classified and summarized result, the inaccurate estimation of the page direction of the target image in the prior art is avoided and the accuracy of image page direction identification is improved.
Further, on the basis of the first embodiment of the present invention, a second embodiment of the page direction identification method of the present invention is provided. This embodiment refines step S20 of the first embodiment; the step of determining, based on the training result, the target image blocks that contain characters and the character direction of each target image block includes:
step a, traversing each image block based on the training result of the training, and acquiring the whole area of the traversed image block and the area occupied by the suspected character area in the traversed image block;
in this embodiment, when each image block is trained through the convolutional neural network model and a training result is obtained, each image block may be traversed according to the training result, and for the traversed image block, the whole area of the traversed image block and the area of a suspected character area in the traversed image block, that is, the area occupied by the suspected character area, are determined.
Step b, calculating a ratio value of the area occupied by the suspected character area to the whole area, and judging whether the ratio value is larger than a preset threshold value or not;
and after the overall area of the traversed image block and the area occupied by the suspected character area are obtained, calculating a ratio value of the area occupied by the suspected character area to the overall area, judging whether the ratio value is larger than a preset threshold value or not, and executing different operations based on different judgment results. The preset threshold may be any threshold set in advance by the user.
And c, if the ratio value is larger than a preset threshold value, determining that characters exist in the traversed image block, determining the character direction of the characters in the traversed image block according to the training result, and taking the traversed image block as a target image block.
When the ratio value is judged to be larger than the preset threshold value, it is determined that characters exist in the traversed image block; if the ratio value is less than or equal to the preset threshold value, it is determined that no characters exist in the traversed image block. If characters exist in the traversed image block, the character direction of those characters can be determined according to the training result, that is, according to the prediction of the convolutional neural network model, and the traversed image block can then be taken as a target image block.
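For illustration, assuming the training result provides a binary suspected-character mask for the traversed image block (an assumption of this sketch, as are the helper name and the threshold value), the ratio test of steps b and c could be computed as follows:

```python
import numpy as np

def has_characters(char_mask: np.ndarray, threshold: float = 0.2) -> bool:
    """True if the suspected-character area occupies a larger share of the
    block than the preset threshold."""
    ratio = float(char_mask.sum()) / char_mask.size   # occupied area / whole area
    return ratio > threshold
```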
In this embodiment, the ratio of the area occupied by the suspected character region in the traversed image block to its whole area is determined according to the training result; when the ratio is greater than the preset threshold, it is determined that characters exist in the traversed image block, the character direction is determined according to the training result, and the traversed image block is taken as a target image block, so that the accuracy of the obtained target image blocks is guaranteed.
Specifically, the step of determining the character direction of the characters in the traversed image block according to the training result includes:
and d, determining a label result corresponding to the traversed image block according to the training result, matching the label result with a preset label direction comparison table, and determining the character direction of characters in the traversed image block according to the matching result.
When determining the character direction of characters in a traversed image block, the directions that the convolutional neural network model can predict need to be determined first; that is, the label result carried in the traversed image block, such as 0, 1, 2 or 3, is determined according to the training result of the convolutional neural network model, and the obtained label result is then matched against a preset label direction comparison table, in which each label corresponds to a direction, for example 0 corresponds to 0 degrees, 1 to 90 degrees, 2 to 180 degrees and 3 to 270 degrees. The character direction of the characters in the traversed image block is determined according to the matching result; for example, if the convolutional neural network model can predict 4 directions, represented by 0, 1, 2 and 3 for 0, 90, 180 and 270 degrees, and the label result output by the model is 1, the character direction in the image block can be determined to be 90 degrees.
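A minimal stand-in for the label direction comparison table, using the 0/90/180/270 mapping of the example above, is shown below; the table contents and the function name are illustrative assumptions only:

```python
# Preset label direction comparison table from the example above.
LABEL_DIRECTION_TABLE = {0: 0, 1: 90, 2: 180, 3: 270}

def label_to_direction(label_result: int) -> int:
    """Match the label result against the table, e.g. 1 -> 90 degrees."""
    if label_result not in LABEL_DIRECTION_TABLE:
        raise ValueError(f"unknown label result: {label_result}")
    return LABEL_DIRECTION_TABLE[label_result]
```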
In this embodiment, the label result corresponding to the traversed image block is determined according to the training result, the label result is matched against the label direction comparison table, and the character direction is determined according to the matching result, so that the accuracy of the obtained character direction is ensured.
Further, before the step of training each image block based on a preset convolutional neural network model, the method includes:
step e, inputting a plurality of initial image blocks in a preset mapping comparison table into an original convolutional neural network model for training so as to obtain text information of each initial image block, and comparing each text information with label information corresponding to each initial image block in the preset mapping comparison table;
before each image block is trained by using a preset convolutional neural network model, a conventional convolutional neural network model, namely an original convolutional neural network model, needs to be obtained, and the preset convolutional neural network model is obtained by training and optimizing the original convolutional neural network model in advance, for example, by using a gradient descent method until the model converges. That is, the labeling information of each initial image block, such as whether there is text content, and the text direction of the text content, may be determined by manually labeling, and the initial image blocks and the labeling information are summarized to obtain the preset mapping comparison table. And after the preset mapping comparison table is obtained, performing model optimization on the original convolutional neural network model according to the preset mapping comparison table to obtain the convolutional neural network model. The method includes the steps of extracting a plurality of initial image blocks in a preset mapping comparison table, inputting each initial image block into an original convolutional neural network model as a batch for training, determining text information of each initial image block according to a training result, namely determining whether each initial image block contains text content according to the training result, and if so, determining the text direction of the text content in each initial image block. And comparing each text information with the label information corresponding to each initial image block in the preset mapping comparison table, that is, comparing the text information (including whether the text information has text content and text direction) of each initial image block with the label information (including whether the text information has text content and text direction) of each initial image block in the preset mapping comparison table.
And f, if the comparison fails, determining errors of the text information and the labeling information, and optimizing the original convolutional neural network model according to the errors to obtain a preset convolutional neural network model.
If they are inconsistent (that is, the comparison fails), namely if the text information and the labeling information of a certain initial image block differ, the error between the text information and the labeling information needs to be determined, and the original convolutional neural network model is optimized according to this error, that is, the model parameters are adjusted. The model is then optimized again in the same way until it converges or the error becomes extremely small, and the convolutional neural network model at that point is taken as the preset convolutional neural network model.
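A minimal sketch of this optimization loop is given below. It assumes a PyTorch model (or its direction head) that outputs label scores for a batch of initial image blocks, annotated labels drawn from the preset mapping comparison table, cross-entropy as the error measure and stochastic gradient descent as the optimizer; all of these concrete choices are assumptions of the sketch rather than details fixed by the embodiment:

```python
import torch
from torch import nn

def optimize_model(model, loader, epochs: int = 10, lr: float = 1e-3):
    """Compare the model output for each initial image block with its
    annotated label and adjust the model parameters by gradient descent."""
    criterion = nn.CrossEntropyLoss()                # error between output and label
    optimizer = torch.optim.SGD(model.parameters(), lr=lr)
    model.train()
    for _ in range(epochs):                          # until converged / error small
        for patches, labels in loader:               # labels from the comparison table
            optimizer.zero_grad()
            logits = model(patches)                  # predicted label scores
            loss = criterion(logits, labels)         # determine the error
            loss.backward()                          # back-propagate the error
            optimizer.step()                         # optimize the original model
    return model
```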
In this embodiment, the original convolutional neural network model is trained according to each initial image block, an error of the original convolutional neural network model is determined when the comparison between the text information of each initial image block and the label information in the preset mapping comparison table fails, and the original convolutional neural network model is optimized according to the error to obtain the preset convolutional neural network model, so that the effectiveness of the obtained preset convolutional neural network model is ensured.
Further, the step of classifying and summarizing the character direction of each target image block, and determining the target character direction based on the classified and summarized result includes:
and step g, classifying and summarizing the character directions of the target image blocks to obtain a plurality of initial character directions, determining the number of the target image blocks corresponding to the initial character directions, and taking the initial character direction with the largest number of the target image blocks in each initial character direction as the target character direction.
In this embodiment, after the character directions of the target image blocks are obtained, the character directions need to be classified and summarized to obtain a plurality of initial character directions, such as 0 degrees, 90 degrees, 180 degrees and 270 degrees. For example, the target image blocks corresponding to 0 degrees are collected together, as are those corresponding to 90 degrees, 180 degrees and 270 degrees, and it is determined which character direction corresponds to the largest number of target image blocks; that character direction is taken as the target character direction. In other words, the number of target image blocks corresponding to each initial character direction is determined, and the initial character direction with the largest number of target image blocks is taken as the target character direction. If the target image blocks corresponding to 90 degrees are the most numerous, 90 degrees can be taken as the target character direction, that is, the page direction of the target image.
In this embodiment, a plurality of initial character directions are obtained by classifying and summarizing the character directions of each target image block, and the initial character direction with the largest number of target image blocks in each initial character direction is taken as the target character direction, so that the accuracy of the obtained target character direction is ensured.
Further, the step of dividing the target image according to a preset clipping manner to obtain a plurality of image blocks includes:
and h, determining an origin in the target image, determining the length and the width of the image block to be divided based on the origin and a preset cutting mode, and dividing the target image according to the length and the width to obtain a plurality of image blocks.
In this embodiment, when dividing the target image according to the preset clipping manner, the origin set in the target image, that is, the origin of the coordinate system, needs to be determined; the position of the origin may be set according to the needs of the user. After the origin is determined, a two-dimensional coordinate system can be created from the initial length and initial width of the target image, with the x axis and y axis constructed along the edges of the target image. After the two-dimensional coordinate system is constructed, the length and width of the image block to be divided and the coordinates of the division starting point need to be determined, and the vertex coordinates of the image block are then determined in the two-dimensional coordinate system from the length, the width and the starting-point coordinates, such as [i × stride, j × stride, i × stride + size, j × stride + size]. The target image is divided according to these vertex coordinates to obtain the divided image block. The target image may be divided multiple times in this way to obtain a plurality of image blocks, and each image block may be obtained in the same manner.
In the embodiment, the length and the width of the image block to be divided are determined according to the origin and the cutting mode in the target image, and the target image is divided based on the length and the width to obtain a plurality of image blocks, so that the effectiveness of the obtained image blocks is guaranteed.
Specifically, the step of determining the length and width of the image block to be divided based on the origin and a preset clipping manner includes:
and k, acquiring the initial length and the initial width of the target image, and determining the length and the width of the image block to be divided based on the origin, the initial length and the initial width, wherein the length is less than or equal to the initial length, and the width is less than or equal to the initial width.
When determining the length and width of the image block to be divided, the length of the target image (the initial length) and the width of the target image (the initial width) are acquired, the range that can be divided is determined from the origin, the initial width and the initial length, and the target image is divided according to a dividing instruction of the user to determine the length and width of the image block to be divided. Each image block needs to satisfy the following conditions: the length of the image block to be divided is less than or equal to the initial length of the target image, the width of the image block to be divided is less than or equal to the initial width of the target image, and an overlapping part exists between adjacent image blocks.
In this embodiment, the length and the width of the image block to be divided are determined according to the initial length, the initial width and the origin of the target image, so that the effectiveness of the acquired image block to be divided is ensured.
In addition, referring to fig. 3, an embodiment of the present invention further provides a page direction identification apparatus, where the page direction identification apparatus includes:
the dividing module A10 is configured to determine a target image to be subjected to image detection, and divide the target image according to a preset clipping manner to obtain a plurality of image blocks;
the determining module A20 is used for training each image block based on a preset convolutional neural network model, and determining whether each image block has characters and target image blocks in the character direction based on the training result of the training;
and a classifying and summarizing module a30, configured to, if a plurality of target image blocks with characters and character directions exist in each image block, classify and summarize the character directions of each target image block, determine a target character direction based on a classifying and summarizing result of the classifying and summarizing, and use the target character direction as a page direction of the target image.
Further, the determining module a20 is further configured to:
traversing each image block based on the training result of the training, and acquiring the whole area of the traversed image block and the area occupied by the suspected character area in the traversed image block;
calculating a ratio value of the area occupied by the suspected character area to the whole area, and judging whether the ratio value is larger than a preset threshold value or not;
if the ratio value is larger than a preset threshold value, determining that characters exist in the traversed image block, determining the character direction of the characters in the traversed image block according to the training result, and taking the traversed image block as a target image block.
Further, the determining module a20 is further configured to:
and determining a label result corresponding to the traversed image block according to the training result, matching the label result with a preset label direction comparison table, and determining the character direction of characters in the traversed image block according to the matching result.
Further, the determining module a20 is further configured to:
inputting a plurality of initial image blocks in a preset mapping comparison table into an original convolutional neural network model for training so as to obtain text information of each initial image block, and comparing each text information with label information corresponding to each initial image block in the preset mapping comparison table;
and if the comparison fails, determining errors of the text information and the labeled information, and optimizing the original convolutional neural network model according to the errors to obtain a preset convolutional neural network model.
Further, the classifying and summarizing module A30 is further configured to:
classifying and summarizing the character directions of the target image blocks to obtain a plurality of initial character directions, determining the number of target image blocks corresponding to the initial character directions, and taking the initial character direction with the largest number of target image blocks in each initial character direction as the target character direction.
Further, the dividing module a10 is further configured to:
determining an origin in the target image, determining the length and the width of an image block to be divided based on the origin and a preset cutting mode, and dividing the target image according to the length and the width to obtain a plurality of image blocks.
Further, the dividing module a10 is further configured to:
acquiring an initial length and an initial width of the target image, and determining the length and the width of the image block to be divided based on the origin, the initial length and the initial width, wherein the length is less than or equal to the initial length, and the width is less than or equal to the initial width.
The steps implemented by each functional module of the page direction identification apparatus may refer to each embodiment of the page direction identification method of the present invention, and are not described herein again.
The present invention also provides a page direction identification device, which includes: the device comprises a memory, a processor and a page direction identification program stored on the memory; the processor is used for executing the page direction identification program to realize the following steps:
determining a target image to be subjected to image detection, and dividing the target image according to a preset cutting mode to obtain a plurality of image blocks;
training each image block based on a preset convolutional neural network model, and determining whether each image block has characters and a target image block in the character direction based on the training result of the training;
if a plurality of target image blocks with characters and character directions exist in each image block, classifying and summarizing the character directions of each target image block, determining the target character direction based on the classified and summarized result, and taking the target character direction as the page direction of the target image.
The present invention also provides a computer-readable storage medium storing one or more programs, which are further executable by one or more processors for implementing the steps of the embodiments of the page direction identifying method described above.
The specific implementation of the computer-readable storage medium of the present invention is substantially the same as the embodiments of the page direction identification method, and is not described herein again.
It should be noted that, in this document, the terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article or system that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to such a process, method, article or system. Without further limitation, an element defined by the phrase "comprising a/an ..." does not exclude the presence of other like elements in the process, method, article or system that comprises the element.
The above-mentioned serial numbers of the embodiments of the present invention are merely for description and do not represent the merits of the embodiments.
Through the above description of the embodiments, those skilled in the art will clearly understand that the method of the above embodiments can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware, but in many cases, the former is a better implementation manner. Based on such understanding, the technical solution of the present invention may be embodied in the form of a software product, which is stored in a storage medium (e.g., ROM/RAM, magnetic disk, optical disk) as described above and includes instructions for enabling a terminal device (e.g., a mobile phone, a computer, a server, an air conditioner, or a network device) to execute the method according to the embodiments of the present invention.
The above description is only a preferred embodiment of the present invention, and not intended to limit the scope of the present invention, and all modifications of equivalent structures and equivalent processes, which are made by using the contents of the present specification and the accompanying drawings, or directly or indirectly applied to other related technical fields, are included in the scope of the present invention.

Claims (9)

1. A page direction identification method is characterized by comprising the following steps:
determining a target image to be subjected to image detection, and dividing the target image according to a preset cutting mode to obtain a plurality of image blocks, wherein partial areas of every two adjacent image blocks in each image block are completely the same;
training each image block based on a preset convolutional neural network model, and determining, based on the training result, target image blocks that contain characters and the character direction of each target image block;
if a plurality of target image blocks with characters and character directions exist in each image block, classifying and summarizing the character directions of each target image block, determining the target character direction based on the classified and summarized result, and taking the target character direction as the page direction of the target image;
wherein the step of determining, based on the training result, the target image blocks that contain characters and the character direction of each target image block comprises:
traversing each image block based on the training result of the training, and acquiring the whole area of the traversed image block and the area occupied by the suspected character area in the traversed image block;
calculating a ratio value of the area occupied by the suspected character area to the whole area, and judging whether the ratio value is larger than a preset threshold value or not;
if the ratio value is larger than a preset threshold value, determining that characters exist in the traversed image block, determining the character direction of the characters in the traversed image block according to the training result, and taking the traversed image block as a target image block.
2. The method for identifying page direction according to claim 1, wherein the step of determining the character direction of the characters in the traversed image block according to the training result comprises:
and determining a label result corresponding to the traversed image block according to the training result, matching the label result with a preset label direction comparison table, and determining the character direction of characters in the traversed image block according to the matching result.
3. The method for identifying page direction according to claim 1, wherein the step of training each image block based on the preset convolutional neural network model comprises:
inputting a plurality of initial image blocks in a preset mapping comparison table into an original convolutional neural network model for training so as to obtain text information of each initial image block, and comparing each text information with label information corresponding to each initial image block in the preset mapping comparison table;
and if the comparison fails, determining errors of the text information and the labeled information, and optimizing the original convolutional neural network model according to the errors to obtain a preset convolutional neural network model.
4. The method for identifying page direction according to claim 1, wherein the step of classifying and summarizing the character direction of each target image block and determining the target character direction based on the classified and summarized result comprises:
classifying and summarizing the character directions of the target image blocks to obtain a plurality of initial character directions, determining the number of target image blocks corresponding to the initial character directions, and taking the initial character direction with the largest number of target image blocks in each initial character direction as the target character direction.
5. The page direction recognition method according to any one of claims 1 to 4, wherein the step of dividing the target image according to a preset clipping manner to obtain a plurality of image blocks comprises:
determining an origin in the target image, determining the length and the width of an image block to be divided based on the origin and a preset cutting mode, and dividing the target image according to the length and the width to obtain a plurality of image blocks.
6. The page direction recognition method of claim 5, wherein the step of determining the length and width of the image block to be divided based on the origin and a preset clipping manner comprises:
acquiring an initial length and an initial width of the target image, and determining the length and the width of the image block to be divided based on the origin, the initial length and the initial width, wherein the length is less than or equal to the initial length, and the width is less than or equal to the initial width.
7. A page direction recognition apparatus, characterized in that the page direction recognition apparatus comprises:
the image detection device comprises a dividing module, a judging module and a judging module, wherein the dividing module is used for determining a target image to be subjected to image detection and dividing the target image according to a preset cutting mode to obtain a plurality of image blocks, and partial areas of every two adjacent image blocks in each image block are completely the same;
the determining module is used for training each image block based on a preset convolutional neural network model and determining whether each image block has characters and target image blocks in the character direction based on the training result of the training;
the classification and collection module is used for classifying and collecting the character direction of each target image block if a plurality of target image blocks with characters and character directions exist in each image block, determining the target character direction based on the classification and collection result of the classification and collection, and taking the target character direction as the page direction of the target image;
the determination module is further configured to traverse each image block based on the training result of the training, and obtain the whole area of the traversed image block and the area occupied by the suspected character area in the traversed image block; calculating a ratio value of the area occupied by the suspected character area to the whole area, and judging whether the ratio value is larger than a preset threshold value or not; if the ratio value is larger than a preset threshold value, determining that characters exist in the traversed image block, determining the character direction of the characters in the traversed image block according to the training result, and taking the traversed image block as a target image block.
8. A page direction identification equipment, characterized in that the page direction identification equipment comprises: a memory, a processor, and a page direction identification program stored on the memory and executable on the processor, wherein the page direction identification program, when executed by the processor, implements the steps of the page direction identification method according to any one of claims 1 to 6.
9. A computer-readable storage medium, characterized in that a page direction identification program is stored on the computer-readable storage medium, and the page direction identification program, when executed by a processor, implements the steps of the page direction identification method according to any one of claims 1 to 6.
CN202011282095.2A 2020-11-17 2020-11-17 Page direction identification method, device, equipment and computer readable storage medium Active CN112101317B (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN202011282095.2A CN112101317B (en) 2020-11-17 2020-11-17 Page direction identification method, device, equipment and computer readable storage medium
PCT/CN2021/127179 WO2022105569A1 (en) 2020-11-17 2021-10-28 Page direction recognition method and apparatus, and device and computer-readable storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011282095.2A CN112101317B (en) 2020-11-17 2020-11-17 Page direction identification method, device, equipment and computer readable storage medium

Publications (2)

Publication Number Publication Date
CN112101317A (en) 2020-12-18
CN112101317B (en) 2021-02-19

Family

ID=73785712

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011282095.2A Active CN112101317B (en) 2020-11-17 2020-11-17 Page direction identification method, device, equipment and computer readable storage medium

Country Status (2)

Country Link
CN (1) CN112101317B (en)
WO (1) WO2022105569A1 (en)

Families Citing this family (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112101317B (en) * 2020-11-17 2021-02-19 深圳壹账通智能科技有限公司 Page direction identification method, device, equipment and computer readable storage medium
CN112766266B (en) * 2021-01-29 2021-12-10 云从科技集团股份有限公司 Text direction correction method, system and device based on staged probability statistics
CN112926564B (en) * 2021-02-25 2024-08-02 中国平安人寿保险股份有限公司 Picture analysis method, system, computer device and computer readable storage medium
CN113780131B (en) * 2021-08-31 2024-04-12 众安在线财产保险股份有限公司 Text image orientation recognition method, text content recognition method, device and equipment
CN114155546B (en) * 2022-02-07 2022-05-20 北京世纪好未来教育科技有限公司 Image correction method and device, electronic equipment and storage medium
CN115346205A (en) * 2022-10-17 2022-11-15 广州简悦信息科技有限公司 Page information identification method and device and electronic equipment
CN116935393A (en) * 2023-07-27 2023-10-24 中科微至科技股份有限公司 Method and system for extracting package surface information based on OCR technology

Citations (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106326854A (en) * 2016-08-19 2017-01-11 掌阅科技股份有限公司 Open fixed-layout document paragraph identification method
CN109845237A (en) * 2016-08-17 2019-06-04 惠普打印机韩国有限公司 Image forming apparatus, scanned image correction method of the image forming apparatus, and non-transitory computer readable recording medium
CN110490198A (en) * 2019-08-12 2019-11-22 上海眼控科技股份有限公司 Text orientation bearing calibration, device, computer equipment and storage medium
CN110942063A (en) * 2019-11-21 2020-03-31 望海康信(北京)科技股份公司 Certificate text information acquisition method and device and electronic equipment
CN111062374A (en) * 2019-12-10 2020-04-24 爱信诺征信有限公司 Identification method, device, system, equipment and readable medium of identity card information
CN111091124A (en) * 2019-12-04 2020-05-01 吉林大学 Spine character recognition method
CN111382740A (en) * 2020-03-13 2020-07-07 深圳前海环融联易信息科技服务有限公司 Text picture analysis method and device, computer equipment and storage medium
CN111507214A (en) * 2020-04-07 2020-08-07 中国人民财产保险股份有限公司 Document identification method, device and equipment
CN111639646A (en) * 2020-05-18 2020-09-08 山东大学 Test paper handwritten English character recognition method and system based on deep learning
CN111753850A (en) * 2020-06-29 2020-10-09 珠海奔图电子有限公司 Document processing method and device, computer equipment and computer readable storage medium
CN111814429A (en) * 2020-07-30 2020-10-23 深圳壹账通智能科技有限公司 Article typesetting method and device, terminal equipment and storage medium

Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101777124A (en) * 2010-01-29 2010-07-14 北京新岸线网络技术有限公司 Method for extracting video text message and device thereof
KR101214772B1 (en) * 2010-02-26 2012-12-21 삼성전자주식회사 Character recognition apparatus and method based on direction of character
US10546326B2 (en) * 2013-09-26 2020-01-28 Mark W. Publicover Providing targeted content based on a user's preferences
CN111144288A (en) * 2019-12-25 2020-05-12 联想(北京)有限公司 Image processing method and device and electronic equipment
CN111353491B (en) * 2020-03-12 2024-04-26 中国建设银行股份有限公司 Text direction determining method, device, equipment and storage medium
CN112101317B (en) * 2020-11-17 2021-02-19 深圳壹账通智能科技有限公司 Page direction identification method, device, equipment and computer readable storage medium

Also Published As

Publication number Publication date
CN112101317A (en) 2020-12-18
WO2022105569A1 (en) 2022-05-27

Similar Documents

Publication Publication Date Title
CN112101317B (en) Page direction identification method, device, equipment and computer readable storage medium
CN109961009B (en) Pedestrian detection method, system, device and storage medium based on deep learning
CN108009543B (en) License plate recognition method and device
US10095925B1 (en) Recognizing text in image data
WO2019169772A1 (en) Picture processing method, electronic apparatus, and storage medium
CN109784181B (en) Picture watermark identification method, device, equipment and computer readable storage medium
CN111325104B (en) Text recognition method, device and storage medium
US20180225542A1 (en) Image information recognition processing method and device, and computer storage medium
CN112070076B (en) Text paragraph structure reduction method, device, equipment and computer storage medium
WO2021184718A1 (en) Card border recognition method, apparatus and device, and computer storage medium
CN108021863B (en) Electronic device, age classification method based on image and storage medium
CN110414649B (en) DM code positioning method, device, terminal and storage medium
CN112767354A (en) Defect detection method, device and equipment based on image segmentation and storage medium
CN112434612A (en) Smoking detection method and device, electronic equipment and computer readable storage medium
CN111080665B (en) Image frame recognition method, device, equipment and computer storage medium
CN115546809A (en) Table structure identification method based on cell constraint and application thereof
CN111582134A (en) Certificate edge detection method, device, equipment and medium
CN113657370B (en) Character recognition method and related equipment thereof
CN113657369B (en) Character recognition method and related equipment thereof
CN112560857B (en) Character area boundary detection method, equipment, storage medium and device
CN114399657A (en) Vehicle detection model training method and device, vehicle detection method and electronic equipment
CN113128604A (en) Page element identification method and device, electronic equipment and storage medium
CN109685069B (en) Image detection method, device and computer readable storage medium
CN115880362B (en) Code region positioning method, device, computer equipment and computer readable storage medium
CN115797955A (en) Table structure identification method based on cell constraint and application thereof

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant