WO2022105569A1 - Page orientation recognition method, apparatus, device and computer-readable storage medium - Google Patents

Page orientation recognition method, apparatus, device and computer-readable storage medium

Info

Publication number
WO2022105569A1
Authority
WO
WIPO (PCT)
Prior art keywords
text
image blocks
target image
target
image
Prior art date
Application number
PCT/CN2021/127179
Other languages
English (en)
French (fr)
Inventor
高超 (Gao Chao)
徐国强 (Xu Guoqiang)
Original Assignee
深圳壹账通智能科技有限公司 (Shenzhen OneConnect Smart Technology Co., Ltd.)
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 深圳壹账通智能科技有限公司 (Shenzhen OneConnect Smart Technology Co., Ltd.)
Publication of WO2022105569A1 publication Critical patent/WO2022105569A1/zh

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 - Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/40 - Document-oriented image-based pattern recognition
    • G06V30/41 - Analysis of document content
    • G06V30/413 - Classification of content, e.g. text, photographs or tables
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 - Pattern recognition
    • G06F18/20 - Analysing
    • G06F18/21 - Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 - Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/04 - Architecture, e.g. interconnection topology
    • G06N3/045 - Combinations of networks
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 - Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10 - Character recognition

Definitions

  • the present application relates to the technical field of image processing, and in particular, to a page orientation identification method, apparatus, device, and computer-readable storage medium.
  • OCR: Optical Character Recognition
  • in practice, the page may be rotated by 90, 180 or 270 degrees, and such an image often cannot be fed directly into the OCR system; rotation correction must be performed first.
  • traditional page orientation methods usually use morphology, line detection, projection and the like to estimate the position and direction of text lines and judge the page orientation, but they may confuse a page with its 180-degree rotation and are easily interfered with by background texture lines outside the page.
  • although deep learning technology can directly classify the entire image and predict its direction, such a model requires a large amount of training data, is also susceptible to interference from background textures, and is not robust.
  • the main purpose of the present application is to provide a method, apparatus, device and computer-readable storage medium for page orientation identification, which aims to solve the technical problem of how to improve the accuracy of image page orientation identification.
  • a first aspect of the embodiments of the present application provides a method for identifying a page orientation, including:
  • the text directions of each of the target image blocks are classified and summarized, the target text direction is determined based on the classification result, and the target text direction is used as the page direction of the target image.
  • a second aspect of the embodiments of the present application provides a device for identifying a page orientation, including:
  • a dividing module configured to determine a target image to be subjected to image detection, and divide the target image according to a preset cropping method to obtain a plurality of image blocks;
  • a determination module configured to perform training on each of the image blocks based on a preset convolutional neural network model, and determine whether each of the image blocks has a target image block with text and a text direction based on the training result of the training;
  • a classification and summarization module configured to, if there are multiple target image blocks with text and text directions in each of the image blocks, classify and summarize the text directions of each of the target image blocks, determine the target text direction based on the classification result, and use the target text direction as the page direction of the target image.
  • a third aspect of the embodiments of the present application provides a device for identifying page orientation
  • the page orientation identification device includes: a memory, a processor, and a computer program stored on the memory and executable on the processor, wherein:
  • the computer program implements the following steps when executed by the processor:
  • the text directions of each of the target image blocks are classified and summarized, the target text direction is determined based on the classification result, and the target text direction is used as the page direction of the target image.
  • a fourth aspect of the embodiments of the present application provides a computer-readable storage medium
  • a computer program is stored on the computer-readable storage medium, and when the computer program is executed by the processor, the following steps are implemented:
  • the text directions of each of the target image blocks are classified and summarized, the target text direction is determined based on the classification result, and the target text direction is used as the page direction of the target image.
  • FIG. 1 is a schematic structural diagram of a page orientation recognition device of a hardware operating environment involved in an embodiment of the present application
  • FIG. 2 is a schematic flowchart of a first embodiment of a method for identifying a page orientation of the present application
  • FIG. 3 is a schematic diagram of functional modules of the page orientation identification device of the present application.
  • FIG. 1 is a schematic structural diagram of a page orientation recognition device of a hardware operating environment involved in the solution of the embodiment of the present application.
  • the page orientation identification device may include: a processor 1001 , such as a CPU, a network interface 1004 , a user interface 1003 , a memory 1005 , and a communication bus 1002 .
  • the communication bus 1002 is used to realize the connection and communication between these components.
  • the user interface 1003 may include a display screen (Display), an input unit such as a keyboard (Keyboard), and the optional user interface 1003 may also include a standard wired interface and a wireless interface.
  • the network interface 1004 may include a standard wired interface and a wireless interface (eg, a WI-FI interface).
  • the memory 1005 may be a high-speed RAM memory or a non-volatile memory, such as disk storage.
  • the memory 1005 may also be a storage device independent of the aforementioned processor 1001 .
  • the page orientation identification device may further include a camera, an RF (Radio Frequency, radio frequency) circuit, a sensor, an audio circuit, a WiFi module, and the like.
  • sensors such as light sensors, motion sensors and other sensors.
  • the light sensor may include an ambient light sensor and a proximity sensor, wherein the ambient light sensor may adjust the brightness of the display screen according to the brightness of the ambient light.
  • the page orientation recognition device may also be configured with other sensors such as a gyroscope, a barometer, a hygrometer, a thermometer, an infrared sensor, etc., which will not be repeated here.
  • the structure of the page orientation recognition device shown in FIG. 1 does not constitute a limitation on the page orientation recognition device, which may include more or fewer components than shown in the figure, combine some components, or use a different component layout.
  • the memory 1005 as a computer storage medium may include an operating system, a network communication module, a user interface module and a page orientation identification program.
  • the network interface 1004 is mainly used to connect to the background server and perform data communication with the background server;
  • the user interface 1003 is mainly used to connect to the client (client) and perform data communication with the client;
  • the processor 1001 may be configured to call the page orientation identification program stored in the memory 1005, and execute the page orientation identification method provided by the embodiment of the present application.
  • the present application provides a method for identifying a page orientation.
  • the method for identifying a page orientation includes the following steps:
  • Step S10: determining a target image to be subjected to image detection, and dividing the target image according to a preset cropping method to obtain a plurality of image blocks;
  • the target image whose document page is to be detected is cut into several patches (small blocks), that is, image blocks; each image block is predicted using a convolutional neural network model from deep learning to determine whether it contains text and, if so, the direction of that text; the prediction results of all image blocks are then summarized and fused to obtain the direction of the entire document page in the target image.
  • compared with OCR technology, which can only recognize text after the page orientation has been corrected, the accuracy of image page orientation recognition is higher.
  • the way to determine the target image may be to obtain the image input by the user and use it as the target image, or obtain the image sent by other terminals and use it as the target image , or the image generated by the terminal performing image detection itself may be used as the target image, and the specific method of acquiring the target image is not limited here, and can be set according to the needs of the user.
  • after the target image is obtained, it needs to be divided by the preset cropping method to obtain multiple image blocks. It should be noted that, in order to ensure the continuity of the image blocks, adjacent image blocks must partially overlap when the target image is divided, that is, a part of the area between two adjacent image blocks is exactly the same. The preset cropping method may first determine the origin of the target image, for example taking the upper left corner of the image as the origin, and construct a two-dimensional coordinate system based on this origin, where the x-axis and y-axis of the coordinate system can be determined from the edge length and edge width of the target image.
  • Step S20: performing training on each of the image blocks based on a preset convolutional neural network model, and determining whether each of the image blocks has a target image block with text and a text direction based on the training result of the training;
  • the preset convolutional neural network model can be used to train each image block, that is, the image blocks are combined into a batch and input into the convolutional neural network model for training, so that the target image blocks with text among the image blocks, and the text direction corresponding to each target image block, can be determined according to the training results. That is, each image block is predicted by the convolutional neural network model; whether each image block contains text can be determined from the prediction result, and if it contains text, the direction of the text can be determined from the prediction result as well.
  • one way to detect whether each image block contains text is to determine the overall area of each image block through the convolutional neural network, detect the area occupied by the suspected text region in each image block, and then compute the proportion of the suspected text area to the overall area. If the proportion for the traversed image block is greater than a preset threshold (any threshold set by the user in advance), it is determined that text exists in the traversed image block; if the proportion is less than or equal to the preset threshold, it is determined that text does not exist in the traversed image block. After determining that text exists in the traversed image block, the direction of the text in that block can be determined from the prediction of the convolutional neural network model.
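The area-ratio test above can be sketched as follows; the function name and the 0.1 threshold are illustrative assumptions, since the patent leaves the threshold to the user:

```python
def has_text(block_area, text_area, threshold=0.1):
    """Decide whether an image block contains text by comparing the
    fraction of the block covered by suspected text regions against a
    preset threshold (any value the user configures in advance)."""
    ratio = text_area / block_area
    return ratio > threshold

# A 100x100 block whose suspected text region covers 1500 pixels:
# ratio 0.15 exceeds the 0.1 threshold, so the block is kept.
```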
  • the training result of the convolutional neural network model determines the label result carried in the traversed image block, such as 0, 1, 2 or 3; the obtained label result is then matched against a preset label-direction comparison table. The comparison table sets the direction corresponding to each label, for example, 0 corresponds to 0 degrees, 1 corresponds to 90 degrees, 2 corresponds to 180 degrees, and 3 corresponds to 270 degrees; the text direction of the text in the traversed image block is then determined according to the matching result.
  • for example, the convolutional neural network model can predict 4 directions, using 0, 1, 2 and 3 to represent 0 degrees, 90 degrees, 180 degrees and 270 degrees respectively; that is, if the label result output by the convolutional neural network model is 1, it can be determined that the text direction in the image block is 90 degrees.
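The label-direction comparison table described above can be sketched as a simple lookup; the dictionary and function names are illustrative:

```python
# Label-direction comparison table: 0 -> 0 deg, 1 -> 90 deg,
# 2 -> 180 deg, 3 -> 270 deg, matching the mapping in the text.
LABEL_TO_DEGREES = {0: 0, 1: 90, 2: 180, 3: 270}

def text_direction(label):
    """Map a model label result to a text direction in degrees."""
    return LABEL_TO_DEGREES[label]

# A label result of 1 therefore corresponds to a 90-degree text direction.
```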
  • Step S30: if there are a plurality of target image blocks with text and text directions in each of the image blocks, classify and summarize the text directions of each of the target image blocks, determine the target text direction based on the classification result, and use the target text direction as the page direction of the target image.
  • the text direction corresponding to each target image block is obtained, and these text directions are classified and summarized: the target image blocks corresponding to 0 degrees are grouped together, as are those corresponding to 90 degrees, 180 degrees and 270 degrees, and the text direction with the most corresponding target image blocks is determined and used as the target text direction. For example, if the target image blocks corresponding to 90 degrees are the most numerous, 90 degrees is used as the target text direction, that is, the page direction of the target image. In this scheme, the target image is divided into multiple image blocks, each image block is detected separately to determine whether it contains text and, if so, the text direction of that text, and the results are classified and summarized to determine the page orientation of the target image.
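This classify-and-summarize step is effectively a majority vote over the per-block predictions; a minimal sketch, with illustrative names:

```python
from collections import Counter

def page_direction(block_directions):
    """Return the most frequent text direction among the target image
    blocks; that direction is taken as the page direction."""
    counts = Counter(block_directions)
    direction, _ = counts.most_common(1)[0]
    return direction

# Three of five blocks predict 90 degrees, so the page direction is 90.
print(page_direction([90, 90, 0, 90, 180]))
```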
  • the target image to be detected is determined, and the target image is divided according to a preset cropping method to obtain a plurality of image blocks; each of the image blocks is trained based on a preset convolutional neural network model, and based on the training results it is determined whether each of the image blocks has target image blocks with text and text directions; if there are multiple target image blocks with text and text directions among the image blocks, the text direction of each target image block is classified and summarized, the target text direction is determined based on the classification result, and the target text direction is used as the page direction of the target image.
  • each image block is trained according to the convolutional neural network model to determine the target image blocks, the text direction of each target image block is classified and summarized, and the page orientation of the target image is determined according to the classification result, thereby avoiding the inaccurate estimation of the page orientation of the target image in the prior art and improving the accuracy of recognizing the page orientation of the image.
  • Further, in the first embodiment of the present application, the refinement of step S20, namely determining, based on the training result, whether each of the image blocks has text and a target image block with a text direction, includes:
  • Step a: traverse each of the image blocks based on the training result of the training, and obtain the overall area of the traversed image block and the area occupied by the suspected text region in the traversed image block;
  • after each image block has been trained through the convolutional neural network model and the training result obtained, each image block can be traversed according to the training result to determine the overall area of the traversed image block and the area of the suspected text region in the block, that is, the area occupied by the suspected text region.
  • Step b: calculating the ratio of the area occupied by the suspected text region to the overall area, and judging whether the ratio is greater than a preset threshold;
  • the preset threshold may be any threshold set in advance by the user.
  • Step c: if the ratio is greater than the preset threshold, determine that the traversed image block has text, determine the text direction of the text in the traversed image block according to the training result, and use the traversed image block as a target image block.
  • that is, when the ratio is greater than the preset threshold, the traversed image block can also be used as the target image block.
  • the proportion of the area occupied by the suspected text region to the overall area of the traversed image block is determined according to the training result; when the ratio is greater than the preset threshold, it is determined that the traversed image block has text, the text direction is then determined according to the training result, and the traversed image block is used as the target image block, thereby ensuring the accuracy of the obtained target image blocks.
  • the step of determining the text direction of the text in the traversed image block according to the training result includes:
  • Step d: determine the label result corresponding to the traversed image block according to the training result, match the label result with the preset label-direction comparison table, and determine the text direction of the text in the traversed image block according to the matching result.
  • when determining the text direction of the text in the traversed image block, it is first necessary to determine the directions that the convolutional neural network model can predict, that is, to determine the label result carried in the traversed image block according to the training result of the convolutional neural network model, such as 0, 1, 2 or 3, and then match the obtained label result with the preset label-direction comparison table.
  • the label-direction comparison table sets the direction corresponding to each label, for example, 0 corresponds to 0 degrees, 1 corresponds to 90 degrees, 2 corresponds to 180 degrees, 3 corresponds to 270 degrees, and so on; the text direction of the text in the traversed image block is then determined according to the matching result.
  • for example, the convolutional neural network model can predict 4 directions, using 0, 1, 2 and 3 to represent 0 degrees, 90 degrees, 180 degrees and 270 degrees respectively; that is, if the label result output by the convolutional neural network model is 1, it can be determined that the text direction in the image block is 90 degrees.
  • the label result corresponding to the traversed image block is determined according to the training result, the label result is matched against the label comparison table, and the text direction is determined according to the matching result, thereby ensuring the accuracy of the obtained text direction.
  • the steps include:
  • Step e: inputting a plurality of initial image blocks in the preset mapping comparison table into the original convolutional neural network model for training to obtain the text information of each of the initial image blocks, and comparing each of the text information with the annotation information corresponding to each of the initial image blocks in the preset mapping comparison table;
  • the annotation information of each initial image block can be determined by manual marking, for example whether it has text content and the text direction of that content, and each initial image block and its annotation information can be summarized to obtain the preset mapping comparison table.
  • model optimization can be performed on the original convolutional neural network model according to the preset mapping comparison table to obtain the convolutional neural network model. That is, multiple initial image blocks in the preset mapping comparison table can be extracted and input into the original convolutional neural network model as a batch for training; the text information of each initial image block is determined according to the training results, that is, whether each initial image block contains text content and, if so, the text direction of that content.
  • Step f: if the comparison fails, determine the error between each of the text information and each of the annotation information, and optimize the original convolutional neural network model according to the error to obtain the preset convolutional neural network model.
  • the original convolutional neural network model is trained according to each initial image block; when the text information of each initial image block fails the comparison with the annotation information in the preset mapping comparison table, the error is determined, and the original convolutional neural network model is optimized according to the error to obtain the preset convolutional neural network model, thereby ensuring the validity of the obtained preset convolutional neural network model.
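The comparison step can be sketched as below; the helper name, the dictionary-based annotation table, and the error measure (a simple mismatch rate) are illustrative assumptions, since the patent does not fix how the error is computed:

```python
def compare_with_annotations(predictions, annotations):
    """Compare per-block model outputs against the manually annotated
    mapping comparison table. Returns (passed, error): the comparison
    fails when any prediction disagrees with its annotation, and the
    mismatch rate can then drive further optimization of the model."""
    mismatches = sum(1 for block_id, predicted in predictions.items()
                     if annotations[block_id] != predicted)
    error = mismatches / len(annotations)
    return mismatches == 0, error

# One of two blocks disagrees with its annotation, so the comparison
# fails with a mismatch rate of 0.5.
passed, error = compare_with_annotations({1: 90, 2: 0}, {1: 90, 2: 90})
```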
  • the steps of classifying and summarizing the text direction of each of the target image blocks and determining the target text direction based on the classification result include:
  • Step g: the text directions of each of the target image blocks are classified and summarized to obtain a plurality of initial text directions, the number of target image blocks corresponding to each of the initial text directions is determined, and the initial text direction with the largest number of corresponding target image blocks is used as the target text direction.
  • these text orientations need to be classified and summarized to acquire multiple initial text orientations, such as 0 degrees, 90 degrees, 180 degrees, and 270 degrees.
  • the target image blocks corresponding to 0 degrees are summarized
  • the target image blocks corresponding to 90 degrees are summarized
  • the target image blocks corresponding to 180 degrees are summarized
  • the target image blocks corresponding to 270 degrees are summarized, and the text direction with the most corresponding target image blocks is determined and used as the target text direction. That is, the number of target image blocks corresponding to each text direction is determined, and the initial text direction with the largest number of target image blocks is used as the target text direction; if the target image blocks corresponding to 90 degrees are the most numerous, 90 degrees can be used as the target text direction, that is, the page direction of the target image.
  • steps of dividing the target image according to a preset cropping method to obtain multiple image blocks include:
  • Step h: determine the origin in the target image, determine the length and width of the image block to be divided based on the origin and the preset cropping method, and divide the target image according to the length and the width to obtain multiple image blocks.
  • the origin set in the target image is the coordinate origin of the constructed coordinate system, and its position can be determined by the user's needs; preferably, one of the four vertices of the target image is used as the origin, for example, the upper left corner of the target image is set as the origin.
  • a two-dimensional coordinate system can be created according to the initial length and initial width of the target image, with the x-axis and y-axis constructed along the edges of the target image, completing the construction of the two-dimensional coordinate system.
  • after the coordinate system is constructed, it is necessary to determine the length and width of the image block to be divided, as well as the starting coordinates of the division, and then determine the four vertex coordinates of each image block in the two-dimensional coordinate system according to the length, width and starting coordinates, such as [i*stride, j*stride, i*stride + size, j*stride + size]; the target image is then divided according to these four vertex coordinates to obtain the divided image blocks.
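The sliding-window rule above can be sketched as follows; the concrete size and stride values are illustrative, and choosing stride smaller than size is what guarantees the overlap between adjacent blocks required earlier:

```python
def patch_boxes(width, height, size=224, stride=112):
    """Enumerate overlapping image-block boxes [x0, y0, x1, y1] using
    the [i*stride, j*stride, i*stride + size, j*stride + size] rule,
    keeping every block fully inside the target image."""
    boxes = []
    i = 0
    while i * stride + size <= width:
        j = 0
        while j * stride + size <= height:
            boxes.append([i * stride, j * stride,
                          i * stride + size, j * stride + size])
            j += 1
        i += 1
    return boxes

# A 448x448 page with size=224 and stride=112 yields a 3x3 grid of
# blocks, each overlapping its neighbours by half a block.
```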
  • the target image can be divided multiple times to obtain a plurality of image blocks, and each image block can be obtained in the same way.
  • the length and width of the image block to be divided are determined according to the origin in the target image and the cropping method, and the target image is divided based on the length and width to obtain multiple image blocks, thereby ensuring the validity of the obtained image blocks.
  • the steps of determining the length and width of the image block to be divided based on the origin and a preset cropping method include:
  • Step k: obtaining the initial length and initial width of the target image, and determining the length and width of the image block to be divided based on the origin, the initial length and the initial width, wherein the length is less than or equal to the initial length and the width is less than or equal to the initial width.
  • upon receiving a division instruction, the target image is divided, and the length and width of the image block to be divided are determined.
  • each image block needs to meet the following conditions: the length of the image block to be divided is less than or equal to the initial length of the target image, the width of the image block to be divided is less than or equal to the initial width of the target image, and adjacent image blocks overlap.
  • the length and width of the image block to be divided are determined according to the initial length, initial width and origin of the target image, thereby ensuring the validity of the obtained image block to be divided.
  • an embodiment of the present application further proposes a page orientation identification device, where the page orientation identification device includes:
  • a division module A10 configured to determine a target image to be subjected to image detection, and divide the target image according to a preset cropping method to obtain a plurality of image blocks;
  • a determination module A20 configured to perform training on each of the image blocks based on a preset convolutional neural network model, and determine whether each of the image blocks has a target image block with text and a text direction based on the training result of the training;
  • the classification and summarization module A30 is configured to, if there are multiple target image blocks with text and text directions in each of the image blocks, classify and summarize the text direction of each of the target image blocks, determine the target text direction based on the classification result, and use the target text direction as the page direction of the target image.
  • determining module A20 is also used for:
  • if the ratio is greater than the preset threshold, determine that the traversed image block has text, determine the text direction of the text in the traversed image block according to the training result, and use the traversed image block as the target image block.
  • determining module A20 is also used for:
  • determining module A20 is also used for:
  • if the comparison fails, determine the error between each of the text information and each of the annotation information, and optimize the original convolutional neural network model according to the error to obtain the preset convolutional neural network model.
  • classification and summary module A30 is also used for:
  • classify and summarize the text direction of each of the target image blocks to obtain a plurality of initial text directions, determine the number of target image blocks corresponding to each of the initial text directions, and use the initial text direction with the largest number as the target text direction.
  • dividing module A10 is also used for:
  • determine the origin in the target image, determine the length and width of the image block to be divided based on the origin and the preset cropping method, and divide the target image according to the length and the width to obtain multiple image blocks.
  • dividing module A10 is also used for:
  • obtain the initial length and initial width of the target image, and determine the length and width of the image block to be divided based on the origin, the initial length and the initial width, wherein the length is less than or equal to the initial length and the width is less than or equal to the initial width.
  • The present application further provides a page orientation identification device, which includes a memory, a processor, and a page orientation identification program stored in the memory; the processor is configured to execute the page orientation identification program to implement the following steps:
  • determining a target image to undergo image detection, and dividing the target image according to a preset cropping method to obtain a plurality of image blocks;
  • training each of the image blocks based on a preset convolutional neural network model, and determining, based on a training result of the training, target image blocks having text and a text direction among the image blocks;
  • if a plurality of target image blocks having text and a text direction exist among the image blocks, classifying and summarizing the text directions of the target image blocks, determining the target text direction based on the classification and summarization result, and taking the target text direction as the page orientation of the target image.
  • The present application further provides a computer-readable storage medium storing one or more programs, which can be executed by one or more processors to implement the steps of the embodiments of the page orientation identification method described above.
  • The methods of the above embodiments can be implemented by means of software plus a necessary general-purpose hardware platform, and of course also by hardware, but in many cases the former is the better implementation.
  • The technical solution of the present application, in essence or in the part contributing to the prior art, can be embodied in the form of a software product.
  • The computer software product is stored in a storage medium as described above (such as ROM/RAM, a magnetic disk, or an optical disc) and includes several instructions for causing a terminal device (which may be a mobile phone, a computer, a server, an air conditioner, a network device, etc.) to execute the methods described in the embodiments of the present application.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Computing Systems (AREA)
  • Biomedical Technology (AREA)
  • General Health & Medical Sciences (AREA)
  • Computational Linguistics (AREA)
  • Biophysics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Molecular Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Health & Medical Sciences (AREA)
  • Evolutionary Biology (AREA)
  • Multimedia (AREA)
  • Image Analysis (AREA)

Abstract

The present application relates to the technical field of image processing, and discloses a page orientation identification method, apparatus, device, and computer-readable storage medium. The method includes: determining a target image to undergo image detection, and dividing the target image according to a preset cropping method to obtain a plurality of image blocks; training each of the image blocks based on a preset convolutional neural network model, and determining, based on a training result of the training, target image blocks having text and a text direction among the image blocks; if a plurality of target image blocks having text and a text direction exist among the image blocks, classifying and summarizing the text directions of the target image blocks, determining a target text direction based on the classification and summarization result, and taking the target text direction as the page orientation of the target image. The present application improves the accuracy of identifying the page orientation of an image.

Description

Page orientation identification method, apparatus, device, and computer-readable storage medium
This application claims priority to the Chinese patent application No. 202011282095.2, entitled "Page orientation identification method, apparatus, device, and computer-readable storage medium", filed with the Chinese Patent Office on November 17, 2020, the entire contents of which are incorporated herein by reference.
Technical Field
The present application relates to the technical field of image processing, and in particular, to a page orientation identification method, apparatus, device, and computer-readable storage medium.
Background
OCR (Optical Character Recognition) technology converts printed text in an image into a text format that a computer can process. It is widely used in scenarios such as data entry and verification, and has become a key link in the informatization and digitization of various industries of the national economy. OCR mainly solves two problems: detecting the position of text in an image and recognizing its content. However, depending on how the image was captured (for example, by photographing or scanning), the page may be rotated by 90, 180, or 270 degrees; such an image often cannot be fed directly into an OCR system, so the page orientation usually has to be detected and the image rotated to correct it first. Traditional page orientation approaches typically use morphology, line detection, projection, and similar methods to estimate the position and direction of text lines and judge the page orientation, but they may flip an image by 180 degrees and are easily disturbed by background textures and lines outside the page. Deep learning can be used to classify the entire image directly and predict its orientation, but such a model requires a large amount of training data, is likewise susceptible to interference from background textures, and is not very robust.
Technical Problem
The main objective of the present application is to provide a page orientation identification method, apparatus, device, and computer-readable storage medium, aiming to solve the technical problem of how to improve the accuracy of identifying the page orientation of an image.
Technical Solution
To solve the above technical problem, the technical solution adopted by the embodiments of the present application is as follows:
A first aspect of the embodiments of the present application provides a page orientation identification method, including:
determining a target image to undergo image detection, and dividing the target image according to a preset cropping method to obtain a plurality of image blocks;
training each of the image blocks based on a preset convolutional neural network model, and determining, based on a training result of the training, target image blocks having text and a text direction among the image blocks;
if a plurality of target image blocks having text and a text direction exist among the image blocks, classifying and summarizing the text directions of the target image blocks, determining a target text direction based on the classification and summarization result, and taking the target text direction as the page orientation of the target image.
A second aspect of the embodiments of the present application provides a page orientation identification apparatus, including:
a dividing module, configured to determine a target image to undergo image detection, and divide the target image according to a preset cropping method to obtain a plurality of image blocks;
a determining module, configured to train each of the image blocks based on a preset convolutional neural network model, and determine, based on a training result of the training, target image blocks having text and a text direction among the image blocks;
a classifying and summarizing module, configured to, if a plurality of target image blocks having text and a text direction exist among the image blocks, classify and summarize the text directions of the target image blocks, determine a target text direction based on the classification and summarization result, and take the target text direction as the page orientation of the target image.
A third aspect of the embodiments of the present application provides a page orientation identification device;
the page orientation identification device includes a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein:
when executed by the processor, the computer program implements the following steps:
determining a target image to undergo image detection, and dividing the target image according to a preset cropping method to obtain a plurality of image blocks;
training each of the image blocks based on a preset convolutional neural network model, and determining, based on a training result of the training, target image blocks having text and a text direction among the image blocks;
if a plurality of target image blocks having text and a text direction exist among the image blocks, classifying and summarizing the text directions of the target image blocks, determining a target text direction based on the classification and summarization result, and taking the target text direction as the page orientation of the target image.
A fourth aspect of the embodiments of the present application provides a computer-readable storage medium;
the computer-readable storage medium stores a computer program which, when executed by a processor, implements the following steps:
determining a target image to undergo image detection, and dividing the target image according to a preset cropping method to obtain a plurality of image blocks;
training each of the image blocks based on a preset convolutional neural network model, and determining, based on a training result of the training, target image blocks having text and a text direction among the image blocks;
if a plurality of target image blocks having text and a text direction exist among the image blocks, classifying and summarizing the text directions of the target image blocks, determining a target text direction based on the classification and summarization result, and taking the target text direction as the page orientation of the target image.
Advantageous Effects
The advantageous effects of the present application are as follows:
The present application avoids the inaccurate estimation of the page orientation of a target image found in the prior art, and improves the accuracy of identifying the page orientation of an image.
Brief Description of the Drawings
FIG. 1 is a schematic structural diagram of a page orientation identification device in the hardware operating environment involved in the solutions of the embodiments of the present application;
FIG. 2 is a schematic flowchart of a first embodiment of the page orientation identification method of the present application;
FIG. 3 is a schematic diagram of the functional modules of the page orientation identification apparatus of the present application.
The realization of the objectives, the functional features, and the advantages of the present application will be further described with reference to the accompanying drawings in conjunction with the embodiments.
Embodiments of the Invention
It should be understood that the specific embodiments described herein are intended only to explain the present application and are not intended to limit it.
As shown in FIG. 1, FIG. 1 is a schematic structural diagram of a page orientation identification device in the hardware operating environment involved in the solutions of the embodiments of the present application.
As shown in FIG. 1, the page orientation identification device may include a processor 1001 (for example, a CPU), a network interface 1004, a user interface 1003, a memory 1005, and a communication bus 1002. The communication bus 1002 is used to implement connection and communication between these components. The user interface 1003 may include a display and an input unit such as a keyboard, and optionally may also include standard wired and wireless interfaces. The network interface 1004 may optionally include standard wired and wireless interfaces (such as a WI-FI interface). The memory 1005 may be a high-speed RAM memory or a stable non-volatile memory such as a magnetic disk memory. Optionally, the memory 1005 may also be a storage device independent of the aforementioned processor 1001.
Optionally, the page orientation identification device may further include a camera, an RF (Radio Frequency) circuit, sensors, an audio circuit, a WiFi module, and so on. The sensors include, for example, a light sensor, a motion sensor, and other sensors. Specifically, the light sensor may include an ambient light sensor and a proximity sensor, where the ambient light sensor can adjust the brightness of the display according to the brightness of the ambient light. Of course, the page orientation identification device may also be equipped with other sensors such as a gyroscope, a barometer, a hygrometer, a thermometer, and an infrared sensor, which will not be described in detail here.
Those skilled in the art will understand that the structure of the page orientation identification device shown in FIG. 1 does not constitute a limitation on the device; it may include more or fewer components than shown, combine certain components, or arrange the components differently.
As shown in FIG. 1, the memory 1005, as a computer storage medium, may include an operating system, a network communication module, a user interface module, and a page orientation identification program.
In the page orientation identification device shown in FIG. 1, the network interface 1004 is mainly used to connect to a backend server and communicate with it; the user interface 1003 is mainly used to connect to a client (user side) and communicate with it; and the processor 1001 may be used to call the page orientation identification program stored in the memory 1005 and execute the page orientation identification method provided by the embodiments of the present application.
Referring to FIG. 2, the present application provides a page orientation identification method. In an embodiment of the page orientation identification method, the method includes the following steps:
Step S10: determining a target image to undergo image detection, and dividing the target image according to a preset cropping method to obtain a plurality of image blocks;
In this embodiment, the target image whose document page is to be detected is cut into a number of patches, i.e. image blocks, and a convolutional neural network model from deep learning is used to make a prediction for each image block, determining whether text exists in each block and, if so, the direction of that text, so as to obtain a prediction result for each image block. The prediction results are then summarized and fused to obtain the orientation of the entire document page in the target image. Compared with the prior art, in which OCR can only recognize text and the page orientation must be corrected separately, this approach identifies the image page orientation more accurately. Therefore, the target image to undergo image detection must first be determined. The target image may be obtained as an image input by a user, as an image sent by another terminal, or as an image generated by the image-detecting terminal itself; the specific way of obtaining the target image is not limited here and may be set according to the user's needs.
After the target image is obtained, it needs to be divided according to the preset cropping method to obtain a plurality of image blocks. It should be noted that, to ensure the continuity of the image blocks, adjacent image blocks must partially overlap when the target image is divided, that is, a region of two adjacent image blocks is exactly the same. The preset cropping method may first determine the origin of the target image, for example taking the upper-left corner of the target image as the origin, and construct a two-dimensional coordinate system based on this origin, where the x-axis and y-axis of the coordinate system may be determined from the edge length and edge width of the target image. The target image is then divided, for example into all regions of the form [i*stride, j*stride, i*stride + size, j*stride + size] (these four numbers are the x and y coordinates of the upper-left and lower-right corners of a patch), where i and j are integers and it is guaranteed that i*stride + size <= width and j*stride + size <= height. In this embodiment, preferably, stride = 192 and size = 256.
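The overlapping-patch division described above can be sketched as follows, using the preferred values stride = 192 and size = 256. This is a minimal illustration, not the patent's implementation: the function name is an assumption, and the indices start at 0 so that the patch at the origin is included.

```python
def divide_into_patches(width, height, stride=192, size=256):
    """Return [x1, y1, x2, y2] boxes for every patch that fits inside the
    image, so adjacent patches overlap by (size - stride) pixels."""
    patches = []
    i = 0
    while i * stride + size <= width:       # guarantee i*stride + size <= width
        j = 0
        while j * stride + size <= height:  # guarantee j*stride + size <= height
            patches.append([i * stride, j * stride,
                            i * stride + size, j * stride + size])
            j += 1
        i += 1
    return patches

boxes = divide_into_patches(640, 448)  # 3 columns x 2 rows of patches
```

For a 640x448 image this yields six overlapping 256x256 patches, the first at the origin.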
Step S20: training each of the image blocks based on a preset convolutional neural network model, and determining, based on a training result of the training, target image blocks having text and a text direction among the image blocks;
After the plurality of image blocks are obtained, each image block can be trained with the preset convolutional neural network model; that is, the image blocks are combined into one batch and input into the convolutional neural network model, so that the target image blocks containing text, and the text direction corresponding to each target image block, can be determined from the training result. In other words, the convolutional neural network model makes a prediction for each image block; the prediction result determines whether each image block contains text, and if it does, the direction of the text is determined from the prediction result. Whether each image block contains text may be detected as follows: determine the overall area of each image block through the convolutional neural network, detect the area occupied by the suspected text region in each image block, and compute the ratio of the suspected text area to the overall area. If the ratio for a traversed image block is greater than a preset threshold (any threshold set in advance by the user), it is determined that text exists in that image block; if the ratio is less than or equal to the preset threshold, it is determined that no text exists in it. After it is determined that text exists in a traversed image block, the direction of the text can be determined from the prediction direction of the convolutional neural network model. To do so, the directions the model can predict must first be determined: the label result carried by the traversed image block, such as 0, 1, 2, or 3, is determined from the training result of the convolutional neural network model, and the obtained label result is matched against a preset label-direction lookup table, which records the direction corresponding to each label, for example 0 corresponds to 0 degrees, 1 to 90 degrees, 2 to 180 degrees, and 3 to 270 degrees. The text direction of the text in the traversed image block is then determined from the matching result. For example, if the convolutional neural network model can predict four directions, represented by 0, 1, 2, and 3 for 0, 90, 180, and 270 degrees, and the model outputs the label result 1, the text direction in that image block can be determined to be 90 degrees.
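The per-block decision described above — keep a block only when the suspected-text-area ratio exceeds the threshold, then map the label result (0/1/2/3) to a direction via the lookup table — can be sketched as follows. The function name and the concrete threshold value are illustrative assumptions; the patent leaves the threshold to the user.

```python
# Preset label-direction lookup table: 0/1/2/3 -> 0/90/180/270 degrees.
LABEL_TO_DEGREES = {0: 0, 1: 90, 2: 180, 3: 270}

def classify_patch(text_area, total_area, label, threshold=0.1):
    """Return the text direction in degrees for a patch judged to contain
    text, or None when the suspected-text ratio does not exceed the threshold."""
    if total_area <= 0:
        return None
    ratio = text_area / total_area
    if ratio > threshold:               # text present only above the threshold
        return LABEL_TO_DEGREES[label]  # match label against the lookup table
    return None
```

For instance, a patch whose suspected text region covers half its area and whose label result is 1 would be assigned a 90-degree text direction.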
Step S30: if a plurality of target image blocks having text and a text direction exist among the image blocks, classifying and summarizing the text directions of the target image blocks, determining a target text direction based on the classification and summarization result, and taking the target text direction as the page orientation of the target image.
If it is found that a plurality of target image blocks having text and a text direction exist among the image blocks, the text direction corresponding to each target image block is obtained, and these text directions are classified and summarized: for example, the target image blocks corresponding to 0 degrees are grouped together, as are those corresponding to 90 degrees, 180 degrees, and 270 degrees. The text direction corresponding to the most target image blocks is taken as the target text direction; for example, if 90 degrees corresponds to the most target image blocks, 90 degrees can be taken as the target text direction, that is, the page orientation of the target image. In this proposal, the target image is divided into a plurality of image blocks, each image block is inspected separately to determine whether it contains text, and if so, the text direction of that text is determined; the directions are then classified and summarized to determine the page orientation of the target image.
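The "classify and summarize" majority vote described above can be sketched with the standard library's Counter; the function name is an assumption.

```python
from collections import Counter

def vote_page_orientation(patch_directions):
    """patch_directions: iterable of text directions in degrees, one per
    target image block. Returns the direction with the largest count."""
    counts = Counter(patch_directions)        # group blocks by direction
    direction, _ = counts.most_common(1)[0]   # direction with the most blocks
    return direction

page = vote_page_orientation([90, 90, 0, 90, 180])  # 90 wins the vote
```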
In this embodiment, the target image to undergo image detection is determined and divided according to the preset cropping method to obtain a plurality of image blocks; each image block is trained based on the preset convolutional neural network model, and the target image blocks having text and a text direction are determined based on the training result; if a plurality of such target image blocks exist, their text directions are classified and summarized, the target text direction is determined based on the classification and summarization result, and the target text direction is taken as the page orientation of the target image. By dividing the target image according to the preset cropping method to obtain a plurality of image blocks, training each image block with the convolutional neural network model to determine the target image blocks, and classifying and summarizing the text directions of the target image blocks to determine the page orientation of the target image, the inaccurate estimation of the page orientation of a target image found in the prior art is avoided, and the accuracy of identifying the page orientation of an image is improved.
Further, on the basis of the first embodiment of the present application, a second embodiment of the page orientation identification method of the present application is proposed. This embodiment is a refinement of step S20 of the first embodiment, that is, the step of determining, based on a training result of the training, target image blocks having text and a text direction among the image blocks, and includes:
Step a: traversing each of the image blocks based on the training result of the training, and obtaining the overall area of a traversed image block and the area occupied by the suspected text region in the traversed image block;
In this embodiment, after each image block has been trained by the convolutional neural network model and the training result has been obtained, each image block can be traversed according to the training result; for a traversed image block, its overall area and the area of the suspected text region in it, that is, the area occupied by the suspected text region, are determined.
Step b: computing the ratio of the area occupied by the suspected text region to the overall area, and judging whether the ratio is greater than a preset threshold;
After the overall area of the traversed image block and the area occupied by the suspected text region are obtained, the ratio of the suspected text area to the overall area is computed, and whether the ratio is greater than the preset threshold is judged; different operations are performed depending on the judgment result. The preset threshold may be any threshold set in advance by the user.
Step c: if the ratio is greater than the preset threshold, determining that the traversed image block contains text, determining the text direction of the text in the traversed image block according to the training result, and taking the traversed image block as a target image block.
If the judgment finds that the ratio is greater than the preset threshold, it can be determined that the traversed image block contains text; if the ratio is less than or equal to the preset threshold, it is determined that the traversed image block does not contain text. If the traversed image block contains text, the text direction of the text in it can be determined from the training result, that is, from the prediction direction of the convolutional neural network model, and the traversed image block can then be taken as a target image block.
In this embodiment, the ratio of the area occupied by the suspected text region to the overall area of a traversed image block is determined from the training result; when the ratio is greater than the preset threshold, the traversed image block is determined to contain text, the text direction is determined from the training result, and the traversed image block is taken as a target image block, thereby ensuring the accuracy of the obtained target image blocks.
Specifically, the step of determining the text direction of the text in the traversed image block according to the training result includes:
Step d: determining the label result corresponding to the traversed image block according to the training result, matching the label result against a preset label-direction lookup table, and determining the text direction of the text in the traversed image block according to the matching result.
When determining the text direction of the text in a traversed image block, the directions that the convolutional neural network model can predict must first be determined; that is, the label result carried by the traversed image block, such as 0, 1, 2, or 3, is determined from the training result of the convolutional neural network model, and the obtained label result is matched against the preset label-direction lookup table, which records the direction corresponding to each label, for example 0 corresponds to 0 degrees, 1 to 90 degrees, 2 to 180 degrees, and 3 to 270 degrees. The text direction of the text in the traversed image block is then determined from the matching result. For example, if the convolutional neural network model can predict four directions, represented by 0, 1, 2, and 3 for 0, 90, 180, and 270 degrees, and the model outputs the label result 1, the text direction in that image block can be determined to be 90 degrees.
In this embodiment, the label result corresponding to the traversed image block is determined from the training result, and when the label result matches the label lookup table, the text direction is determined from the matching result, thereby ensuring the accuracy of the obtained text direction.
Further, before the step of training each of the image blocks based on a preset convolutional neural network model, the method includes:
Step e: inputting a plurality of initial image blocks in a preset mapping lookup table into an original convolutional neural network model for training to obtain the text information of each initial image block, and comparing each piece of text information with the annotation information corresponding to each initial image block in the preset mapping lookup table;
Before the preset convolutional neural network model is used to train each image block, a conventional convolutional neural network model, that is, the original convolutional neural network model, must first be obtained, and the preset convolutional neural network model is obtained by training and optimizing the original model in advance, for example with gradient descent, until the model converges. That is, the annotation information of each initial image block, such as whether it has text content and the text direction of that content, may first be determined by manual labeling, and the initial image blocks and their annotation information are collected into the preset mapping lookup table. After the preset mapping lookup table is obtained, the original convolutional neural network model can be optimized based on it to obtain the convolutional neural network model. Specifically, a plurality of initial image blocks can be extracted from the preset mapping lookup table and input as one batch into the original convolutional neural network model for training, and the text information of each initial image block is determined from the training result, that is, whether each initial image block contains text content and, if so, the text direction of that content. Each piece of text information is then compared with the annotation information corresponding to each initial image block in the preset mapping lookup table; that is, the text information of each initial image block (including whether it has text content and the text direction) is compared against its annotation information (including whether it has text content and the text direction) in the preset mapping lookup table.
Step f: if the comparison fails, determining the error between each piece of text information and the corresponding annotation information, and optimizing the original convolutional neural network model according to the error to obtain the preset convolutional neural network model.
If an inconsistency exists (that is, the comparison fails), meaning that the text information of some initial image block differs from its annotation information, the error between the text information and the annotation information must be determined, and the original convolutional neural network model is optimized according to the error, that is, the model parameters are adjusted. The original convolutional neural network model is then optimized again in the same way until the model converges or the error is extremely small, and the convolutional neural network model at that point is taken as the preset convolutional neural network model.
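The optimization loop described above — compare predictions against the annotations, derive an error, adjust the parameters, repeat — can be sketched with plain gradient descent. As assumptions not taken from the patent: a softmax classifier over the four direction labels stands in for the convolutional network, and cross-entropy (whose gradient is the probability/one-hot difference below) stands in for the unspecified error measure.

```python
import numpy as np

def optimize_model(weights, features, labels, epochs=100, lr=0.5):
    """features: (N, D) array of patch features; labels: (N,) ints in 0..3;
    weights: (D, 4) parameter matrix, adjusted in place by gradient descent."""
    for _ in range(epochs):
        logits = features @ weights
        logits -= logits.max(axis=1, keepdims=True)   # numeric stability
        probs = np.exp(logits)
        probs /= probs.sum(axis=1, keepdims=True)     # model's predictions
        onehot = np.eye(4)[labels]                    # annotation information
        error = probs - onehot                        # prediction vs. annotation
        grad = features.T @ error / len(labels)
        weights -= lr * grad                          # adjust model parameters
    return weights
```

On a toy separable problem, the adjusted weights reproduce the annotated labels after a few epochs.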
In this embodiment, the original convolutional neural network model is trained on the initial image blocks, and when the comparison between the text information of the initial image blocks and the annotation information in the preset mapping lookup table fails, the error is determined and the original convolutional neural network model is optimized according to the error to obtain the preset convolutional neural network model, thereby ensuring the validity of the obtained preset convolutional neural network model.
Further, the step of classifying and summarizing the text directions of the target image blocks and determining the target text direction based on the classification and summarization result includes:
Step g: classifying and summarizing the text directions of the target image blocks to obtain a plurality of initial text directions, determining the number of target image blocks corresponding to each initial text direction, and taking, among the initial text directions, the initial text direction with the largest number of target image blocks as the target text direction.
In this embodiment, after the text direction of each target image block is obtained, these text directions need to be classified and summarized to obtain a plurality of initial text directions, such as 0 degrees, 90 degrees, 180 degrees, and 270 degrees. For example, the target image blocks corresponding to 0 degrees are grouped together, as are those corresponding to 90, 180, and 270 degrees, and the text direction corresponding to the most target image blocks is taken as the target text direction; that is, the target image blocks corresponding to each text direction and their number (the number of target image blocks) are determined, and among the initial text directions, the one with the largest number of target image blocks is taken as the target text direction. For example, if 90 degrees corresponds to the most target image blocks, 90 degrees can be taken as the target text direction, that is, the page orientation of the target image.
In this embodiment, the text directions of the target image blocks are classified and summarized to obtain a plurality of initial text directions, and among them the initial text direction with the largest number of target image blocks is taken as the target text direction, thereby ensuring the accuracy of the obtained target text direction.
Further, the step of dividing the target image according to a preset cropping method to obtain a plurality of image blocks includes:
Step h: determining the origin in the target image, determining the length and width of the image blocks to be divided based on the origin and the preset cropping method, and dividing the target image according to the length and the width to obtain a plurality of image blocks.
In this embodiment, when the target image is divided according to the preset cropping method, the origin set in the target image, that is, the coordinate origin for constructing the coordinate system, must first be determined. The position of the origin may be set according to the user's needs; in this proposal, one of the four vertices of the target image is preferably taken as its origin, for example the upper-left corner. After the origin is determined, a two-dimensional coordinate system can be created according to the initial length and initial width of the target image, with the x-axis and y-axis constructed along the edges of the target image. After the two-dimensional coordinate system is constructed, the length and width of the image blocks to be divided, as well as the starting coordinates of the division, must be determined, and the four vertex coordinates of an image block in the coordinate system are then determined from the block's length, width, and starting coordinates, for example [i*stride, j*stride, i*stride + size, j*stride + size]. The target image is divided according to these four vertex coordinates to obtain the divided image block. The target image can be divided multiple times to obtain a plurality of image blocks, each obtained in the same way.
In this embodiment, the length and width of the image blocks to be divided are determined from the origin in the target image and the cropping method, and the target image is divided based on the length and width to obtain a plurality of image blocks, thereby ensuring the validity of the obtained image blocks.
Specifically, the step of determining the length and width of the image blocks to be divided based on the origin and the preset cropping method includes:
Step k: obtaining the initial length and initial width of the target image, and determining the length and width of the image blocks to be divided based on the origin, the initial length, and the initial width, where the length is less than or equal to the initial length and the width is less than or equal to the initial width.
When determining the length and width of the image blocks to be divided, the length of the target image, that is, the initial length, and its width, that is, the initial width, must first be obtained. The range that can be divided is determined from the origin, the initial width, and the initial length, and the target image is divided according to the user's division instruction to determine the length and width of the image blocks to be divided. Each image block must satisfy the following conditions: its length is less than or equal to the initial length of the target image, its width is less than or equal to the initial width of the target image, and adjacent image blocks overlap.
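The conditions listed above can be checked with a small helper. The function name, argument order, and the use of the stride to express the overlap condition (adjacent patches overlap exactly when the stride is smaller than the patch dimensions) are illustrative assumptions.

```python
def patch_size_is_valid(length, width, initial_length, initial_width, stride):
    """True when a candidate patch size satisfies the stated conditions:
    it fits inside the target image, and the stride leaves an overlap
    between adjacent patches."""
    fits = length <= initial_length and width <= initial_width
    overlaps = stride < length and stride < width  # adjacent patches share a band
    return fits and overlaps

ok = patch_size_is_valid(256, 256, 1024, 768, 192)  # the preferred 256/192 setup
```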
In this embodiment, the length and width of the image blocks to be divided are determined from the initial length, the initial width, and the origin of the target image, thereby ensuring the validity of the image blocks to be divided.
In addition, referring to FIG. 3, an embodiment of the present application further provides a page orientation identification apparatus, including:
a dividing module A10, configured to determine a target image to undergo image detection, and divide the target image according to a preset cropping method to obtain a plurality of image blocks;
a determining module A20, configured to train each of the image blocks based on a preset convolutional neural network model, and determine, based on a training result of the training, target image blocks having text and a text direction among the image blocks;
a classifying and summarizing module A30, configured to, if a plurality of target image blocks having text and a text direction exist among the image blocks, classify and summarize the text directions of the target image blocks, determine a target text direction based on the classification and summarization result, and take the target text direction as the page orientation of the target image.
Further, the determining module A20 is further configured to:
traverse each of the image blocks based on the training result of the training, and obtain the overall area of a traversed image block and the area occupied by the suspected text region in the traversed image block;
compute the ratio of the area occupied by the suspected text region to the overall area, and judge whether the ratio is greater than a preset threshold;
if the ratio is greater than the preset threshold, determine that the traversed image block contains text, determine the text direction of the text in the traversed image block according to the training result, and take the traversed image block as a target image block.
Further, the determining module A20 is further configured to:
determine the label result corresponding to the traversed image block according to the training result, match the label result against a preset label-direction lookup table, and determine the text direction of the text in the traversed image block according to the matching result.
Further, the determining module A20 is further configured to:
input a plurality of initial image blocks in a preset mapping lookup table into an original convolutional neural network model for training to obtain the text information of each initial image block, and compare each piece of text information with the annotation information corresponding to each initial image block in the preset mapping lookup table;
if the comparison fails, determine the error between each piece of text information and the corresponding annotation information, and optimize the original convolutional neural network model according to the error to obtain the preset convolutional neural network model.
Further, the classifying and summarizing module A30 is further configured to:
classify and summarize the text directions of the target image blocks to obtain a plurality of initial text directions, determine the number of target image blocks corresponding to each initial text direction, and take, among the initial text directions, the initial text direction with the largest number of target image blocks as the target text direction.
Further, the dividing module A10 is further configured to:
determine the origin in the target image, determine the length and width of the image blocks to be divided based on the origin and the preset cropping method, and divide the target image according to the length and the width to obtain a plurality of image blocks.
Further, the dividing module A10 is further configured to:
obtain the initial length and initial width of the target image, and determine the length and width of the image blocks to be divided based on the origin, the initial length, and the initial width, where the length is less than or equal to the initial length and the width is less than or equal to the initial width.
For the steps implemented by the functional modules of the page orientation identification apparatus, reference may be made to the embodiments of the page orientation identification method of the present application, which will not be repeated here.
The present application further provides a page orientation identification device, including a memory, a processor, and a page orientation identification program stored in the memory; the processor is configured to execute the page orientation identification program to implement the following steps:
determining a target image to undergo image detection, and dividing the target image according to a preset cropping method to obtain a plurality of image blocks;
training each of the image blocks based on a preset convolutional neural network model, and determining, based on a training result of the training, target image blocks having text and a text direction among the image blocks;
if a plurality of target image blocks having text and a text direction exist among the image blocks, classifying and summarizing the text directions of the target image blocks, determining a target text direction based on the classification and summarization result, and taking the target text direction as the page orientation of the target image.
The present application further provides a computer-readable storage medium storing one or more programs, which can be executed by one or more processors to implement the steps of the embodiments of the page orientation identification method described above.
The specific implementation of the computer-readable storage medium of the present application is substantially the same as the embodiments of the page orientation identification method described above, and will not be repeated here.
It should be noted that, as used herein, the terms "comprise", "include", or any of their variants are intended to cover non-exclusive inclusion, so that a process, method, article, or system that includes a list of elements includes not only those elements but also other elements not expressly listed, or elements inherent to such a process, method, article, or system. Without further limitation, an element defined by the phrase "including a ..." does not exclude the existence of other identical elements in the process, method, article, or system that includes that element.
The serial numbers of the above embodiments of the present application are for description only and do not represent the superiority or inferiority of the embodiments.
From the description of the above embodiments, those skilled in the art can clearly understand that the methods of the above embodiments can be implemented by means of software plus a necessary general-purpose hardware platform, and of course also by hardware, but in many cases the former is the better implementation. Based on this understanding, the technical solution of the present application, in essence or in the part contributing to the prior art, can be embodied in the form of a software product. The computer software product is stored in a storage medium as described above (such as ROM/RAM, a magnetic disk, or an optical disc) and includes several instructions for causing a terminal device (which may be a mobile phone, a computer, a server, an air conditioner, a network device, etc.) to execute the methods described in the embodiments of the present application.
The above are only preferred embodiments of the present application and do not thereby limit the patent scope of the present application. Any equivalent structural or process transformation made using the contents of the specification and drawings of the present application, or any direct or indirect application in other related technical fields, is likewise included within the patent protection scope of the present application.

Claims (20)

  1. A page orientation identification method, characterized in that the page orientation identification method comprises the following steps:
    determining a target image to undergo image detection, and dividing the target image according to a preset cropping method to obtain a plurality of image blocks;
    training each of the image blocks based on a preset convolutional neural network model, and determining, based on a training result of the training, target image blocks having text and a text direction among the image blocks;
    if a plurality of target image blocks having text and a text direction exist among the image blocks, classifying and summarizing the text directions of the target image blocks, determining a target text direction based on the classification and summarization result, and taking the target text direction as the page orientation of the target image.
  2. The page orientation identification method according to claim 1, characterized in that the step of determining, based on a training result of the training, target image blocks having text and a text direction among the image blocks comprises:
    traversing each of the image blocks based on the training result of the training, and obtaining the overall area of a traversed image block and the area occupied by the suspected text region in the traversed image block;
    computing the ratio of the area occupied by the suspected text region to the overall area, and judging whether the ratio is greater than a preset threshold;
    if the ratio is greater than the preset threshold, determining that the traversed image block contains text, determining the text direction of the text in the traversed image block according to the training result, and taking the traversed image block as a target image block.
  3. The page orientation identification method according to claim 2, characterized in that the step of determining the text direction of the text in the traversed image block according to the training result comprises:
    determining the label result corresponding to the traversed image block according to the training result, matching the label result against a preset label-direction lookup table, and determining the text direction of the text in the traversed image block according to the matching result.
  4. The page orientation identification method according to claim 1, characterized in that, before the step of training each of the image blocks based on a preset convolutional neural network model, the method comprises:
    inputting a plurality of initial image blocks in a preset mapping lookup table into an original convolutional neural network model for training to obtain the text information of each initial image block, and comparing each piece of text information with the annotation information corresponding to each initial image block in the preset mapping lookup table;
    if the comparison fails, determining the error between each piece of text information and the corresponding annotation information, and optimizing the original convolutional neural network model according to the error to obtain the preset convolutional neural network model.
  5. The page orientation identification method according to claim 1, characterized in that the step of classifying and summarizing the text directions of the target image blocks and determining the target text direction based on the classification and summarization result comprises:
    classifying and summarizing the text directions of the target image blocks to obtain a plurality of initial text directions, determining the number of target image blocks corresponding to each initial text direction, and taking, among the initial text directions, the initial text direction with the largest number of target image blocks as the target text direction.
  6. The page orientation identification method according to any one of claims 1-5, characterized in that the step of dividing the target image according to a preset cropping method to obtain a plurality of image blocks comprises:
    determining the origin in the target image, determining the length and width of the image blocks to be divided based on the origin and the preset cropping method, and dividing the target image according to the length and the width to obtain a plurality of image blocks.
  7. The page orientation identification method according to claim 6, characterized in that the step of determining the length and width of the image blocks to be divided based on the origin and the preset cropping method comprises:
    obtaining the initial length and initial width of the target image, and determining the length and width of the image blocks to be divided based on the origin, the initial length, and the initial width, wherein the length is less than or equal to the initial length and the width is less than or equal to the initial width.
  8. A page orientation identification apparatus, characterized in that the page orientation identification apparatus comprises:
    a dividing module, configured to determine a target image to undergo image detection, and divide the target image according to a preset cropping method to obtain a plurality of image blocks;
    a determining module, configured to train each of the image blocks based on a preset convolutional neural network model, and determine, based on a training result of the training, target image blocks having text and a text direction among the image blocks;
    a classifying and summarizing module, configured to, if a plurality of target image blocks having text and a text direction exist among the image blocks, classify and summarize the text directions of the target image blocks, determine a target text direction based on the classification and summarization result, and take the target text direction as the page orientation of the target image.
  9. A page orientation identification device, characterized in that the page orientation identification device comprises a memory, a processor, and a page orientation identification program stored in the memory and executable on the processor, the page orientation identification program, when executed by the processor, implementing the following steps:
    determining a target image to undergo image detection, and dividing the target image according to a preset cropping method to obtain a plurality of image blocks;
    training each of the image blocks based on a preset convolutional neural network model, and determining, based on a training result of the training, target image blocks having text and a text direction among the image blocks;
    if a plurality of target image blocks having text and a text direction exist among the image blocks, classifying and summarizing the text directions of the target image blocks, determining a target text direction based on the classification and summarization result, and taking the target text direction as the page orientation of the target image.
  10. The page orientation identification device according to claim 9, wherein the steps implemented when the page orientation identification program is executed by the processor further comprise:
    traversing each of the image blocks based on the training result of the training, and obtaining the overall area of a traversed image block and the area occupied by the suspected text region in the traversed image block;
    computing the ratio of the area occupied by the suspected text region to the overall area, and judging whether the ratio is greater than a preset threshold;
    if the ratio is greater than the preset threshold, determining that the traversed image block contains text, determining the text direction of the text in the traversed image block according to the training result, and taking the traversed image block as a target image block.
  11. The page orientation identification device according to claim 10, wherein the steps implemented when the page orientation identification program is executed by the processor further comprise:
    determining the label result corresponding to the traversed image block according to the training result, matching the label result against a preset label-direction lookup table, and determining the text direction of the text in the traversed image block according to the matching result.
  12. The page orientation identification device according to claim 9, wherein the steps implemented when the page orientation identification program is executed by the processor further comprise:
    inputting a plurality of initial image blocks in a preset mapping lookup table into an original convolutional neural network model for training to obtain the text information of each initial image block, and comparing each piece of text information with the annotation information corresponding to each initial image block in the preset mapping lookup table;
    if the comparison fails, determining the error between each piece of text information and the corresponding annotation information, and optimizing the original convolutional neural network model according to the error to obtain the preset convolutional neural network model.
  13. The page orientation identification device according to claim 9, wherein the steps implemented when the page orientation identification program is executed by the processor further comprise:
    classifying and summarizing the text directions of the target image blocks to obtain a plurality of initial text directions, determining the number of target image blocks corresponding to each initial text direction, and taking, among the initial text directions, the initial text direction with the largest number of target image blocks as the target text direction.
  14. The page orientation identification device according to any one of claims 9-13, wherein the steps implemented when the page orientation identification program is executed by the processor further comprise:
    determining the origin in the target image, determining the length and width of the image blocks to be divided based on the origin and the preset cropping method, and dividing the target image according to the length and the width to obtain a plurality of image blocks.
  15. The page orientation identification device according to claim 14, wherein the steps implemented when the page orientation identification program is executed by the processor further comprise:
    obtaining the initial length and initial width of the target image, and determining the length and width of the image blocks to be divided based on the origin, the initial length, and the initial width, wherein the length is less than or equal to the initial length and the width is less than or equal to the initial width.
  16. A computer-readable storage medium, wherein a page orientation identification program is stored on the computer-readable storage medium, and the page orientation identification program, when executed by a processor, implements the following steps:
    determining a target image to undergo image detection, and dividing the target image according to a preset cropping method to obtain a plurality of image blocks;
    training each of the image blocks based on a preset convolutional neural network model, and determining, based on a training result of the training, target image blocks having text and a text direction among the image blocks;
    if a plurality of target image blocks having text and a text direction exist among the image blocks, classifying and summarizing the text directions of the target image blocks, determining a target text direction based on the classification and summarization result, and taking the target text direction as the page orientation of the target image.
  17. The computer-readable storage medium according to claim 16, wherein the steps implemented when the page orientation identification program is executed by the processor further comprise:
    traversing each of the image blocks based on the training result of the training, and obtaining the overall area of a traversed image block and the area occupied by the suspected text region in the traversed image block;
    computing the ratio of the area occupied by the suspected text region to the overall area, and judging whether the ratio is greater than a preset threshold;
    if the ratio is greater than the preset threshold, determining that the traversed image block contains text, determining the text direction of the text in the traversed image block according to the training result, and taking the traversed image block as a target image block.
  18. The computer-readable storage medium according to claim 17, wherein the steps implemented when the page orientation identification program is executed by the processor further comprise:
    determining the label result corresponding to the traversed image block according to the training result, matching the label result against a preset label-direction lookup table, and determining the text direction of the text in the traversed image block according to the matching result.
  19. The computer-readable storage medium according to claim 16, wherein the steps implemented when the page orientation identification program is executed by the processor further comprise:
    inputting a plurality of initial image blocks in a preset mapping lookup table into an original convolutional neural network model for training to obtain the text information of each initial image block, and comparing each piece of text information with the annotation information corresponding to each initial image block in the preset mapping lookup table;
    if the comparison fails, determining the error between each piece of text information and the corresponding annotation information, and optimizing the original convolutional neural network model according to the error to obtain the preset convolutional neural network model.
  20. The computer-readable storage medium according to claim 16, wherein the steps implemented when the page orientation identification program is executed by the processor further comprise:
    classifying and summarizing the text directions of the target image blocks to obtain a plurality of initial text directions, determining the number of target image blocks corresponding to each initial text direction, and taking, among the initial text directions, the initial text direction with the largest number of target image blocks as the target text direction.
PCT/CN2021/127179 2020-11-17 2021-10-28 Page orientation identification method, apparatus, device, and computer-readable storage medium WO2022105569A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202011282095.2 2020-11-17
CN202011282095.2A CN112101317B (zh) 2020-11-17 2020-11-17 Page orientation identification method, apparatus, device, and computer-readable storage medium

Publications (1)

Publication Number Publication Date
WO2022105569A1 true WO2022105569A1 (zh) 2022-05-27

Family

ID=73785712

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/127179 WO2022105569A1 (zh) 2020-11-17 2021-10-28 Page orientation identification method, apparatus, device, and computer-readable storage medium

Country Status (2)

Country Link
CN (1) CN112101317B (zh)
WO (1) WO2022105569A1 (zh)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115346205A (zh) * 2022-10-17 2022-11-15 Guangzhou Jianyue Information Technology Co., Ltd. Page information identification method and apparatus, and electronic device

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112101317B (zh) * 2020-11-17 2021-02-19 Shenzhen OneConnect Smart Technology Co., Ltd. Page orientation identification method, apparatus, device, and computer-readable storage medium
CN112766266B (zh) * 2021-01-29 2021-12-10 CloudWalk Technology Group Co., Ltd. Text direction correction method, system, and apparatus based on staged probability statistics
CN112926564A (zh) * 2021-02-25 2021-06-08 Ping An Life Insurance Company of China, Ltd. Picture analysis method and system, computer device, and computer-readable storage medium
CN113780131B (zh) * 2021-08-31 2024-04-12 ZhongAn Online P&C Insurance Co., Ltd. Text image orientation identification method, text content identification method, apparatus, and device
CN114155546B (zh) * 2022-02-07 2022-05-20 Beijing Century TAL Education Technology Co., Ltd. Image correction method and apparatus, electronic device, and storage medium

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109845237A (zh) * 2016-08-17 2019-06-04 HP Printing Korea Co., Ltd. Image forming apparatus, scanned image correction method of the image forming apparatus, and non-transitory computer-readable recording medium
CN111353491A (zh) * 2020-03-12 2020-06-30 China Construction Bank Corporation Text direction determination method, apparatus, device, and storage medium
CN111382740A (zh) * 2020-03-13 2020-07-07 Shenzhen Qianhai Huanrong Lianyi Information Technology Service Co., Ltd. Text image parsing method and apparatus, computer device, and storage medium
CN111507214A (zh) * 2020-04-07 2020-08-07 PICC Property and Casualty Co., Ltd. Document recognition method, apparatus, and device
CN111753850A (zh) * 2020-06-29 2020-10-09 Zhuhai Pantum Electronics Co., Ltd. Document processing method and apparatus, computer device, and computer-readable storage medium
CN112101317A (zh) * 2020-11-17 2020-12-18 Shenzhen OneConnect Smart Technology Co., Ltd. Page orientation identification method, apparatus, device, and computer-readable storage medium

Family Cites Families (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101777124A (zh) * 2010-01-29 2010-07-14 Beijing Nufront Network Technology Co., Ltd. Method and apparatus for extracting text information from video
KR101214772B1 (ko) * 2010-02-26 2012-12-21 Samsung Electronics Co., Ltd. Character recognition apparatus and method based on the directionality of characters
EP3534318A1 (en) * 2013-09-26 2019-09-04 Mark W. Publicover Providing targeted content based on a user's moral values
CN106326854B (zh) * 2016-08-19 2019-09-06 iReader Technology Co., Ltd. Paragraph recognition method for fixed-layout documents
CN110490198A (zh) * 2019-08-12 2019-11-22 Shanghai Eye Control Technology Co., Ltd. Text direction correction method and apparatus, computer device, and storage medium
CN110942063B (zh) * 2019-11-21 2023-04-07 Wanghai Kangxin (Beijing) Technology Co., Ltd. Method and apparatus for obtaining text information from certificates, and electronic device
CN111091124B (zh) * 2019-12-04 2022-06-03 Jilin University Book spine text recognition method
CN111062374A (zh) * 2019-12-10 2020-04-24 Aisino Credit Information Co., Ltd. ID card information recognition method, apparatus, system, device, and readable medium
CN111144288A (zh) * 2019-12-25 2020-05-12 Lenovo (Beijing) Co., Ltd. Image processing method and apparatus, and electronic device
CN111639646B (zh) * 2020-05-18 2021-04-13 Shandong University Deep-learning-based method and system for recognizing handwritten English characters on test papers
CN111814429A (zh) * 2020-07-30 2020-10-23 Shenzhen OneConnect Smart Technology Co., Ltd. Article typesetting method and apparatus, terminal device, and storage medium



Also Published As

Publication number Publication date
CN112101317A (zh) 2020-12-18
CN112101317B (zh) 2021-02-19

Similar Documents

Publication Publication Date Title
WO2022105569A1 (zh) Page orientation identification method, apparatus, device, and computer-readable storage medium
CN109961009B (zh) Pedestrian detection method, system, apparatus, and storage medium based on deep learning
CN108009543B (zh) License plate recognition method and apparatus
US10977523B2 (en) Methods and apparatuses for identifying object category, and electronic devices
CN109461167B (zh) Training method for an image processing model, matting method, apparatus, medium, and terminal
CN110232311B (zh) Hand image segmentation method and apparatus, and computer device
WO2020199906A1 (zh) Face key point detection method, apparatus, device, and storage medium
US9697416B2 (en) Object detection using cascaded convolutional neural networks
CN110348294B (zh) Method and apparatus for locating charts in PDF documents, and computer device
US10410053B2 (en) Method, apparatus, system, and storage medium for detecting information card in image
CN110675940A (zh) Pathological image annotation method and apparatus, computer device, and storage medium
US20180225542A1 (en) Image information recognition processing method and device, and computer storage medium
CN109697414B (zh) Text positioning method and apparatus
WO2021147219A1 (zh) Image-based text recognition method and apparatus, electronic device, and storage medium
CN111814905A (zh) Object detection method and apparatus, computer device, and storage medium
CN113673519B (zh) Text recognition method based on a text detection model, and related devices
CN110431563B (zh) Image correction method and apparatus
WO2021212873A1 (zh) Method, apparatus, device, and storage medium for detecting missing corners of certificates
CN112767354A (zh) Defect detection method, apparatus, device, and storage medium based on image segmentation
WO2022002262A1 (zh) Computer-vision-based character sequence recognition method, apparatus, device, and medium
WO2022206534A1 (zh) Text content recognition method and apparatus, computer device, and storage medium
US11270152B2 (en) Method and apparatus for image detection, patterning control method
CN113516697B (zh) Image registration method and apparatus, electronic device, and computer-readable storage medium
WO2022016996A1 (zh) Image processing method and apparatus, electronic device, and computer-readable storage medium
CN110717060B (zh) Image mask filtering method and apparatus, and storage medium

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 21893717

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

32PN Ep: public notification in the ep bulletin as address of the adressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 23.08.2023)

122 Ep: pct application non-entry in european phase

Ref document number: 21893717

Country of ref document: EP

Kind code of ref document: A1