WO2014086287A1 - Text image automatic dividing method and device, method for automatically dividing handwriting entries - Google Patents

Info

Publication number
WO2014086287A1
Authority
WO
WIPO (PCT)
Prior art keywords
text
image
block
paper page
user
Prior art date
Application number
PCT/CN2013/088494
Other languages
French (fr)
Chinese (zh)
Inventor
陈青山
罗希平
Original Assignee
上海合合信息科技发展有限公司
Priority date
Filing date
Publication date
Application filed by 上海合合信息科技发展有限公司
Publication of WO2014086287A1

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00: Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/40: Document-oriented image-based pattern recognition

Definitions

  • The invention relates to an image processing method, in particular to a method for automatically segmenting text images.
  • The invention further relates to an image processing apparatus, in particular to an apparatus for automatically segmenting text images.
  • The invention further relates to a method of automatically segmenting handwritten entries in an electronic notebook.
  • Smartphones are among the most commonly used tools for digitizing paper documents. Because smartphones generally have a built-in camera, the camera can photograph a paper document, and the captured image can, after some image processing, be converted into a JPEG photo or a PDF document. Applications with these features have become widespread, such as the CamScanner application in the Apple App Store and the Google app store. These applications can automatically detect the four edges of the photographed document in the captured image, use them as a reference to crop away the background outside the document area, and apply correction and image enhancement to the document area. The result is a clean electronic document similar to one produced by a scanner, which is saved and managed in a user-specified format.
  • Paper has long been used for all kinds of records, such as meeting minutes and memos. In practice, users often write down entries by hand, one by one. For example, a user might write possible weekend activities on three lines of a notebook page: 1. go shopping, 2. watch a movie, 3. go to the park. After photographing the page and digitizing it, the user picks option 2, watch a movie. To save this decision to the to-do list, the user has to type the text into the electronic device again, which is inconvenient.
  • The technical problem to be solved by the present invention is to provide a method for automatically segmenting text tasks, together with the apparatus used by that method, which makes it convenient for the user to edit and process text tasks recorded on paper.
  • The technical solution of the text image automatic segmentation method of the present invention comprises the following steps:
  • The image text blocks are rearranged and displayed, and a mark that the user can edit is set for each text block.
  • The invention further discloses a text image automatic segmentation method whose technical solution comprises the following steps: acquiring a text image;
  • The text blocks are rearranged and displayed, and a mark that the user can edit is set for each text block.
  • The invention also discloses a text image automatic segmentation apparatus used by the above method, which is based on an electronic device containing a computer system and includes:
  • An image acquisition component that acquires a text image;
  • An image layout identifying component that identifies the layout of the text in the image, divides the image according to the paragraphs of the text, and uses the image portion of each paragraph as an image text block or a text block;
  • A display and editing component that rearranges and displays the text blocks and sets, for each text block, a mark that the user can edit.
  • The invention further discloses a method for automatically segmenting handwritten entries in an electronic notebook, comprising:
  • Determining the type of the paper page from the paper page image, and obtaining the pre-saved blank segmentation template for that type of notebook page, wherein the blank segmentation template is composed of a number of text blocks;
  • Determining the text blocks in the square area in which the user's handwriting is located, and automatically segmenting and extracting the user's handwriting in any one text block, with the text block as the unit.
  • The invention makes it much easier for the user to segment and classify text images, and removes the trouble of entering items into the electronic device one by one.
  • FIG. 1 to FIG. 3 are schematic diagrams of one embodiment of the text image automatic segmentation method of the present invention.
  • FIG. 4 and FIG. 5 are schematic diagrams of another embodiment of the text image automatic segmentation method of the present invention.
  • FIG. 6 to FIG. 8 are schematic diagrams of still another embodiment of the text image automatic segmentation method of the present invention.
  • FIG. 9 and FIG. 10 are schematic diagrams of yet another embodiment of the text image automatic segmentation method of the present invention. FIG. 11 is a schematic diagram of the text image automatic segmentation apparatus of the present invention.
  • FIG. 12 is a schematic flow chart of the method for automatically segmenting handwritten entries in an electronic notebook according to the present invention.
  • The reference numerals in the figures are: 1. touch screen and buttons; 2. camera.
  • Each image text block then corresponds to one to-do item.
  • The image text blocks are rearranged and displayed, and a mark that the user can edit is set for each text block.
  • The mark can be a check box.
  • The check box can be ticked to mark a completed item, as shown in FIG. 2, FIG. 3, FIG. 7, and FIG. 8.
  • The electronic device may not identify the layout of the text in the text image accurately enough. The user can therefore manually adjust the division into image text blocks before the blocks are rearranged and displayed. As shown in FIG. 4 and FIG. 9, each image text block is framed, and the user can split or merge the framed blocks.
  • The user can also annotate an image text block.
  • The present invention can also add each image text block to the to-do items of the electronic device.
  • The image text blocks are listed directly for the user to edit, so the objects being edited and processed are still images.
  • The invention also discloses another method for automatically segmenting text images, comprising the following steps: acquiring a text image;
  • The text blocks are rearranged and displayed, and a mark that the user can edit is set for each text block.
  • A text recognition step is added, and the recognized text blocks are listed for the user to edit, which makes the method still more convenient to use.
  • The text image is acquired either by photographing the text or by receiving a file that contains the text image.
  • The recognition can proceed in either of two ways. In the first, the layout of the text in the image is identified, the characters of the text are recognized, the image is divided according to the paragraphs of the text, and the recognized text in the image portion of each paragraph is used as a text block; that is, character recognition comes first and image division second. In the second, the layout of the text in the image is identified, the image is divided according to the paragraphs of the text, the characters in the image portion of each paragraph are then recognized, and the recognized text of each paragraph is used as a text block; that is, the image is divided first and the characters in each divided portion are recognized afterwards.
  • The user can manually adjust the division into text blocks before the blocks are rearranged and displayed. After the image text blocks are divided, the user can also annotate them.
  • The user can also edit the text in a text block after recognition.
  • The present invention can also add each text block to the to-do items of the electronic device.
  • The layout of the image in FIG. 1 is identified, as shown in FIG. 4.
  • Text blocks are obtained through the text recognition step, imported into the to-do list, and listed for the user, as shown in FIG.
  • The layout of the image in FIG. 6 is identified, as shown in FIG. 9. After the text recognition step, text blocks are obtained, imported into the to-do list, and listed for the user, as shown in FIG.
  • The invention further discloses an apparatus for implementing the above text image automatic segmentation method.
  • The apparatus is based on an electronic device containing a computer system, such as a smartphone or a tablet computer, and includes:
  • An image acquisition component that acquires a text image;
  • An image layout identifying component that identifies the layout of the text in the image, divides the image according to the paragraphs of the text, and uses the image portion of each paragraph as an image text block or a text block;
  • A display and editing component that rearranges and displays the text blocks and sets, for each text block, a mark that the user can edit.
  • The image acquisition component includes at least one of a photographing component, which photographs text to acquire a text image, and a file receiving component, which receives a file containing the text image.
  • A camera 2 is provided on the smartphone shown in FIG.
  • The image layout identifying component further includes a text recognition component that recognizes the characters in the text image.
  • The apparatus further includes an adjustment component for manually adjusting the division into text blocks before the text blocks are rearranged and displayed.
  • The apparatus further includes a time information adding component, which adds corresponding time information to a text block and issues a prompt based on the time information together with the corresponding text block.
  • This embodiment provides a method for automatically segmenting handwritten entries in an electronic notebook.
  • The method for automatically segmenting handwritten entries in an electronic notebook includes:
  • The paper page of the notebook to be digitized may be of any type. For example, the page may be printed with a classification mark area, a page number area, a title area, row-dividing lines, and/or dividing lines, or with any combination of these.
  • The four edge lines of the paper page image are determined by a line detection method, and the page area bounded by the four edge lines is corrected to a square area. Specifically, the straight lines representing the four outer edges of the page are obtained from the paper page image by line detection, and the background outside the area bounded by these four lines is cropped away.
  • With these four outer edge lines as the reference, the paper page image is corrected so that the page area they bound becomes a rectangular area.
  • The type of the paper page is determined by its size and format. The format includes the number of text blocks on the page, the size of the text blocks, and the spacing between adjacent text blocks. In other words, the page may be composed of block regions of any shape, each of which is a text block. Each text block coincides with an area in which the user writes on the paper page.
  • The paper page image of the notebook photographed in the present invention belongs to a page type previously saved by application software such as the existing CamScanner. The pre-saved blank segmentation template for that page type can therefore be used to obtain the image area of the user's handwriting, that is, the area of one text block or of several merged text blocks, which greatly improves accuracy.
  • The text blocks in the square area in which the user's handwriting is located are determined, and the user's handwriting in any one text block is automatically segmented and extracted, with the text block as the unit.
  • A text block can also be merged with adjacent text blocks, so the user's handwriting in any one text block can be automatically segmented and extracted. In the corrected notebook page image, the pre-saved blank segmentation template of the notebook page is used as a reference to determine where the user's handwriting falls within the template, and the handwriting is divided into text blocks representing different lines of text.
  • Through a simple operation, the user can manually merge adjacent regions, representing several text blocks that together form a complete meaning, into one. The extracted regions, each containing text with a complete meaning, can be added to the to-do list of the electronic device. Existing handwriting recognition technology can also be used to recognize the text, which saves the user the trouble of entering it manually on the electronic device.
  • When a notebook page is digitized, the invention uses the text blocks of the pre-saved blank segmentation template to help obtain and divide the user's handwritten text areas, producing image blocks (also called text blocks) that each contain a complete handwritten entry. This facilitates the division of digitized paper pages and the use and management of electronic documents. In other words, because the blank segmentation template is composed of several text blocks, each text block can serve as a unit of handwriting on the page, so that handwritten entries with complete content are obtained and the content of the electronic document is segmented and extracted automatically.
  • This embodiment provides a method for automatically segmenting handwritten entries in an electronic notebook. It differs from the method of Embodiment 1 in that the type of the paper page is known in advance.
  • The type of the paper page is determined from the paper page image by manual specification: before the image is taken, or after it is taken but before it is processed, the user manually specifies the type of notebook page to which the image belongs, for example one of a series of notebook page types pre-stored in an application such as CamScanner.
  • This embodiment provides a method for automatically segmenting handwritten entries in an electronic notebook. It differs from the methods of Embodiments 1 and 2 in that the type of the paper page is known in advance, and the type is determined from the paper page image as follows:
  • The type mark may be text, a symbol, a graphic, or a combination of any two or all three of these.
  • A type mark is detected in the paper page image, and the detected type mark is compared with the previously known type marks to find the type to which the paper page belongs.
  • The four outer edges of the notebook's paper page are detected in the image, and the approximate position of the mark is determined in the paper page image with reference to these four edges, so that the mark can be detected in the image. The detected mark is then compared with the pre-stored marks representing several different types of notebook pages to find the type of the photographed page.
  • Comparing the detected mark with the pre-stored marks involves handwriting recognition, text recognition, image matching, and other mature techniques in the art, which are not described here.
  • This embodiment provides a method for automatically segmenting handwritten entries in an electronic notebook. It differs from the method of Embodiment 1 in that the type of the paper page is not known in advance.
  • The type of the paper page is determined from the paper page image as follows:
  • If the photographed notebook page does not belong to any page type known in advance to an application such as CamScanner (a page printed with bold and/or lengthened row-dividing lines, and/or dividing lines, and/or a title area), then in the subsequent steps the unknown page type is first added as a newly created page type, and processing then continues.
  • When a notebook page is digitized, the invention uses the pre-saved blank segmentation template to help obtain and divide the text handwritten by the user on the paper page. Because the blank segmentation template is composed of several text blocks, each text block can serve as a unit of handwriting on the page, so that handwritten entries with complete content are obtained and the content of the electronic document is segmented and extracted automatically.
  • The invention makes it much easier for the user to segment and classify text images, and removes the trouble of entering items into the electronic device one by one.
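
The template-based extraction described above for notebook pages can be illustrated with a short Python sketch. It is only an illustration under stated assumptions, not the implementation disclosed in the patent: the blank segmentation template is assumed to be a list of rectangles in the coordinates of the corrected page image, the user's handwriting is assumed to be available as a list of ink pixel coordinates, and the names `blocks_with_handwriting` and `merge_rects` are hypothetical.

```python
from typing import Dict, List, Tuple

# Assumed representation: (left, top, right, bottom), right/bottom exclusive.
Rect = Tuple[int, int, int, int]
# Assumed representation: (x, y) of one ink pixel in the corrected page image.
Point = Tuple[int, int]


def blocks_with_handwriting(template: List[Rect],
                            ink: List[Point]) -> Dict[int, List[Point]]:
    """Group ink pixels by the template text block that contains them.

    Blocks without any ink do not appear in the result; ink that falls
    outside every block is ignored.
    """
    found: Dict[int, List[Point]] = {}
    for x, y in ink:
        for index, (left, top, right, bottom) in enumerate(template):
            if left <= x < right and top <= y < bottom:
                found.setdefault(index, []).append((x, y))
                break
    return found


def merge_rects(a: Rect, b: Rect) -> Rect:
    """Bounding rectangle of two text blocks (a manual merge of neighbours)."""
    return (min(a[0], b[0]), min(a[1], b[1]),
            max(a[2], b[2]), max(a[3], b[3]))
```

Working on template blocks rather than on free-form layout analysis is what lets each handwritten entry be extracted "with the text block as the unit"; merging two neighbouring rectangles corresponds to the manual merge of blocks that together form one entry.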

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Artificial Intelligence (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Theoretical Computer Science (AREA)
  • Character Input (AREA)
  • Processing Or Creating Images (AREA)

Abstract

Disclosed is a text image automatic dividing method, which comprises the following steps: obtaining a text image; identifying layout of a text in the image, dividing the image according to paragraphs of the text in the image, and using an image part of each paragraph as an image character block; and re-arranging and displaying the image character blocks, and setting, for each character block, a mark that can be edited by a user. Further disclosed is a device used by the text image automatic dividing method. Further disclosed is a method for automatically dividing handwriting entries in an electronic notebook. In the present invention, by using the technical solutions, the user can divide and classify text images conveniently, thereby eliminating the trouble of inputting to an electronic device entry by entry.

Description

Text image automatic segmentation method and device, and method for automatically segmenting handwritten entries
Technical field
The invention relates to an image processing method, in particular to a method for automatically segmenting text images. The invention further relates to an image processing apparatus, in particular to an apparatus for automatically segmenting text images. The invention further relates to a method of automatically segmenting handwritten entries in an electronic notebook.
Background
In daily life, people often need to photograph paper documents and save them as JPEG photos or PDF documents, so that the paper documents are digitized and easier to manage. Smartphones are among the most commonly used tools for digitizing paper documents. Because smartphones generally have a built-in camera, the camera can photograph a paper document, and the captured image can, after some image processing, be converted into a JPEG photo or a PDF document. Applications with these features have become widespread, such as the CamScanner application in the Apple App Store and the Google app store. These applications can automatically detect the four edges of the photographed document in the captured image, use them as a reference to crop away the background outside the document area, and apply correction and image enhancement to the document area. The result is a clean electronic document similar to one produced by a scanner, which is saved and managed in a user-specified format.
Paper has long been used for all kinds of records, such as meeting minutes and memos. In practice, users often write down entries by hand, one by one. For example, a user might write possible weekend activities on three lines of a notebook page: 1. go shopping, 2. watch a movie, 3. go to the park. After photographing the page and digitizing it, the user picks option 2, watch a movie. To save this decision to the to-do list, the user has to type the text into the electronic device again, which is inconvenient. Ideally, the user would simply tap "2. watch a movie" in the digitized page shown on the electronic device, and the image area containing the handwriting "2. watch a movie" would be segmented out automatically and added to the to-do list.
Summary of the invention
The technical problem to be solved by the present invention is to provide a method for automatically segmenting text tasks, together with the apparatus used by that method, which makes it convenient for the user to edit and process text tasks recorded on paper.
To solve the above technical problem, the technical solution of the text image automatic segmentation method of the present invention comprises the following steps:
acquiring a text image;
identifying the layout of the text in the image, dividing the image according to the paragraphs of the text, and using the image portion of each paragraph as an image text block;
rearranging and displaying the image text blocks, and setting for each text block a mark that the user can edit.
The invention further discloses a text image automatic segmentation method whose technical solution comprises the following steps:
acquiring a text image;
identifying the layout and the characters of the text in the image, dividing the image according to the paragraphs of the text, and using the recognized text in the image portion of each paragraph as a text block;
rearranging and displaying the text blocks, and setting for each text block a mark that the user can edit.
The invention also discloses a text image automatic segmentation apparatus used by the above method. Its technical solution is based on an electronic device containing a computer system and includes:
an image acquisition component that acquires a text image;
an image layout identifying component that identifies the layout of the text in the image, divides the image according to the paragraphs of the text, and uses the image portion of each paragraph as an image text block or a text block;
a display and editing component that rearranges and displays the text blocks and sets, for each text block, a mark that the user can edit.
The invention further discloses a method for automatically segmenting handwritten entries in an electronic notebook, comprising:
photographing an image of the paper page of the notebook that is to be digitized;
determining the four edge lines of the paper page image by a line detection method, and correcting the page area bounded by the four edge lines to a square area;
determining the type of the paper page from the paper page image, and obtaining the pre-saved blank segmentation template for that type of notebook page, wherein the blank segmentation template is composed of a number of text blocks;
determining the text blocks in the square area in which the user's handwriting is located, and automatically segmenting and extracting the user's handwriting in any one text block, with the text block as the unit.
The invention makes it much easier for the user to segment and classify text images, and removes the trouble of entering items into the electronic device one by one.
Brief description of the drawings
The present invention is described in further detail below with reference to the accompanying drawings and embodiments:
FIG. 1 to FIG. 3 are schematic diagrams of one embodiment of the text image automatic segmentation method of the present invention;
FIG. 4 and FIG. 5 are schematic diagrams of another embodiment of the text image automatic segmentation method of the present invention;
FIG. 6 to FIG. 8 are schematic diagrams of still another embodiment of the text image automatic segmentation method of the present invention;
FIG. 9 and FIG. 10 are schematic diagrams of yet another embodiment of the text image automatic segmentation method of the present invention;
FIG. 11 is a schematic diagram of the text image automatic segmentation apparatus of the present invention;
FIG. 12 is a schematic flow chart of the method for automatically segmenting handwritten entries in an electronic notebook according to the present invention.
The reference numerals in the figures are: 1. touch screen and buttons; 2. camera.
Detailed description
In work and daily life, people often write down things they need to do on paper, as shown in FIG. 1 and FIG. 6. The notes may be jotted casually on a sheet of paper, or may arrive as a document from someone else that lists what needs to be done. By writing habit, such items are usually recorded in separate paragraphs, one to-do item per paragraph, with a clear gap between paragraphs. The text image automatic segmentation method provided by the present invention is implemented on an electronic device such as a smartphone or a tablet computer. The user can photograph the paper directly to obtain a text image, or the paper can be photographed or scanned by another device or another person and the text image file sent over. After the text image is acquired, the layout of the text in the image is identified, the image is divided according to the paragraphs of the text, and the image portion of each paragraph is used as an image text block, so that each image text block corresponds to one to-do item. The image text blocks are rearranged and displayed, and a mark that the user can edit is set for each text block. The mark can be a check box: each time the user completes a to-do item, the user ticks the check box to mark it, as shown in FIG. 2, FIG. 3, FIG. 7, and FIG. 8.
Sometimes, because the handwriting is irregular or for other reasons, the electronic device does not identify the layout of the text in the text image accurately enough. The user can therefore manually adjust the division into image text blocks before the blocks are rearranged and displayed. As shown in FIG. 4 and FIG. 9, each image text block is framed, and the user can split or merge the framed blocks.
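
The paragraph-based division and the manual split and merge adjustments described above can be sketched as follows. This is a minimal illustration in Python, not the patent's implementation: it assumes the page has already been binarized into rows of 0 (background) and 1 (ink), and it treats a run of at least `min_gap` ink-free rows as the gap between two paragraphs. The names `split_into_blocks`, `merge_blocks`, `split_block`, and the `min_gap` parameter are hypothetical.

```python
from typing import List, Tuple

Image = List[List[int]]    # assumed binarized page: 1 = ink, 0 = background
Block = Tuple[int, int]    # (top, bottom) row range, bottom exclusive


def split_into_blocks(image: Image, min_gap: int = 2) -> List[Block]:
    """Return one row range per paragraph.

    A run of at least `min_gap` ink-free rows separates two blocks;
    shorter runs are treated as line spacing inside one block.
    """
    blocks: List[Block] = []
    start = None       # first inked row of the current block
    last_ink = None    # most recent inked row seen so far
    for y, row in enumerate(image):
        if not any(row):
            continue
        if start is None:
            start = y
        elif y - last_ink - 1 >= min_gap:
            blocks.append((start, last_ink + 1))
            start = y
        last_ink = y
    if start is not None:
        blocks.append((start, last_ink + 1))
    return blocks


def merge_blocks(blocks: List[Block], i: int) -> List[Block]:
    """Manual adjustment: merge block i with the block that follows it."""
    merged = (blocks[i][0], blocks[i + 1][1])
    return blocks[:i] + [merged] + blocks[i + 2:]


def split_block(blocks: List[Block], i: int, row: int) -> List[Block]:
    """Manual adjustment: split block i into two blocks at `row`."""
    top, bottom = blocks[i]
    if not top < row < bottom:
        raise ValueError("split row must lie strictly inside the block")
    return blocks[:i] + [(top, row), (row, bottom)] + blocks[i + 1:]
```

A row-gap threshold is only one simple way to find paragraph boundaries; it relies on the "clear gap between paragraphs" noted above, and the split and merge helpers cover the case where that assumption fails for irregular handwriting.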
After the image text blocks are divided, the user can also add annotations to them.
Electronic devices such as smartphones and tablet computers now generally provide a to-do function. The present invention can add each image text block to the to-do items of the electronic device.
In addition, corresponding time information can be added to an image text block, and a prompt can be issued based on the time information together with the corresponding image text block. This can also be combined with the to-do function already available on the electronic device, as shown in FIG. 8.
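
As a rough illustration of how a check box, an annotation, and time information might be attached to each block, the following Python sketch models one to-do entry per image text block. The `TodoBlock` class, its field names, and the `due_reminders` function are assumptions made for this example; the source does not specify any data structure.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List, Optional


@dataclass
class TodoBlock:
    block_id: int                          # index of the image text block
    done: bool = False                     # state of the check box
    remind_at: Optional[datetime] = None   # optional time information
    note: str = ""                         # optional user annotation


def due_reminders(items: List[TodoBlock], now: datetime) -> List[TodoBlock]:
    """Return unfinished items whose reminder time has been reached."""
    return [item for item in items
            if not item.done
            and item.remind_at is not None
            and item.remind_at <= now]
```

An application could call `due_reminders` periodically and show the image of each returned block as the prompt, which keeps the entry as an image and avoids retyping it.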
In the above embodiment, the image text blocks are listed directly for the user to edit and process, so the objects being edited and processed are still images. The present invention also discloses another method for automatically segmenting a text image, comprising the following steps:
acquiring a text image;
recognizing the layout and the characters of the text in the image, dividing the image according to the paragraphs of the text in the image, and taking the recognized characters in the image portion of each paragraph as one text block;
rearranging and displaying the text blocks, and setting for each text block a mark that the user can edit.
This embodiment adds a character recognition step and finally lists the text blocks for the user to edit and process, which makes the method even more convenient to use.
The text image is acquired either by photographing the text or by receiving a file that contains the text image.
The recognition can proceed in either of two ways. In the first, the layout of the text in the image is recognized, the characters of the text in the image are recognized, the image is divided according to the paragraphs of the text, and the recognized characters in the image portion of each paragraph are then taken as one text block; that is, character recognition is performed first and image division afterwards. In the second, the layout of the text in the image is recognized, the image is divided according to the paragraphs of the text, the characters in the image portion of each paragraph are then recognized, and the recognized characters in each image portion are taken as one text block; that is, the image is divided first and the characters in each divided portion are recognized afterwards.
To eliminate errors made by the electronic device during layout recognition, the user can manually adjust the division of the text blocks before they are rearranged and displayed.
After the image text blocks are divided, the user can also add annotations to them.
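The manual adjustment mentioned above comes down to two operations on the list of blocks: merging a block with its neighbour and splitting a block in two. A minimal sketch follows, assuming each block is stored as a (top, bottom) row range and the list is ordered from top to bottom; `merge_blocks` and `split_block` are hypothetical helper names.

```python
from typing import List, Tuple

Block = Tuple[int, int]  # (top, bottom) row range, bottom exclusive


def merge_blocks(blocks: List[Block], index: int) -> List[Block]:
    """Merge blocks[index] with the block directly below it."""
    if not 0 <= index < len(blocks) - 1:
        raise IndexError("no block below to merge with")
    merged = (blocks[index][0], blocks[index + 1][1])
    return blocks[:index] + [merged] + blocks[index + 2:]


def split_block(blocks: List[Block], index: int, row: int) -> List[Block]:
    """Split blocks[index] into two blocks at the given row."""
    if not 0 <= index < len(blocks):
        raise IndexError("block index out of range")
    top, bottom = blocks[index]
    if not top < row < bottom:
        raise ValueError("split row must lie strictly inside the block")
    return blocks[:index] + [(top, row), (row, bottom)] + blocks[index + 1:]
```

Both functions return a new list rather than modifying the input, which makes it easy to keep the automatic result around for an undo action.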
Because the characters in the text image have already been recognized in this embodiment, the user can also edit the characters in the recognized text blocks.
At present, electronic devices such as smartphones and tablet computers all provide a function for adding to-do items. The present invention can also add each image text block to the to-do items of the electronic device.
See the embodiment of FIG. 4 and FIG. 5 and the embodiment of FIG. 9 and FIG. 10. The layout of the image in FIG. 1 is recognized as shown in FIG. 4; the character recognition step then yields text blocks, which are imported into the to-do items and listed for the user, as shown in FIG. 5. Likewise, the layout of the image in FIG. 6 is recognized as shown in FIG. 9; the character recognition step then yields text blocks, which are imported into the to-do items and listed for the user, as shown in FIG. 10.
In addition, corresponding time information can be added to an image text block, and a reminder can be issued based on that time information together with the corresponding image text block. This can also be combined with the to-do function already present on the electronic device, as shown in FIG. 10.
The present invention further discloses an apparatus for implementing the above method for automatically segmenting a text image. As shown in FIG. 11, the apparatus is based on an electronic device containing a computer system, such as a smartphone or tablet computer, and comprises:
an image acquisition component, which acquires a text image;
an image layout recognition component, which recognizes the layout of the text in the image, divides the image according to the paragraphs of the text in the image, and takes the image portion of each paragraph as one image text block or text block;
a display and editing component, which rearranges and displays the text blocks and sets for each text block a mark that the user can edit.
The image acquisition component includes at least one of a photographing component and a file receiving component. The photographing component photographs the text to acquire the text image, and the file receiving component receives a file containing the text image to acquire the text image. The smartphone shown in FIG. 11 is provided with a camera 2.
The image layout recognition component further includes a character recognition component, which recognizes the characters in the text image.
The apparatus further comprises an adjustment component, with which the user manually adjusts the division of the text blocks before they are rearranged and displayed.
The apparatus further comprises an annotation adding component, with which the user adds annotations to the text blocks after they are divided.
The apparatus further comprises a text editing component, with which the user edits the text blocks after they are divided.
The apparatus further comprises a to-do item adding component, which adds each text block to the to-do items of the electronic device.
The apparatus further comprises a time information adding component, which adds corresponding time information to a text block and issues a reminder based on the time information together with the corresponding text block.
The present invention is described in detail below with reference to further embodiments and the accompanying drawings.
Embodiment 1
This embodiment provides a method for automatically segmenting handwritten entries in an electronic notebook. As shown in FIG. 12, the method comprises:
Photograph an image of the paper page of the notebook to be digitized. In this embodiment, the paper page can be of any type: for example, the page may be printed with a category mark area, a page number area, a title area, row lines and/or column lines, or any combination of these.
Determine the four edge lines of the paper page image using a line detection method, and correct the page region bounded by the four edge lines into a rectangular region. Specifically, line detection is used to obtain the straight lines that represent the four outer edges of the page in the image; the background outside the area bounded by these four lines is cropped away; and, using the four lines as a reference, the photographed page image is corrected so that the page region they bound becomes a rectangle.
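The correction step can be illustrated with a perspective transform (homography) computed from the four page corners, that is, the intersections of the four detected edge lines. The application does not name a specific correction algorithm, so the sketch below is one common approach and not necessarily the authors' method. It is written in plain Python for self-containment; in practice a library routine such as OpenCV's `getPerspectiveTransform` would normally be used. The function names are hypothetical.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def solve_linear(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]   # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("degenerate corner configuration")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                for c in range(col, n + 1):
                    m[r][c] -= f * m[col][c]
    return [m[i][n] / m[i][i] for i in range(n)]


def homography(src: Sequence[Point], dst: Sequence[Point]) -> List[float]:
    """Return h0..h7 of the transform mapping the four src points onto dst."""
    a, b = [], []
    for (x, y), (u, v) in zip(src, dst):
        a.append([x, y, 1.0, 0.0, 0.0, 0.0, -u * x, -u * y])
        b.append(u)
        a.append([0.0, 0.0, 0.0, x, y, 1.0, -v * x, -v * y])
        b.append(v)
    return solve_linear(a, b)


def warp_point(h: Sequence[float], x: float, y: float) -> Point:
    """Map a point of the photographed page into the corrected rectangle."""
    w = h[6] * x + h[7] * y + 1.0
    return ((h[0] * x + h[1] * y + h[2]) / w,
            (h[3] * x + h[4] * y + h[5]) / w)
```

With `src` set to the detected corners and `dst` to the corners of the target rectangle, every pixel of the page region can be mapped into the corrected image; anything that falls outside the rectangle is background and can be discarded.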
Determine the type of the paper page from the paper page image, and obtain the pre-saved blank segmentation template for paper pages of that notebook type; the blank segmentation template consists of several text blocks. In this embodiment, the type of a paper page is determined by the size and format of the page. The format includes the number of text blocks on the page, the size of the text blocks, and the spacing between adjacent text blocks. In other words, the paper page can be made up of block regions of any shape, each block region being one text block, and the text blocks divide the user's handwriting on the page cleanly and completely.
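A blank segmentation template of this kind could be represented as a page size plus a list of block rectangles. The sketch below assumes the simplest layout, a page ruled into equal-height rows, even though the text allows blocks of any shape. `PageTemplate`, `ruled_template`, and the parameters `margin` and `gap` are hypothetical names for illustration.

```python
from dataclasses import dataclass
from typing import Tuple

Rect = Tuple[int, int, int, int]  # (left, top, right, bottom), right/bottom exclusive


@dataclass(frozen=True)
class PageTemplate:
    name: str                  # page type identifier
    width: int                 # page size after correction, in pixels
    height: int
    blocks: Tuple[Rect, ...]   # the text blocks, top to bottom


def ruled_template(name: str, width: int, height: int,
                   rows: int, margin: int, gap: int) -> PageTemplate:
    """Build a template of `rows` equal-height blocks separated by `gap`."""
    usable = height - 2 * margin - (rows - 1) * gap
    block_height = usable // rows
    blocks = []
    for i in range(rows):
        top = margin + i * (block_height + gap)
        blocks.append((margin, top, width - margin, top + block_height))
    return PageTemplate(name, width, height, tuple(blocks))
```

The number of blocks, their size, and the spacing between them are exactly the three format properties the text lists, so two templates with the same page size but different values here count as different page types.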
In the present invention, the photographed notebook page belongs to a page type that existing application software such as CamScanner has saved in advance. The image regions containing the user's handwriting (that is, the region of one text block or of several merged text blocks) can therefore be obtained by referring to the pre-saved blank segmentation template for that page type, which markedly improves accuracy.
Determine the text blocks in the rectangular region that contain the user's handwriting, and automatically segment and extract, block by block, the handwriting in each text block. A text block can also be merged with an adjacent text block, in which case the handwriting is segmented and extracted with the merged block as the unit. In the corrected notebook page image, the pre-saved blank segmentation template for that page is used to locate the user's handwriting within the template, and the handwriting is cut into text blocks that represent different lines of text. With the method of the present invention, the user can, through a simple operation, manually merge several adjacent text blocks that together form a complete meaning into one region. The content of the resulting blocks, each carrying a complete meaning, can be added to the list of to-do items on the electronic device, or existing handwriting recognition technology can be used to recognize the characters in them, sparing the user the trouble of typing the text into the electronic device by hand.
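Locating the handwriting within the template can be sketched as counting ink pixels inside each template block and keeping the blocks that contain enough of them. This is an assumption about how the check might be done; the application does not give a criterion. The corrected page is assumed to be a binarized grid of 0/1 pixels, and `blocks_with_ink`, `crop`, and `min_pixels` are hypothetical names.

```python
from typing import List, Tuple

Rect = Tuple[int, int, int, int]  # (left, top, right, bottom), right/bottom exclusive


def blocks_with_ink(image: List[List[int]], blocks: List[Rect],
                    min_pixels: int = 1) -> List[int]:
    """Return the indices of template blocks that contain handwriting."""
    hits = []
    for i, (left, top, right, bottom) in enumerate(blocks):
        ink = sum(sum(row[left:right]) for row in image[top:bottom])
        if ink >= min_pixels:
            hits.append(i)
    return hits


def crop(image: List[List[int]], block: Rect) -> List[List[int]]:
    """Cut the image region covered by one block out of the page."""
    left, top, right, bottom = block
    return [row[left:right] for row in image[top:bottom]]
```

Raising `min_pixels` above 1 would make the check more tolerant of stray specks or printed ruling that survives binarization.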
When a notebook page is digitized, the present invention uses the text blocks of the pre-saved blank segmentation template to help locate and divide the regions of the user's handwriting, yielding image blocks (also called text blocks) that each contain a complete handwritten entry. This makes it convenient to digitize paper pages region by region and to use and manage the digitized documents. In other words, because the blank segmentation template consists of several text blocks, each text block can serve as the unit for cutting up the handwriting on the page, so that complete handwritten entries are obtained and the content of the digitized document is segmented and extracted automatically.
Embodiment 2
This embodiment provides a method for automatically segmenting handwritten entries in an electronic notebook. It differs from the method of Embodiment 1 in that the type of the paper page is known in advance, and the type of the paper page is determined from the paper page image as follows: the type is specified manually. That is, before photographing the image, or after photographing it but before it is processed, the user manually specifies the type to which the notebook page belongs, for example by selecting one from a series of notebook page types pre-saved in application software such as CamScanner.
Embodiment 3
This embodiment provides a method for automatically segmenting handwritten entries in an electronic notebook. It differs from the methods of Embodiments 1 and 2 in that the type of the paper page is known in advance, and the type of the paper page is determined from the paper page image as follows:
A type mark is printed at a fixed position on the paper page; the type mark can be text, a symbol, a graphic, or a combination of any two or all three.
The type mark on the paper page image is detected, and the detected mark is compared one by one with the type marks known in advance to find the type to which the page belongs. Concretely, a pre-designed mark (the type mark) is printed in advance at a specified position on every paper page of the notebook. After the image of the notebook page has been photographed, the four outer edges of the page are first detected in the image; with these four edges as a reference, the approximate position of the mark in the page image is determined, which allows the mark to be detected in the image. The detected mark is then compared one by one with the pre-saved marks that represent the different types of notebook pages, to find the type to which the photographed page belongs. This comparison involves handwriting recognition, character recognition, image matching and other mature techniques in the field, which are not described further here.
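The one-by-one comparison of the detected mark against the known marks is left to existing techniques in the text. As a stand-in, the sketch below uses a very simple form of image matching: the mark is assumed to be cropped and binarized to a small fixed-size grid, and it is compared with each reference grid by Hamming distance (the number of differing pixels). The function names and the `max_distance` threshold are hypothetical.

```python
from typing import Dict, List, Optional

Grid = List[List[int]]  # binarized mark, all grids assumed to share one size


def hamming(a: Grid, b: Grid) -> int:
    """Count the pixels at which two equally sized grids differ."""
    return sum(pa != pb for ra, rb in zip(a, b) for pa, pb in zip(ra, rb))


def match_page_type(mark: Grid, known: Dict[str, Grid],
                    max_distance: int) -> Optional[str]:
    """Return the best-matching page type, or None if nothing is close enough."""
    best_name, best_dist = None, max_distance + 1
    for name, reference in known.items():
        d = hamming(mark, reference)
        if d < best_dist:
            best_name, best_dist = name, d
    return best_name
```

Returning `None` when no reference is within `max_distance` gives the caller a natural point at which to fall back to manual selection or to the creation of a new page type.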
Embodiment 4
This embodiment provides a method for automatically segmenting handwritten entries in an electronic notebook. It differs from the method of Embodiment 1 in that the type of the paper page is not known in advance. In this case, the type of the paper page is determined from the paper page image as follows:
Create a new paper page type and enter the size and format of the unknown paper page.
That is, if the photographed notebook page does not belong to any of the page types known in advance to application software such as CamScanner (types printed with bold and/or lengthened row lines, column lines, and/or title areas), the type of the unknown page is first added as a newly created page type, and the subsequent processing is then carried out.
When the paper page of a notebook is digitized, the present invention uses the pre-saved blank segmentation template to help locate and divide the user's handwriting on the page. Because the blank segmentation template consists of several text blocks, each text block can serve as the unit for cutting up the handwriting on the page, so that complete handwritten entries are obtained and the content of the digitized document is segmented and extracted automatically.
By adopting the above technical solutions, the present invention makes it much easier for the user to segment and classify text images, and spares the user the trouble of entering items into the electronic device one by one.
The above are merely preferred embodiments of the present invention and are not intended to limit the scope of its substantive technical content. The substantive technical content of the present invention is defined broadly in the claims of the application. Any technical entity or method completed by others that is identical to what is defined in the claims, or is an equivalent variation of it, shall be regarded as falling within the scope of the claims.

Claims

1. A method for automatically segmenting a text image, characterized by comprising the following steps:
acquiring a text image;
recognizing the layout of the text in the image, dividing the image according to the paragraphs of the text in the image, and taking the image portion of each paragraph as one image text block;
rearranging and displaying the image text blocks, and setting for each block a mark that the user can edit.
2. The method for automatically segmenting a text image according to claim 1, characterized in that the text image is acquired by photographing the text or by receiving a file containing the text image.
3. The method for automatically segmenting a text image according to claim 1, characterized in that, before the image text blocks are rearranged and displayed, the user manually adjusts the division of the image text blocks.
4. The method for automatically segmenting a text image according to claim 1, characterized by further comprising a step in which, after the image text blocks are divided, the user adds annotations to the image text blocks.
5. The method for automatically segmenting a text image according to claim 1, characterized by further comprising a step of adding each image text block to the to-do items of the electronic device.
6. The method for automatically segmenting a text image according to claim 1, characterized by further comprising a step of adding corresponding time information to an image text block, and issuing a reminder based on the time information together with the corresponding image text block.
7. A method for automatically segmenting a text image, characterized by comprising the following steps:
acquiring a text image;
recognizing the layout and the characters of the text in the image, dividing the image according to the paragraphs of the text in the image, and taking the recognized characters in the image portion of each paragraph as one text block;
rearranging and displaying the text blocks, and setting for each text block a mark that the user can edit.
8. The method for automatically segmenting a text image according to claim 7, characterized in that the text image is acquired by photographing the text or by receiving a file containing the text image.
9. The method for automatically segmenting a text image according to claim 7, characterized in that:
the layout of the text in the image is recognized, the characters of the text in the image are recognized, the image is divided according to the paragraphs of the text in the image, and the recognized characters in the image portion of each paragraph are then taken as one text block; or
the layout of the text in the image is recognized, the image is divided according to the paragraphs of the text in the image, the characters in the image portion of each paragraph are then recognized, and the recognized characters in the image portion of each paragraph are taken as one text block.
10. The method for automatically segmenting a text image according to claim 7, characterized in that, before the text blocks are rearranged and displayed, the user manually adjusts the division of the text blocks.
11. The method for automatically segmenting a text image according to claim 7, characterized by further comprising a step in which, after the text blocks are divided, the user adds annotations to the text blocks.
12. The method for automatically segmenting a text image according to claim 7, characterized by further comprising a step in which, after the text blocks are divided, the user edits the text blocks.
13. The method for automatically segmenting a text image according to claim 7, characterized by further comprising a step of adding each text block to the to-do items of the electronic device.
14. The method for automatically segmenting a text image according to claim 7, characterized by further comprising a step of adding corresponding time information to a text block, and issuing a reminder based on the time information together with the corresponding text block.
15. An apparatus for implementing the method for automatically segmenting a text image according to any one of claims 1-14, characterized in that it is based on an electronic device containing a computer system and comprises:
an image acquisition component, which acquires a text image;
an image layout recognition component, which recognizes the layout of the text in the image, divides the image according to the paragraphs of the text in the image, and takes the image portion of each paragraph as one image text block or text block;
a display and editing component, which rearranges and displays the text blocks and sets for each text block a mark that the user can edit.
16. The apparatus for automatically segmenting a text image according to claim 15, characterized in that the image acquisition component includes at least one of a photographing component and a file receiving component, the photographing component photographing the text to acquire the text image, and the file receiving component receiving a file containing the text image to acquire the text image.
17. The apparatus for automatically segmenting a text image according to claim 15, characterized in that the image layout recognition component further includes a character recognition component, which recognizes the characters in the text image.
18. The apparatus for automatically segmenting a text image according to claim 15, characterized by further comprising an adjustment component, with which the user manually adjusts the division of the text blocks before they are rearranged and displayed.
19. The apparatus for automatically segmenting a text image according to claim 15, characterized by further comprising an annotation adding component, with which the user adds annotations to the text blocks after they are divided.
20. The apparatus for automatically segmenting a text image according to claim 15, characterized by further comprising a text editing component, with which the user edits the text blocks after they are divided.
21. The apparatus for automatically segmenting a text image according to claim 15, characterized by further comprising a to-do item adding component, which adds each text block to the to-do items of the electronic device.
22. The apparatus for automatically segmenting a text image according to claim 15, characterized by further comprising a time information adding component, which adds corresponding time information to a text block and issues a reminder based on the time information together with the corresponding text block.
23. A method for automatically segmenting handwritten entries in an electronic notebook, characterized in that the method comprises:
photographing an image of the paper page of the notebook to be digitized;
determining the four edge lines of the paper page image using a line detection method, and correcting the page region bounded by the four edge lines into a rectangular region;
determining the type of the paper page from the paper page image, and obtaining the pre-saved blank segmentation template for paper pages of that notebook type, the blank segmentation template consisting of several text blocks;
determining the text blocks in the rectangular region that contain the user's handwriting, and automatically segmenting and extracting, block by block, the user's handwriting in each text block.
24. The method for automatically segmenting handwritten entries in an electronic notebook according to claim 23, characterized in that the type of the paper page is determined by the size and format of the paper page, and the format of the paper page includes the number, size, and spacing of the text blocks that the paper page contains.
25. The method for automatically segmenting handwritten entries in an electronic notebook according to claim 23, characterized in that a text block can be merged with an adjacent text block, and the user's handwriting is automatically segmented and extracted with the merged text block as the unit.
26. The method for automatically segmenting handwritten entries in an electronic notebook according to claim 23, characterized in that, where the type of the paper page is known in advance, the type of the paper page is determined from the paper page image by manually specifying the type of the paper page.
27. The method for automatically segmenting handwritten entries in an electronic notebook according to claim 23, characterized in that, where the type of the paper page is known in advance, the type of the paper page is determined from the paper page image as follows:
a type mark is printed at a fixed position on the paper page;
the type mark on the paper page image is detected, and the detected type mark is compared one by one with the type marks known in advance to find the type to which the paper page belongs.
28. The method for automatically segmenting handwritten entries in an electronic notebook according to claim 23, characterized in that, where the type of the paper page is not known in advance, the type of the paper page is determined from the paper page image as follows:
a new paper page type is created, and the size and format of the unknown paper page are entered.
PCT/CN2013/088494 2012-12-05 2013-12-04 Text image automatic dividing method and device, method for automatically dividing handwriting entries WO2014086287A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201210517167.6 2012-12-05
CN201210517167.6A CN103020619B (en) 2012-12-05 2012-12-05 A kind of method of handwritten entries in automatic segmentation electronization notebook

Publications (1)

Publication Number Publication Date
WO2014086287A1 true WO2014086287A1 (en) 2014-06-12

Family

ID=47969210

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2013/088494 WO2014086287A1 (en) 2012-12-05 2013-12-04 Text image automatic dividing method and device, method for automatically dividing handwriting entries

Country Status (2)

Country Link
CN (1) CN103020619B (en)
WO (1) WO2014086287A1 (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110503103A (en) * 2019-08-28 2019-11-26 上海海事大学 A kind of character cutting method in line of text based on full convolutional neural networks
CN111339338A (en) * 2020-02-29 2020-06-26 西安理工大学 Text picture matching recommendation method based on deep learning

Families Citing this family (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102938063B (en) * 2012-12-05 2016-02-10 上海合合信息科技发展有限公司 Professional notebook convenient for electronization and electronization method thereof
CN103034842A (en) * 2012-12-05 2013-04-10 上海合合信息科技发展有限公司 Professional notebook computer facilitating electronization and electronic thumbnail photo display method thereof
CN103020619B (en) * 2012-12-05 2016-04-20 上海合合信息科技发展有限公司 Method for automatically segmenting handwritten entries in an electronic notebook
US10255253B2 (en) 2013-08-07 2019-04-09 Microsoft Technology Licensing, Llc Augmenting and presenting captured data
WO2015018244A1 (en) 2013-08-07 2015-02-12 Microsoft Corporation Augmenting and presenting captured data
CN105184329A (en) * 2015-08-27 2015-12-23 鲁东大学 Cloud-platform-based off-line handwriting recognition method
CN106504141A (en) * 2015-09-08 2017-03-15 迈吉克股份有限公司 A kind of writing marking system and its methods of marking
CN105373791B (en) * 2015-11-12 2018-12-14 中国建设银行股份有限公司 Information processing method and information processing unit
WO2018010090A1 (en) * 2016-07-12 2018-01-18 程抒一 Method and device for classification and storage with respect to paper notebook
CN106649629B (en) * 2016-12-02 2020-04-10 华中师范大学 System for associating books with electronic resources
CN106919546A (en) * 2017-02-27 2017-07-04 宇龙计算机通信科技(深圳)有限公司 A kind of document auxiliary establishing method and system
JP6729486B2 (en) * 2017-05-15 2020-07-22 京セラドキュメントソリューションズ株式会社 Information processing apparatus, information processing program, and information processing method
WO2020237439A1 (en) * 2019-05-24 2020-12-03 深圳市柔宇科技有限公司 Handwriting recognition method, electronic device and storage medium
CN110443234A (en) * 2019-06-29 2019-11-12 万翼科技有限公司 Data processing method and Related product

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1487476A (en) * 2002-10-04 2004-04-07 富士通株式会社 Image generating apparatus and method
CN1752991A (en) * 2004-09-24 2006-03-29 富士施乐株式会社 Apparatus, method and program for recognizing characters
CN1851617A (en) * 2005-04-22 2006-10-25 英华达(上海)电子有限公司 Converting device and method for mobile device making OCR convenient and input to existing editor
CN101339618A (en) * 2007-07-06 2009-01-07 上海思必得通讯技术有限公司 Mobile phones name card recognition device
CN102177520A (en) * 2008-08-13 2011-09-07 谷歌公司 Segmenting printed media pages into articles
CN103020619A (en) * 2012-12-05 2013-04-03 上海合合信息科技发展有限公司 Method for automatically dividing handwritten clauses in electronic notebook

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5822454A (en) * 1995-04-10 1998-10-13 Rebus Technology, Inc. System and method for automatic page registration and automatic zone detection during forms processing
CN101976114B (en) * 2010-09-29 2012-07-04 长安大学 System and method for realizing information interaction between computer and pen and paper based on camera
CN102201053B (en) * 2010-12-10 2013-07-24 上海合合信息科技发展有限公司 Method for cutting edge of text image
CN102081732B (en) * 2010-12-29 2013-06-05 方正国际软件有限公司 Method and system for recognizing format template
KR20120106291A (en) * 2011-03-18 2012-09-26 강승인 Blank note

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110503103A (en) * 2019-08-28 2019-11-26 上海海事大学 A kind of character cutting method in line of text based on full convolutional neural networks
CN110503103B (en) * 2019-08-28 2023-04-07 上海海事大学 Character segmentation method in text line based on full convolution neural network
CN111339338A (en) * 2020-02-29 2020-06-26 西安理工大学 Text picture matching recommendation method based on deep learning
CN111339338B (en) * 2020-02-29 2023-03-07 西安理工大学 Text picture matching recommendation method based on deep learning

Also Published As

Publication number Publication date
CN103020619A (en) 2013-04-03
CN103020619B (en) 2016-04-20

Similar Documents

Publication Publication Date Title
WO2014086287A1 (en) Text image automatic dividing method and device, method for automatically dividing handwriting entries
AU2017302250B2 (en) Optical character recognition in structured documents
WO2014086277A1 (en) Professional notebook convenient for electronization and method for automatically identifying page number thereof
US10296170B2 (en) Electronic apparatus and method for managing content
US20140348394A1 (en) Photograph digitization through the use of video photography and computer vision technology
US20070177183A1 (en) Generation Of Documents From Images
US20140325348A1 (en) Conversion of a document of captured images into a format for optimized display on a mobile device
CN102982160A (en) Professional notebook convenient for electronization and automatic classification method of electronic documents of professional notebook
WO2014086272A1 (en) Professional notebook convenient for electronization and method for adding same into electronic calendar
WO2014082551A1 (en) Method and device for obtaining contents in paper notebook
US11972025B2 (en) Stored image privacy violation detection method and system
US20100125780A1 (en) Electronic device with annotation function and method thereof
US20220345498A1 (en) Shared image sanitization method and system
KR20200100027A (en) Method and device for text-based image retrieval
WO2014086266A1 (en) Professional notebook convenient for electronization and method for displaying electronic thumbnail thereof
KR101841641B1 (en) Automatic layout photo album Processing System using history
KR101477642B1 (en) Flat board printer
RU2587406C2 (en) Method of processing visual object and electronic device used therein
US20080282138A1 (en) Methods and systems for multimedia object association to digital paper spatial diagrams
US20150089335A1 (en) Smart processing of an electronic document
JP4808128B2 (en) Image processing system, Web server device, image processing method, and program
KR101641910B1 (en) Automatic layout photo album Processing System
CN104361101A (en) Video file classifying method and system
WO2018010090A1 (en) Method and device for classification and storage with respect to paper notebook
TW201901490A (en) Method of searching an image file in a computer system, related image file searching device, and related computer system

Legal Events

Date Code Title Description
121 EP: the EPO has been informed by WIPO that EP was designated in this application

Ref document number: 13860390

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 EP: PCT application non-entry in European phase

Ref document number: 13860390

Country of ref document: EP

Kind code of ref document: A1