WO2014086287A1

WO2014086287A1 - Text image automatic dividing method and device, method for automatically dividing handwriting entries

Info

Publication number: WO2014086287A1
Application number: PCT/CN2013/088494
Authority: WO
Inventors: 陈青山 (Chen Qingshan); 罗希平 (Luo Xiping)
Original assignee: 上海合合信息科技发展有限公司 (Shanghai Hehe Information Technology Development Co., Ltd.)
Priority date: 2012-12-05
Filing date: 2013-12-04
Publication date: 2014-06-12
Also published as: CN103020619A; CN103020619B

Abstract

Disclosed is a text image automatic dividing method, comprising the following steps: obtaining a text image; identifying the layout of the text in the image, dividing the image according to the paragraphs of the text, and using the image portion of each paragraph as an image character block; and rearranging and displaying the image character blocks, and setting, for each character block, a mark that can be edited by the user. Further disclosed are a device for performing the text image automatic dividing method and a method for automatically dividing handwritten entries in an electronic notebook. With these technical solutions, the user can conveniently divide and classify text images, which eliminates the trouble of entering items into an electronic device one by one.

Description

Text image automatic segmentation method and device, method for automatically segmenting handwritten entries

The invention relates to an image processing method, in particular to a text image automatic segmentation method. The invention further relates to an image processing apparatus, in particular to a text image automatic segmentation apparatus. The invention further relates to a method of automatically segmenting handwritten entries in an electronic notebook.

Background

In daily life, people often need to photograph paper documents and save them in JPEG format or as PDF documents, so that the paper documents can be digitized and managed. Smartphones are one of the tools commonly used to digitize paper documents: because a smartphone usually has a built-in camera, the user can photograph a paper document with it and process the captured image into a JPEG or PDF document. Applications with these features have become popular, such as the CamScanner application in the Apple App Store and the Google Play store. These applications automatically detect the four edges of the document in the captured image, use them as a reference to cut away the background outside the document area, and perform correction and image enhancement on the document area. The result is a clean electronic document comparable to one scanned with a scanner, which is then saved and managed in a format specified by the user.

Paper has long been used for all kinds of records, such as meeting minutes and memos. In practice, users often write items down by hand on paper. For example, a user may write possible weekend activities on three lines of a notebook page: 1. Shopping, 2. Watching a movie, 3. Going to the park. After photographing and digitizing the page, the user decides on option 2, watching a movie, and wants to save this decision to a to-do list; having to type the text into the electronic device again is very inconvenient. Ideally, the user would simply tap "2. Watching a movie" in the electronic document displayed on the device, and the image area containing the "2. Watching a movie" handwriting would be automatically segmented out and added to the to-do list.

Summary of the Invention

The technical problem to be solved by the present invention is to provide a text image automatic segmentation method, and a text image automatic segmentation device used by that method, which conveniently help the user edit and process the text items recorded on paper.

In order to solve the above technical problem, the technical solution of the text image automatic segmentation method of the present invention comprises the following steps:

Acquiring a text image;

Identifying the layout of the text in the image, dividing the image according to the paragraphs of the text in the image, and using the image portion of each paragraph as an image text block;

Rearranging the image text blocks, and setting for each block a tag that can be edited by the user.
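The layout and paragraph-division steps above can be sketched as follows, assuming a binarized, single-column page image and a horizontal projection profile: rows without ink separate text lines, and sufficiently long runs of empty rows separate paragraphs. This is a common textbook technique, not necessarily the layout analysis used by the applicant; `ImageTextBlock`, `split_paragraphs` and the `min_gap` threshold are hypothetical names, and the `done` field merely stands in for the user-editable tag.

```python
# Sketch only: hypothetical names; paragraph division by horizontal projection profile.
import numpy as np
from dataclasses import dataclass


@dataclass
class ImageTextBlock:
    top: int            # first row of the paragraph (inclusive)
    bottom: int         # row just below the paragraph (exclusive)
    image: np.ndarray   # image portion of the paragraph
    done: bool = False  # user-editable tag, e.g. a check box


def split_paragraphs(binary: np.ndarray, min_gap: int = 10) -> list[ImageTextBlock]:
    """Divide a binarized page (1 = ink, 0 = paper) into paragraph blocks.

    A run of at least `min_gap` empty rows is treated as a paragraph break;
    shorter runs are treated as spacing between lines of one paragraph.
    """
    has_ink = binary.any(axis=1)
    blocks = []
    start = None     # first inked row of the current paragraph
    last_ink = None  # most recent inked row seen so far
    for row, ink in enumerate(has_ink):
        if not ink:
            continue
        if start is None:
            start = row
        elif row - last_ink - 1 >= min_gap:
            # Long gap: close the current paragraph and start a new one.
            blocks.append(ImageTextBlock(start, last_ink + 1, binary[start:last_ink + 1]))
            start = row
        last_ink = row
    if start is not None:
        blocks.append(ImageTextBlock(start, last_ink + 1, binary[start:last_ink + 1]))
    return blocks
```

The returned list is already in reading order, so it can serve directly as the rearranged display list. Pages with several columns, skew or tables would need a fuller layout analysis, and in practice `min_gap` would scale with the detected line height.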

The invention further discloses another text image automatic segmentation method, whose technical solution comprises the following steps:

Acquiring a text image;

Identifying the layout of the text in the image and recognizing its characters, dividing the image according to the paragraphs of the text in the image, and using the recognized text in the image portion of each paragraph as a text block;

Rearranging the text blocks, and setting for each block a tag that can be edited by the user.

The invention also discloses a text image automatic segmentation device for implementing the above text image automatic segmentation method. The device is based on an electronic device that includes a computer system, and comprises:

An image acquisition component that acquires a text image;

An image layout identifying component that identifies the layout of the text in the image, divides the image according to the paragraphs of the text in the image, and uses the image portion of each paragraph as an image text block or a text block;

A display and editing component that rearranges and displays the text blocks and sets, for each text block, a mark that can be edited by the user.

The invention further discloses a method for automatically segmenting handwritten entries in an electronic notebook, comprising:

Photographing the paper page of the notebook that is to be digitized, to obtain a paper page image;

Determining the four edge lines of the paper page image by a line detection method, and correcting the page area bounded by the four edge lines to a rectangular area;

Determining the type of the paper page according to the paper page image, and obtaining the previously saved blank segmentation template for paper pages of that type of notebook, the blank segmentation template being composed of a plurality of text blocks;

Determining the text block in which the user's handwriting in the rectangular area is located, and automatically segmenting and extracting, in units of text blocks, the user's handwriting in any one of the text blocks.

The invention greatly facilitates the user's segmentation and classification of text images and spares the user the trouble of entering items into the electronic device one by one.

Drawings

The present invention will be further described in detail below with reference to the accompanying drawings and embodiments:

FIG. 1 to FIG. 3 are schematic diagrams showing an embodiment of the text image automatic segmentation method according to the present invention;

FIG. 4 and FIG. 5 are schematic diagrams showing another embodiment of the text image automatic segmentation method according to the present invention;

FIG. 6 to FIG. 8 are schematic diagrams showing still another embodiment of the text image automatic segmentation method according to the present invention;

FIG. 9 and FIG. 10 are schematic diagrams showing a further embodiment of the text image automatic segmentation method according to the present invention;

FIG. 11 is a schematic diagram of the text image automatic segmentation apparatus according to the present invention;

FIG. 12 is a schematic flow chart of the method for automatically segmenting handwritten entries in an electronic notebook according to the present invention.

The reference numerals in the figures are: 1. a touch screen and buttons; 2. a camera.

Detailed Description

In work and daily life, people often use paper to record what they need to do. As shown in FIG. 1 and FIG. 6, such items may be jotted down on a sheet of paper or may come from documents received from others. Following common writing habits, to-do items are generally recorded in segments, one item per segment, with a clear gap between segments.

The text image automatic segmentation method according to the present invention is implemented on an electronic device such as a smartphone or a tablet computer. The user can photograph the paper directly to obtain a text image, or receive a text image file scanned by another device or sent by another person. After the text image is acquired, the layout of the text in the image is identified, the image is divided according to the paragraphs of the text, and the image portion of each paragraph is used as an image text block, so that each image text block corresponds to one to-do item. The image text blocks are rearranged and displayed, and a mark that can be edited by the user is set for each block. The mark can be a check box: when the user completes a to-do item, the check box can be ticked (see FIG. 2, FIG. 3, FIG. 7 and FIG. …).

Because handwriting is sometimes irregular, or for other reasons, the electronic device may not identify the layout of the text in the text image accurately. The user can therefore manually adjust the division of the image text blocks before they are rearranged and displayed. As shown in FIG. 4 and FIG. 9, each image text block is framed, and the user can split or merge the framed image text blocks.
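The manual split and merge operations on framed blocks can be sketched as plain rectangle operations. The sketch assumes each framed block is an axis-aligned rectangle in page pixel coordinates; `Box`, `merge`, `split` and `merge_at` are hypothetical names, not part of the disclosed device.

```python
# Sketch only: hypothetical names; framed blocks modelled as axis-aligned rectangles.
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    """A framed block, half-open in pixels: columns [x0, x1), rows [y0, y1)."""
    x0: int
    y0: int
    x1: int
    y1: int


def merge(a: Box, b: Box) -> Box:
    """Merge two framed blocks into one frame covering both."""
    return Box(min(a.x0, b.x0), min(a.y0, b.y0), max(a.x1, b.x1), max(a.y1, b.y1))


def split(box: Box, y: int) -> tuple[Box, Box]:
    """Split a framed block horizontally at row y, which must lie inside it."""
    if not box.y0 < y < box.y1:
        raise ValueError("split row must lie strictly inside the block")
    return Box(box.x0, box.y0, box.x1, y), Box(box.x0, y, box.x1, box.y1)


def merge_at(blocks: list[Box], i: int) -> list[Box]:
    """Merge blocks[i] with the block after it, keeping the list order."""
    return blocks[:i] + [merge(blocks[i], blocks[i + 1])] + blocks[i + 2:]
```

Keeping the blocks immutable means each user edit produces a new list, which makes an undo step straightforward.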

After dividing the image text block, the user can also annotate the image text block.

Electronic devices such as smartphones and tablet computers currently provide a to-do list function. The present invention can also add each image text block to the to-do list of the electronic device as a separate item.

In addition, corresponding time information can be added to an image text block, and a prompt can be issued according to the time information together with the corresponding image text block. This can also be combined with the existing to-do function of the electronic device, as shown in FIG. 8.
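One way to represent a block added to the to-do list, together with its optional annotation, time information and check-box tag, is sketched below. `TodoItem`, `due_reminders` and the 15-minute lead time are assumptions for illustration only.

```python
# Sketch only: hypothetical names; a to-do entry created from one text block.
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class TodoItem:
    block_id: int                   # index of the source text block
    note: str = ""                  # optional user annotation
    due: Optional[datetime] = None  # optional time information
    done: bool = False              # the editable check-box tag


def due_reminders(items, now, lead=timedelta(minutes=15)):
    """Unfinished items whose time falls between `now` and `now + lead`."""
    return [item for item in items
            if not item.done and item.due is not None
            and now <= item.due <= now + lead]
```

A real application would more likely hand the time to the platform's own reminder or notification service than poll a list like this.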

In the above embodiment, the image text blocks are listed directly for the user to edit, so the objects being edited and processed are still images. The invention also discloses another text image automatic segmentation method, comprising the following steps:

Acquiring a text image;

This embodiment adds a text recognition step, so that text blocks, rather than image blocks, are finally listed for the user to edit, which is more convenient for the user.

The text image is acquired by photographing the text or by receiving a file containing the text image. The recognition can be performed in either of two orders. In the first, the layout of the text in the image is identified and its characters are recognized, the image is then divided according to the paragraphs of the text, and the recognized text in the image portion of each paragraph is used as a text block; that is, character recognition comes first and the image is divided afterwards. In the second, the layout of the text in the image is identified, the image is divided according to the paragraphs of the text, and the characters in the image portion of each paragraph are then recognized and used as a text block; that is, the image is divided first and the characters in each divided portion are recognized afterwards.
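The two recognition orders can be contrasted in a short sketch. No particular OCR engine is assumed: in the first order the engine's whole-page output is represented as hypothetical `(row, text)` line results, and in the second it is a hypothetical callable `ocr_region(top, bottom)` that recognizes one paragraph region. Paragraph spans are row ranges `[top, bottom)` from the division step.

```python
# Sketch only: hypothetical names; the OCR engine is abstracted away.
from typing import Callable, List, Tuple

Span = Tuple[int, int]  # paragraph rows [top, bottom)


def recognize_then_divide(lines: List[Tuple[int, str]], spans: List[Span]) -> List[str]:
    """Order 1: lines (row, text) were recognized on the whole page first;
    each text block joins the lines whose row falls inside its paragraph span."""
    return [" ".join(text for row, text in lines if top <= row < bottom)
            for top, bottom in spans]


def divide_then_recognize(spans: List[Span],
                          ocr_region: Callable[[int, int], str]) -> List[str]:
    """Order 2: the page was divided first; each paragraph region is recognized."""
    return [ocr_region(top, bottom) for top, bottom in spans]
```

With a consistent recognizer both orders yield the same text blocks; the first lets recognition see the whole page at once, while the second confines each recognition call to a single block.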

To correct errors made by the electronic device during layout recognition, the user can manually adjust the division of the text blocks before they are rearranged. After the text blocks are divided, the user can also annotate them.

Since the text in the text image has already been recognized in this embodiment, the user can also edit the recognized text in each text block.

FIG. 4 and FIG. 5, and FIG. 9 and FIG. 10, illustrate this embodiment. The layout of the image in FIG. 1 is identified as shown in FIG. 4; text blocks are obtained through the text recognition step, imported into the to-do list and listed for the user (see FIG. …). Likewise, the layout of the image in FIG. 6 is identified as shown in FIG. 9; after the text recognition step, text blocks are obtained, imported into the to-do list and listed for the user (see FIG. …).

In addition, corresponding time information can be added to a text block, and a prompt can be issued according to the time information together with the corresponding text block. This can also be combined with the existing to-do function of the electronic device (see FIG. …).

The invention further discloses a device for implementing the above text image automatic segmentation method. As shown in FIG. 11, the device is based on an electronic device with a computer system, such as a smartphone or a tablet computer, and includes:

An image acquisition component that acquires a text image;

The image acquisition component includes at least one of a photographing component, which photographs text to acquire a text image, and a document receiving component, which receives a file containing a text image to acquire the text image. The smartphone shown in FIG. … is provided with a camera 2.

The image layout identifying component further includes a text recognition component that recognizes text in the text image.

The device also includes an adjustment component with which the user manually adjusts the division of the text blocks before they are rearranged.

The device further includes: an annotation adding component, with which the user annotates a text block after the text blocks are divided; a text editing component, with which a text block is edited after the text blocks are divided; a to-do list adding component, which adds each text block to the to-do list of the electronic device; and a time information adding component, which adds corresponding time information to a text block and issues a prompt according to the time information together with the corresponding text block.

The invention will be described in detail below in conjunction with other embodiments and the accompanying drawings.

Embodiment 1

This embodiment provides a method for automatically segmenting handwritten entries in an electronic notebook. As shown in FIG. 12, the method includes the following steps:

Photographing the paper page of the notebook that is to be digitized. In this embodiment, the notebook page may be of any type; for example, it may be printed with a classification mark area, a page number area, a title area, ruled lines and/or dividing lines, or any combination of these.

Determining the four edge lines of the paper page image by a line detection method, and correcting the page area bounded by the four edge lines to a rectangular area. Specifically, straight lines representing the four outer edges of the page are detected in the paper page image, the background outside the region bounded by these four lines is cut away, the paper page image is corrected based on the four edge lines, and the page area they bound is corrected to a rectangular area.
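The correction step can be sketched with a perspective (homography) transform, assuming the four edge lines have already been found (for example with a Hough transform) and are given in homogeneous form (a, b, c) with a·x + b·y + c = 0. The page corners are the intersections of adjacent edges, and the 3×3 transform is solved by the direct linear transform with h33 fixed to 1. All function names are hypothetical.

```python
# Sketch only: hypothetical names; page rectification from four detected edge lines.
import numpy as np


def line_through(p, q):
    """Homogeneous line (a, b, c), a*x + b*y + c = 0, through points p and q."""
    return np.cross([p[0], p[1], 1.0], [q[0], q[1], 1.0])


def intersect(l1, l2):
    """Intersection point of two homogeneous lines."""
    x, y, w = np.cross(l1, l2)
    return x / w, y / w


def homography(src, dst):
    """3x3 perspective transform mapping four src points onto four dst points.

    Direct linear transform with h33 = 1: two linear equations per point pair.
    """
    a, rhs = [], []
    for (x, y), (u, v) in zip(src, dst):
        a.append([x, y, 1, 0, 0, 0, -u * x, -u * y])
        rhs.append(u)
        a.append([0, 0, 0, x, y, 1, -v * x, -v * y])
        rhs.append(v)
    h = np.linalg.solve(np.array(a, dtype=float), np.array(rhs, dtype=float))
    return np.append(h, 1.0).reshape(3, 3)


def page_to_rectangle(top, right, bottom, left, width, height):
    """Transform taking the page bounded by four edge lines to a width x height rectangle."""
    corners = [intersect(top, left), intersect(top, right),
               intersect(bottom, right), intersect(bottom, left)]
    target = [(0, 0), (width - 1, 0), (width - 1, height - 1), (0, height - 1)]
    return homography(corners, target)


def map_point(h, x, y):
    """Apply the transform to one point, dividing out the projective scale."""
    u, v, w = h @ np.array([x, y, 1.0])
    return u / w, v / w
```

To resample the actual pixels, OpenCV provides `cv2.getPerspectiveTransform`, which computes the same kind of 3×3 matrix from four point pairs, and `cv2.warpPerspective`, which applies it to an image.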

Determining the type of the paper page according to the paper page image, and obtaining the previously saved blank segmentation template for that type of notebook page, the blank segmentation template being composed of a plurality of text blocks. In this embodiment, the type of the paper page is determined by its size and format, where the format comprises the number of text blocks on the page, the size of each text block and the spacing between adjacent text blocks. In other words, the paper page may consist of block regions of any shape, each block region being one text block, and each text block corresponds to an area in which the user writes by hand on the paper page.
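A blank segmentation template and a lookup by size and format might be represented as below. This sketch assumes page sizes in millimetres, axis-aligned rectangular blocks, and that the page's height/width ratio and the number of printed blocks have already been measured; `BlankTemplate`, `ruled_template` and `match_template` are hypothetical names, and the template sizes used in the test are illustrative only.

```python
# Sketch only: hypothetical names; template lookup by page size and block count.
from dataclasses import dataclass
from typing import Optional, Sequence, Tuple

Rect = Tuple[float, float, float, float]  # (x0, y0, x1, y1) in mm


@dataclass(frozen=True)
class BlankTemplate:
    name: str
    width_mm: float
    height_mm: float
    blocks: Tuple[Rect, ...]

    def page_format(self):
        """(number of blocks, block heights, spacing between vertically adjacent blocks)."""
        ordered = sorted(self.blocks, key=lambda b: b[1])
        heights = [b[3] - b[1] for b in ordered]
        gaps = [nxt[1] - prev[3] for prev, nxt in zip(ordered, ordered[1:])]
        return len(ordered), heights, gaps


def ruled_template(name, width, height, top, block_h, gap, count, margin=10):
    """Hypothetical helper: `count` full-width blocks stacked downwards from `top`."""
    blocks = tuple((margin, top + i * (block_h + gap), width - margin,
                    top + i * (block_h + gap) + block_h) for i in range(count))
    return BlankTemplate(name, width, height, blocks)


def match_template(templates: Sequence[BlankTemplate], aspect: float,
                   n_blocks: int, tol: float = 0.02) -> Optional[BlankTemplate]:
    """Pick the template whose height/width ratio and block count match the page."""
    candidates = [t for t in templates
                  if abs(t.height_mm / t.width_mm - aspect) <= tol
                  and t.page_format()[0] == n_blocks]
    return min(candidates, key=lambda t: abs(t.height_mm / t.width_mm - aspect),
               default=None)
```

The block count matters because page size alone can be ambiguous: an A5 page (148 × 210 mm) and a B5 page (176 × 250 mm) have almost the same height/width ratio, so a photograph without a known scale cannot easily tell them apart.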

The photographed notebook page belongs to a page type previously saved by application software such as the existing CamScanner. The previously saved blank segmentation template for that page type can therefore be used to locate the image area containing the user's handwriting (that is, the area occupied by one text block or by several merged text blocks), which greatly improves accuracy.

Determining the text block in which the user's handwriting in the rectangular area is located, and automatically segmenting and extracting, in units of text blocks, the user's handwriting in any one of the text blocks. A text block can also be merged with adjacent text blocks, in which case the user's handwriting in any merged text block is automatically segmented and extracted. Specifically, in the corrected notebook page image, the position of the user's handwriting is determined with reference to the previously saved blank segmentation template of the notebook page, and the handwriting is divided into text blocks representing different lines of text. With the method of the present invention, the user can also merge, by a simple operation, adjacent regions whose text blocks together form a complete meaning. The extracted regions, each representing text blocks that form a complete meaning, can be added to the to-do list of the electronic device, or their text can be recognized with existing handwriting recognition technology and saved, which spares the user from typing the text manually on the electronic device.
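Locating handwriting within template blocks and cutting out entries can be sketched as follows. The sketch assumes the page has already been rectified to the template's pixel coordinates and that an ink mask with the printed template lines removed is available; block rectangles are `(x0, y0, x1, y1)` in pixels, and `merged_groups` holds the adjacent blocks the user chose to merge. The function names and the `min_pixels` threshold are hypothetical.

```python
# Sketch only: hypothetical names; handwriting extraction in units of template blocks.
import numpy as np


def blocks_with_handwriting(ink, blocks, min_pixels=20):
    """Indices of template blocks that contain at least `min_pixels` inked pixels.

    `ink` is a binary mask of the rectified page with the printed template removed.
    """
    return [i for i, (x0, y0, x1, y1) in enumerate(blocks)
            if int(ink[y0:y1, x0:x1].sum()) >= min_pixels]


def crop_union(page, blocks, indices):
    """Crop the region covering one block, or several adjacent merged blocks."""
    x0 = min(blocks[i][0] for i in indices)
    y0 = min(blocks[i][1] for i in indices)
    x1 = max(blocks[i][2] for i in indices)
    y1 = max(blocks[i][3] for i in indices)
    return page[y0:y1, x0:x1]


def extract_entries(page, ink, blocks, merged_groups=()):
    """One crop per handwritten entry, in page order; merged groups count as one entry."""
    hits = blocks_with_handwriting(ink, blocks)
    grouped = {i for group in merged_groups for i in group}
    groups = [list(g) for g in merged_groups] + [[i] for i in hits if i not in grouped]
    groups.sort(key=min)  # order entries by their first block
    return [crop_union(page, blocks, g) for g in groups]
```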

When a notebook page is digitized, the invention uses the text blocks of the previously saved blank segmentation template to help obtain and divide the user's handwritten text areas, producing image blocks (also called text blocks) that each contain a complete handwritten entry. This facilitates the electronic division of paper pages and the use and management of the resulting electronic documents. Because the blank segmentation template consists of several text blocks, each text block can serve as the unit for obtaining a handwritten entry with complete content, which realizes automatic segmentation and extraction of electronic document content.

Embodiment 2

This embodiment provides a method for automatically segmenting handwritten entries in an electronic notebook. It differs from the method of Embodiment 1 in that the type of the paper page is known in advance, and the type is determined from the paper page image by manual specification: before the image is taken, or after it is taken but before it is processed, the user manually specifies the notebook page type to which the image belongs, for example one of a series of notebook page types pre-stored in an application such as CamScanner.

Embodiment 3

This embodiment provides a method for automatically segmenting handwritten entries in an electronic notebook. It differs from the methods of Embodiments 1 and 2 in that, with the type of the paper page known in advance, the type is determined from the paper page image as follows:

Printing a type mark at a fixed position on the paper page, where the type mark may be text, a symbol, a graphic, or a combination of any two or all three of these;

Detecting the type mark in the paper page image, and comparing the detected type mark with previously known type marks to find the type to which the paper page belongs. Concretely, a pre-designed mark (the type mark) is printed in advance at a specified position on every paper page of the notebook. After the image of a notebook page has been captured, the four outer edges of the page are detected in the image, and the approximate position of the mark in the page image is determined with reference to these four edges, which allows the mark to be detected. The detected mark is then compared with pre-stored marks representing a number of different notebook page types to find the type of the photographed page. This comparison involves mature techniques in the art, such as handwriting recognition, character recognition and image matching, which are not described in detail here.
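The comparison of a detected type mark with stored marks can be sketched with normalized cross-correlation, one simple form of the image matching mentioned above. The sketch assumes the page has been rectified, so that the fixed print position maps to a known pixel box, and that each stored mark is a grayscale patch of the same size; `identify_page_type`, `mark_box` and the 0.8 threshold are hypothetical.

```python
# Sketch only: hypothetical names; type-mark matching by normalized cross-correlation.
import numpy as np


def ncc(a, b):
    """Normalized cross-correlation of two equally sized grayscale patches."""
    a = a.astype(float) - a.mean()
    b = b.astype(float) - b.mean()
    denom = np.sqrt((a * a).sum() * (b * b).sum())
    return float((a * b).sum() / denom) if denom > 0 else 0.0


def identify_page_type(page, mark_box, known_marks, threshold=0.8):
    """Compare the patch at the mark's fixed position with each stored mark.

    page: rectified grayscale page; mark_box: (x0, y0, x1, y1) of the printed mark;
    known_marks: page type name -> reference mark patch of the same size.
    Returns the best-matching type, or None if no score reaches the threshold.
    """
    x0, y0, x1, y1 = mark_box
    patch = page[y0:y1, x0:x1]
    scores = {name: ncc(patch, ref) for name, ref in known_marks.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] >= threshold else None
```

Returning `None` when no stored mark scores high enough leaves room for the unknown-type case handled by creating a new page type.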

Embodiment 4

This embodiment provides a method for automatically segmenting handwritten entries in an electronic notebook. It differs from the method of Embodiment 1 in that the type of the paper page is not known in advance; in this case, the type of the paper page is determined from the paper page image as follows:

Create a new type of paper page and enter the size and format of the unknown paper page.

That is, if the photographed notebook page does not belong to any page type already known to an application such as CamScanner (for example, a type printed with bold and/or lengthened ruled lines, dividing lines and/or a title area), the unknown page type is first added as a newly created page type, and the subsequent processing is then performed.
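Creating a new page type from the size and format the user enters could amount to adding a validated entry to the template registry. The dictionary layout, `register_page_type` and the containment check are assumptions for illustration.

```python
# Sketch only: hypothetical registry layout for user-created page types.
def register_page_type(registry, name, width_mm, height_mm, blocks):
    """Add a user-described page type (size plus text-block layout) to the registry.

    `registry` maps type names to {"size": ..., "blocks": ...}; blocks are
    (x0, y0, x1, y1) rectangles in mm and must lie inside the page.
    """
    if name in registry:
        raise ValueError(f"page type {name!r} already exists")
    for x0, y0, x1, y1 in blocks:
        if not (0 <= x0 < x1 <= width_mm and 0 <= y0 < y1 <= height_mm):
            raise ValueError(f"block {(x0, y0, x1, y1)} lies outside the page")
    registry[name] = {"size": (width_mm, height_mm), "blocks": list(blocks)}
    return registry[name]
```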

When digitizing notebook pages, the invention uses the previously saved blank segmentation template to help obtain and divide the text handwritten by the user on the paper page. Because the blank segmentation template is composed of several text blocks, each text block can serve as the unit for obtaining a handwritten entry with complete content, which realizes automatic segmentation and extraction of electronic document content.

By adopting the above technical solution, the invention greatly facilitates the user's segmentation and classification of text images and spares the user the trouble of entering items into the electronic device one by one.

The above is only the preferred embodiment of the present invention and is not intended to limit its technical content. The technical content of the present invention is broadly defined in the claims of the application; any technical entity or method completed by others that is exactly the same as that defined in the claims, or an equivalent modification thereof, is considered to be covered by the claims.

Claims


1. A method for automatically segmenting text images, comprising the steps of:

Acquiring a text image;

Rearranging and displaying the image text blocks, and setting for each block a label that can be edited by the user.

2. The method of automatically segmenting text images according to claim 1, wherein the text image is acquired by photographing text or by receiving a file containing a text image.

3. The method of automatically segmenting text images according to claim 1, wherein the user manually adjusts the division of the image text blocks before the image text blocks are rearranged and displayed.

4. The method of automatically segmenting text images according to claim 1, further comprising the step of the user adding an annotation to an image text block after the image text blocks are divided.

5. The method of automatically segmenting text images according to claim 1, further comprising the step of adding each image text block separately to the to-do list of the electronic device.

6. The method of automatically segmenting text images according to claim 1, further comprising the step of adding corresponding time information to an image text block, and issuing a prompt according to the time information together with the corresponding image text block.

7. A method for automatically segmenting text images, comprising the steps of:

Acquiring a text image;

Rearranging the text blocks, and setting for each block a tag that can be edited by the user.

8. The method of automatically segmenting text images according to claim 7, wherein the text image is acquired by photographing text or by receiving a file containing a text image.

9. The method of automatically segmenting text images according to claim 7, wherein:

identifying the layout of the text in the image, recognizing the characters of the text in the image, dividing the image according to the paragraphs of the text in the image, and then using the recognized text in the image portion of each paragraph as a text block; or

identifying the layout of the text in the image, dividing the image according to the paragraphs of the text in the image, then recognizing the characters in the image portion of each paragraph, and using the recognized text in the image portion of each paragraph as a text block.

10. The method of automatically segmenting text images according to claim 7, wherein the user manually adjusts the division of the text blocks before the text blocks are rearranged and displayed.

11. The method of automatically segmenting text images according to claim 7, further comprising the step of the user adding an annotation to a text block after the text blocks are divided.

12. The method of automatically segmenting text images according to claim 7, further comprising the step of the user editing a text block after the text blocks are divided.

13. The method of automatically segmenting text images according to claim 7, further comprising the step of adding each text block separately to the to-do list of the electronic device.

14. The method of automatically segmenting text images according to claim 7, further comprising the step of adding corresponding time information to a text block, and issuing a prompt according to the time information together with the corresponding text block.

15. A device for implementing the text image automatic segmentation method according to any one of claims 1 to 14, based on an electronic device with a computer system and comprising:

An image acquisition unit that acquires a text image;

16. The text image automatic segmentation device according to claim 15, wherein the image acquisition unit includes at least one of a photographing unit and a document receiving unit, the photographing unit photographing text to acquire a text image, and the document receiving unit receiving a file containing a text image to acquire a text image.

17. The text image automatic segmentation device according to claim 15, wherein the image layout recognition unit further includes a character recognition unit that recognizes characters in the text image.

18. The text image automatic segmentation device according to claim 15, further comprising an adjustment unit with which the user manually adjusts the division of the text blocks before the text blocks are rearranged.

19. The text image automatic segmentation device according to claim 15, further comprising an annotation adding unit with which the user adds an annotation to a text block after the text blocks are divided.

20. The text image automatic segmentation device according to claim 15, further comprising a text editing unit with which the user edits a text block after the text blocks are divided.

21. The text image automatic segmentation device according to claim 15, further comprising a to-do list adding unit that adds each text block separately to the to-do list of the electronic device.

22. The text image automatic segmentation device according to claim 15, further comprising a time information adding unit that adds corresponding time information to a text block and issues a prompt according to the time information together with the corresponding text block.

23. A method of automatically segmenting handwritten entries in an electronic notebook, comprising:

Determining the type of the paper page according to the paper page image, and obtaining a previously saved blank segmentation template for paper pages of that type of notebook, the blank segmentation template being composed of a plurality of text blocks;

Determining the text block in which the user's handwriting in the rectangular area is located, and automatically segmenting and extracting, in units of text blocks, the user's handwriting in any one of the text blocks.

24. The method of automatically segmenting handwritten entries in an electronic notebook according to claim 23, wherein the type of the paper page is determined by the size and format of the paper page, and the format of the paper page includes the number, size and spacing of the text blocks on the paper page.

25. The method of automatically segmenting handwritten entries in an electronic notebook according to claim 23, wherein a text block can be merged with adjacent text blocks, and the user's handwriting in any merged text block is automatically segmented and extracted.

26. The method of automatically segmenting handwritten entries in an electronic notebook according to claim 23, wherein, when the type of the paper page is known in advance, the type of the paper page is determined from the paper page image by manually specifying the type of the paper page.

27. The method of automatically segmenting handwritten entries in an electronic notebook according to claim 23, wherein, when the type of the paper page is known in advance, the type of the paper page is determined from the paper page image by:

Printing a type mark at a fixed position on the paper page;

Detecting the type mark in the paper page image, and comparing the detected type mark with previously known type marks to find the type to which the paper page belongs.

28. The method of automatically segmenting handwritten entries in an electronic notebook according to claim 23, wherein, when the type of the paper page is not known in advance, the type of the paper page is determined from the paper page image by: