WO2009063329A1 - A color-based computerized method for automatic document indexing - Google Patents

Info

Publication number
WO2009063329A1
Authority
WO
WIPO (PCT)
Prior art keywords
document
tag
label
text
applying
Application number
PCT/IB2008/003857
Other languages
French (fr)
Inventor
Tamar Giloh
Eran Giloh
Original Assignee
Tamar Giloh
Eran Giloh
Priority date
2007-11-16
Filing date
2008-11-17
Publication date
2009-05-22
Application filed by Tamar Giloh, Eran Giloh

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/38 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10 Character recognition
    • G06V30/14 Image acquisition
    • G06V30/1444 Selective acquisition, locating or processing of specific regions, e.g. highlighted text, fiducial marks or predetermined fields
    • G06V30/1448 Selective acquisition, locating or processing of specific regions, e.g. highlighted text, fiducial marks or predetermined fields based on markings or identifiers characterising the document or the area
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10 Character recognition
    • G06V30/18 Extraction of features or characteristics of the image
    • G06V30/18105 Extraction of features or characteristics of the image related to colour

Abstract

A color-based computerized method for automatic electronic document indexing is disclosed. The method includes associating a color with at least one indexing criterion to create a tag, applying the color to text on a document, identifying the colored portions of the document, detecting text on the colored portions, comparing the detected text to a label associated with the tag, if the detected text matches a label associated with the tag, applying the tag and the label to the document, and filing the tagged and labeled document in a database or a specified folder. If the detected text does not match a label associated with the tag, optional procedures are applied.

Description

A COLOR-BASED COMPUTERIZED METHOD FOR AUTOMATIC DOCUMENT INDEXING
Related applications
[0001] This application claims the benefit of priority to Provisional Application Serial No. 60/996,452, filed on November 16, 2007, which is incorporated by reference in its entirety.
Field
[0002] The application relates to a color-based method for automatic electronic document indexing.
Background
[0003] Data is transmitted via two main routes: hard copy documents (books, mail, fax, newspapers, etc.) and electronic documents (internet, CD, email). Indexing, categorizing, tagging, and filing data are burdensome and time-consuming tasks.
[0004] Nowadays, it is more common to scan documents and file them on a computer. Although scanning hard copy documents allows for a later stage of manual electronic indexing and tagging, there is a need for a quick, simple and versatile automatic electronic method for data indexing and tagging for both electronic and scanned documents.
Summary
[0005] The present application provides a method to categorize and tag documents and data using a combination of color and text. The method includes: associating one or more colors with one or more indexing criteria to create a tag (e.g. 'customer name' or 'date of document'); applying the one or more colors to text on a document; identifying the colored portions of the document; detecting text on the colored portions; comparing the detected text to a label (e.g. the name of a specific customer) associated with the tag; if the detected text matches a label associated with the tag, applying the tag and the label to the document; and filing the tagged and labeled document in a database or a specified folder.
[0006] A computer readable medium is also described. The medium has instructions stored thereon executable by a process for carrying out the method, including: associating one or more colors with one or more indexing criteria to create a tag; applying the one or more colors to text on a document; identifying the colored portions of the document; detecting text on the colored portions; comparing the detected text to a label associated with the tag; if the detected text matches a label associated with the tag, applying the tag and the label to the document; and filing the tagged and labeled document in a database or a specified folder.
Brief Description of the Drawings
[0007] Exemplary embodiments of the invention are described herein with reference to the drawings, in which:
Fig. 1 shows a flow chart of the method of the present application; and
Fig. 2 shows another flow chart of the method of the present application.
Detailed Description
[0008] A software-based method for computerized indexing is provided for the automatic indexing of electronic documents. The indexing method includes detecting colors and text in a document, tagging the document based on the colors, and filing the document into databases based on the detected text associated with the tags applied.
[0009] Text/color combinations may be obtained in different ways, including, but not limited to: the text's background being color marked or highlighted manually; the text letters being specific colors or combinations of colors; the text being delimited by colored circles, lines or patterns, possibly including a combination of colors (e.g. a blue line between green lines); or any combination of the above.
[00010] The color may be applied by marking text on a document, either with a colored marker or other dedicated tool on a paper document, or with the highlighting feature or other dedicated marking features on an electronic document.
[00011] In one embodiment, the color alone may be used to define specific texts for tagging. In such a case, the color has no meaning other than to create general tags.
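As an illustration of how a colored marking might be recognized in a scanned page, the following is a minimal sketch and not the method of the application. It classifies RGB pixels into marker colors by hue using Python's standard `colorsys` module. The hue ranges, the saturation and brightness thresholds, and the function names `classify_pixel` and `dominant_marker_color` are all assumptions made for this example; a real scanner would need calibration.

```python
import colorsys

# Hypothetical hue ranges in degrees for each marker color (assumed values).
HUE_RANGES = {
    "red": [(0, 20), (340, 360)],
    "yellow": [(40, 70)],
    "green": [(90, 150)],
    "blue": [(200, 260)],
}


def classify_pixel(r, g, b, min_saturation=0.35, min_value=0.4):
    """Return the marker color name for an RGB pixel (0-255 channels), or None."""
    h, s, v = colorsys.rgb_to_hsv(r / 255, g / 255, b / 255)
    if s < min_saturation or v < min_value:
        return None  # too gray or too dark to count as a colored marking
    hue = h * 360
    for name, ranges in HUE_RANGES.items():
        if any(low <= hue < high for low, high in ranges):
            return name
    return None


def dominant_marker_color(pixels):
    """Return the most frequent marker color among the pixels of a region, or None."""
    counts = {}
    for r, g, b in pixels:
        name = classify_pixel(r, g, b)
        if name is not None:
            counts[name] = counts.get(name, 0) + 1
    return max(counts, key=counts.get) if counts else None


# A small region that is mostly red with one white background pixel.
region = [(255, 0, 0), (250, 10, 10), (255, 255, 255)]
region_color = dominant_marker_color(region)
```

Working in hue rather than raw RGB is one way to tolerate the brightness variation of a hand-applied highlighter; other color spaces or clustering approaches would serve equally well.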
[00012] In another embodiment, the colors may have a meaning according to a predetermined code. For example, the color red may represent the 'customer name' title in a database. Any text having a red background or other red marking will be categorized under 'customer name' in the tag database. The color yellow in the text background, or other yellow marking, may represent the document 'date', and so on.
[00013] Referring to Figure 1, a user creates a color/indexing criterion category or relationship. For example, when the criterion is a customer name, the user can mark the customer name in the text of the document with the associated color, such as red. This color/indexing criterion pair is then created as a tag and saved either in the document under the file properties, or in a separate database. Additional color/indexing criterion tags can also be created and saved, such as yellow being associated with the date of the document or blue being associated with a case number. Tags may also include labels associated with the tags. For example, labels associated with the customer name/red tag may include names of individual customers, such as Procter & Gamble, Johnson & Johnson, and Kimberly-Clark. Labels associated with the date of the document may be specific dates, such as 11/11/09.
[00014] For example, one tag may represent customer names plus the color red. The labels associated with this tag may include the specific names of customers (such as Procter & Gamble, Johnson & Johnson, and Kimberly-Clark). Another tag may represent the date plus the color yellow. Labels associated with this tag may include the specific dates (such as 11/11/09).
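The relationship between colors, indexing criteria, and labels described above can be pictured as a small data model. The sketch below is illustrative only: the `Tag` class, the `build_registry` helper, and the rule that each color maps to exactly one tag are assumptions made for the example (the application also contemplates combinations of colors).

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, Set


@dataclass
class Tag:
    """A color/indexing criterion pair plus the labels known for it."""
    criterion: str
    color: str
    labels: Set[str] = field(default_factory=set)


def build_registry(tags: Iterable[Tag]) -> Dict[str, Tag]:
    """Index tags by color; assumes one tag per color for simplicity."""
    registry: Dict[str, Tag] = {}
    for tag in tags:
        if tag.color in registry:
            raise ValueError(f"color {tag.color!r} is already assigned")
        registry[tag.color] = tag
    return registry


# The tags and labels used as examples in the description.
registry = build_registry([
    Tag("customer name", "red",
        {"Procter & Gamble", "Johnson & Johnson", "Kimberly-Clark"}),
    Tag("date", "yellow", {"11/11/09"}),
    Tag("case number", "blue"),
])
```

Keying the registry by color means that, once a colored portion has been identified, the matching criterion and its label list can be looked up directly.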
[00015] Referring to Figure 2, when a document is received, either by email or in paper form and scanned, it is marked according to the color/indexing criterion tag. For example, each mention of the customer name on the document may be highlighted in red. The color marking may be performed with a marker, highlighter or other dedicated tool on a paper document, or with the highlighting tool or other dedicated feature on an electronic document.
[00016] The colored portions of the document are then identified, and a method is used to detect the text in the colored portions. For example, Optical Character Recognition (OCR) may be performed on the colored portions to detect text. Alternatively, any known method may be used to detect text. When text is detected, that text is compared to a list of labels associated with the tag. If the detected text matches a label associated with that tag, then the tag and label are applied to the document, and the document is filed in a database or a specified folder. If the detected text is similar to a label associated with the tag but does not match a label exactly, such as in the case of a typo, two options may apply. In one option, the system applies document analysis methods such as text recognition, semantic analysis, logo recognition, graphic recognition or others, in order to compare the document in question with existing documents associated with a similar label and tag. If the comparison results in a strong match, the system automatically applies the similar label and associated tag to the document in question. In another option, a user is prompted to apply a similar existing label associated with the tag. If the user decides to use this label, the label is applied.
[00017] If the user does not want to apply the similar label, a new label is created and associated with the tag, and the new label is applied to the document. The document is then filed in the database or specified folder. In this example, the document would be filed within the customer name folder, under the specific customer name.
[00018] If the detected text does not match a label associated with the tag and is not similar, such as in the case where the name does not exist in the customers list, a new label is created and associated with the tag, and the new label is applied to the document. The document is then filed in the database or specified folder.
[00019] In yet another embodiment, the color code may reflect hierarchies specified by the user. Such hierarchies determine the importance of the specific text and are created by marking more than one text passage in different colors in a single document. For instance, the customer name is associated with the color red, the date is associated with the color yellow, and the case number is associated with the color blue. Each color is given a predetermined location within the hierarchy, and the document can be filed as follows: customer/case number/date. Alternatively, the document can be filed as: date/customer/case number, etc. The filing of a document in accordance with more than one criterion may be executed simultaneously according to different filing colors.
[00020] While certain features and embodiments of the present application have been described in detail herein, it is to be understood that the application encompasses all modifications and enhancements.
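The label matching and hierarchical filing described in the paragraphs above can be sketched as follows. This is a hedged illustration under stated assumptions, not the implementation of the application: plain string similarity from Python's standard `difflib` module stands in for the document analysis step, the `confirm` callback stands in for the user prompt, and the names `resolve_label` and `filing_path`, the 0.8 cutoff, the `archive` root folder, and the case number `A-1042` are all hypothetical.

```python
import difflib
from pathlib import PurePosixPath


def resolve_label(detected, labels, cutoff=0.8, confirm=None):
    """Choose the label to apply for text detected under one tag.

    Returns (label, status), where status is "exact", "similar", or "new".
    `labels` is the tag's mutable list of known labels; new labels are appended.
    `confirm(detected, candidate)` is an optional stand-in for the user prompt.
    """
    text = detected.strip()
    if text in labels:
        return text, "exact"
    # Assumption: string similarity replaces the document analysis methods.
    close = difflib.get_close_matches(text, labels, n=1, cutoff=cutoff)
    if close and (confirm is None or confirm(text, close[0])):
        return close[0], "similar"
    labels.append(text)  # no match, or the similar label was declined
    return text, "new"


def filing_path(root, applied, hierarchy):
    """Build a folder path from applied labels, ordered by the color hierarchy.

    `applied` maps criterion to label; `hierarchy` lists criteria in filing order.
    Slashes inside a label (as in dates) are replaced so they do not add levels.
    """
    parts = [applied[c].replace("/", "-") for c in hierarchy if c in applied]
    return PurePosixPath(root, *parts)


customers = ["Procter & Gamble", "Johnson & Johnson", "Kimberly-Clark"]
label, status = resolve_label("Jonhson & Johnson", customers)  # typo in the scan
applied = {"customer name": label, "case number": "A-1042", "date": "11/11/09"}
path = filing_path("archive", applied, ["customer name", "case number", "date"])
```

Reordering the `hierarchy` list to `["date", "customer name", "case number"]` yields the alternative filing order mentioned in the description without changing the matching step.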

Claims

What is claimed is:
1. A method for color-based electronic indexing comprising: associating one or more colors with one or more indexing criteria to create tags; applying the one or more colors to text on a document; identifying the colored portions of the document; detecting text on the colored portions; comparing the detected text to a label associated with the tag; if the detected text matches a label associated with the tag, applying the tag and the label to the document; and filing the tagged and labeled document in a database or a specified folder.
2. The method of claim 1 further comprising: if the detected text is similar to a label associated with the tag but does not match the label exactly, applying document analysis methods to compare the document with existing documents associated with a similar label and tag; if the comparison results in a match:
(i) automatically applying the similar label and associated tag to the document; or
(ii) prompting a user to use the similar label or to create a new label associated with the tag.
3. The method of claim 2 further comprising: if the user chooses to apply the similar label, applying the tag and the similar label to the document; and if the user chooses not to apply the similar label, creating a new label associated with the tag and applying the tag and new label to the document.
4. The method of any one of claims 1 to 3 further comprising: if the detected text does not match a label associated with the tag and is not similar, creating a new label associated with the tag and applying the new label to the document.
5. The method of claims 2, 3 or 4 further comprising saving the new label in a separate database.
6. The method of any one of claims 1 to 5 wherein the document is scanned into a computer.
7. The method of any one of claims 1 to 6 wherein the colored portions are created by at least one of the following: using a highlighting or marking tool; using text letters of specific colors or combination of colors; highlighting text by colored shapes, lines or patterns; or a combination of one or more of the above.
8. The method of any one of claims 1 to 7 further comprising: associating additional separate colors with two or more indexing criteria to create additional tags; assigning a priority to each color to create a hierarchy for indexing; filing the document according to the hierarchy in a database or a specified folder.
9. The method of any one of claims 1 to 8 wherein detecting text on the colored portions includes performing optical character recognition (OCR) on the colored portions.
10. A computer readable medium having instructions stored thereon executable by a process for carrying out the method comprising: associating one or more colors with one or more indexing criteria to create tags; applying the one or more colors to text on a document; identifying the colored portions of the document; detecting text on the colored portions; comparing the detected text to a label associated with the tag; if the detected text matches a label associated with the tag, applying the tag and the label to the document; and filing the tagged and labeled document in a database or a specified folder.
11. The computer readable medium of claim 10 further comprising: if the detected text is similar to a label associated with the tag but does not match the label exactly, applying document analysis methods to compare the document with existing documents associated with a similar label and tag; if the comparison results in a match:
(i) automatically applying the similar label and associated tag to the document; or
(ii) prompting a user to use the similar label or to create a new label associated with the tag.
12. The computer readable medium of claim 11 further comprising: if the user chooses to apply the similar label, applying the tag and the similar label to the document; and if the user chooses not to apply the similar label, creating a new label associated with the tag and applying the tag and new label to the document.
13. The computer readable medium of any one of claims 10 to 12 further comprising: if the detected text does not match a label associated with the tag and is not similar, creating a new label associated with the tag and applying the new label to the document.
14. The computer readable medium of claims 11, 12, or 13 further comprising saving the new label in a separate database.
15. The computer readable medium of any one of claims 10 to 14 wherein the document is scanned into a computer.
16. The computer readable medium of any one of claims 10 to 15 wherein the colored portions are created by at least one of the following: using a highlighting or marking tool; using text letters of specific colors or combination of colors; highlighting text by colored shapes, lines or patterns; or a combination of one or more of the above.
17. The computer readable medium of any one of claims 10 to 16 further comprising: associating additional separate colors with two or more indexing criteria to create additional tags; assigning a priority to each color to create a hierarchy for indexing; filing the document according to the hierarchy in a database or a specified folder.
18. The computer readable medium of any one of claims 10 to 17 wherein detecting text on the colored portions includes performing optical character recognition (OCR) on the colored portions.
19. A method for color-based electronic indexing comprising: utilizing color marking to define text on a document; reading the defined text; and indexing the document based on the defined text.
PCT/IB2008/003857 2007-11-16 2008-11-17 A color-based computerized method for automatic document indexing WO2009063329A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US99645207P 2007-11-16 2007-11-16
US60/996,452 2007-11-16

Publications (1)

Publication Number Publication Date
WO2009063329A1 (en) 2009-05-22

Family

ID=40551357

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/IB2008/003857 WO2009063329A1 (en) 2007-11-16 2008-11-17 A color-based computerized method for automatic document indexing

Country Status (1)

Country Link
WO (1) WO2009063329A1 (en)

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP0251237A2 (en) * 1986-06-30 1988-01-07 Wang Laboratories Inc. Digital imaging file processing system
US5579407A (en) * 1992-04-21 1996-11-26 Murez; James D. Optical character classification

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
HANDSCHUH S ET AL: "CREAM: CREAting Metadata for the Semantic Web", COMPUTER NETWORKS, ELSEVIER SCIENCE PUBLISHERS B.V., AMSTERDAM, NL, vol. 42, no. 5, 5 August 2003 (2003-08-05), pages 579 - 598, XP004433788, ISSN: 1389-1286 *

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP3716147A1 (en) * 2019-03-25 2020-09-30 Toshiba TEC Kabushiki Kaisha Image processing method and image processing apparatus
CN111738901A (en) * 2019-03-25 2020-10-02 东芝泰格有限公司 Storage medium and image processing apparatus
US11328448B2 (en) 2019-03-25 2022-05-10 Toshiba Tec Kabushiki Kaisha Image processing method and image processing apparatus
WO2022189899A1 (en) * 2021-03-12 2022-09-15 Ricoh Company, Ltd. Information processing system, processing method, and recording medium

Similar Documents

Publication Publication Date Title
US9141691B2 (en) Method for automatically indexing documents
US5159180A (en) Litigation support system and method
US5926565A (en) Computer method for processing records with images and multiple fonts
US9736331B2 (en) Device, system and method for identifying sections of documents
EP2354966A2 (en) System and method for visual document comparison using localized two-dimensional visual fingerprints
US8139870B2 (en) Image processing apparatus, recording medium, computer data signal, and image processing method
Papadopoulos et al. The IMPACT dataset of historical document images
EP1109125A2 (en) System for heuristically organizing scanned information
US20040044958A1 (en) Systems and methods for inserting a metadata tag in a document
US20030042319A1 (en) Automatic and semi-automatic index generation for raster documents
WO2006002009A2 (en) Document management system with enhanced intelligent document recognition capabilities
JP6504514B1 (en) Document classification system and method and accounting system and method.
CN104346415B (en) Method for naming image document
US20070212507A1 (en) Document Flagging And Indexing System
CN114117171A (en) Intelligent project file collecting method and system based on energized thinking
US7716639B2 (en) Specification wizard
WO2009063329A1 (en) A color-based computerized method for automatic document indexing
CN112445911A (en) Workflow assistance apparatus, system, method, and storage medium
JP4807618B2 (en) Image processing apparatus and image processing program
Dejean Extracting structured data from unstructured document with incomplete resources
AU2018100324B4 (en) Image Analysis
JP2018190064A (en) Accounting processing system
McCarthy et al. Early modern Oxford bindings in twenty‐first century markup
CN1187684C (en) Method for auto-extracting marked data content in electronic file
US20090123076A1 (en) Method for comparing computer-generated drawings

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 08850331

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 08850331

Country of ref document: EP

Kind code of ref document: A1