US10817656B2 - Methods and devices for enabling computers to automatically enter information into a unified database from heterogeneous documents - Google Patents


Publication number
US10817656B2
Authority
US
United States
Prior art keywords
program code
image file
computer
text
fields
Prior art date
Legal status
Active
Application number
US15/821,682
Other versions
US20190155887A1 (en)
Inventor
Sanjay Kutty
An Hongguo
Subhash C. Vinnakota
Keith Burke
Robert Seres
Danniel Condez
Erik Hanson
Anuradha Verma
Current Assignee
ADP Inc
Original Assignee
ADP Inc
Priority date
Filing date
Publication date
Application filed by ADP Inc
Priority to US15/821,682
Assigned to ADP, LLC: assignment of assignors interest (see document for details). Assignors: BURKE, KEITH; CONDEZ, DANNIEL; HANSON, ERIK; HONGGUO, AN; KUTTY, SANJAY; SERES, ROBERT; VERMA, ANURADHA; VINNAKOTA, SUBHASH C.
Publication of US20190155887A1
Application granted
Publication of US10817656B2
Assigned to ADP, INC.: change of name (see document for details). Assignor: ADP, LLC
Active legal status: Current
Anticipated expiration


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/93 Document management systems
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/10 Text processing
    • G06F40/166 Editing, e.g. inserting or deleting
    • G06F40/174 Form filling; Merging
    • G06K9/00449
    • G06K9/00456
    • G06K9/344
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10 Character recognition
    • G06V30/14 Image acquisition
    • G06V30/148 Segmentation of character regions
    • G06V30/153 Segmentation of character regions using recognition of characters or words
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/40 Document-oriented image-based pattern recognition
    • G06V30/41 Analysis of document content
    • G06V30/412 Layout analysis of documents structured with printed lines or input boxes, e.g. business forms or tables
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/40 Document-oriented image-based pattern recognition
    • G06V30/41 Analysis of document content
    • G06V30/413 Classification of content, e.g. text, photographs or tables
    • G06K2209/01
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10 Character recognition
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00 Network arrangements or protocols for supporting network services or applications
    • H04L67/01 Protocols
    • H04L67/02 Protocols based on web technology, e.g. hypertext transfer protocol [HTTP]

Definitions

  • The present disclosure relates to methods and devices for enabling computers to automatically enter information into a unified database from heterogeneous documents.
  • OCR: optical character recognition
  • PDF: portable document format
  • The illustrative embodiments provide for a computer-implemented method of enabling a computer to automatically enter information into a unified database from heterogeneous documents.
  • the computer-implemented method includes receiving, at a processor, an image file.
  • the computer-implemented method also includes displaying, by the processor, the image file in a first area of a window rendered on a tangible display device.
  • the computer-implemented method also includes displaying, by the processor, fields for data entry in a second area of the window.
  • the computer-implemented method also includes performing, by the processor, optical character recognition on the image file.
  • the computer-implemented method also includes identifying, by the processor, at least one parameter of text in the image file.
  • the computer-implemented method also includes comparing, by the processor, the at least one parameter of the text to at least one of a plurality of stored parameters.
  • the computer-implemented method also includes sorting, by the processor, the text according to the at least one of the plurality of stored parameters into a plurality of categories, wherein sorted text is formed.
  • the computer-implemented method also includes auto-populating and displaying, by the processor, the fields in the second area of the window based on the sorted text.
  • the illustrative embodiments also contemplate a non-transitory computer-recordable storage medium storing program code, which when executed by a processor, performs the above method.
  • the illustrative embodiments also contemplate a computer including a processor and a non-transitory computer-recordable storage medium storing program code, which when executed by the processor, performs the above method.
  • FIG. 1 illustrates a sample screenshot of a user interface for software configured to receive image files and auto populate specific fields in a unified database, in accordance with an illustrative embodiment.
  • FIG. 2 illustrates another sample screenshot of a user interface for software configured to receive image files and auto populate specific fields in a unified database, in accordance with an illustrative embodiment.
  • FIG. 3 illustrates a flowchart of a method for receiving image files and auto populating specific fields in a unified database, in accordance with an illustrative embodiment.
  • FIG. 4 illustrates a data processing system, in accordance with an illustrative embodiment.
  • As used herein, a “unified database” is defined as one or more databases, whether relational databases, content addressable databases, or other types of databases, which together are directed towards a common enterprise and use a common set of identifiers.
  • For example, such databases, when taken together, could contain employee records, tax information, and other information that use a common system of identifiers.
  • For instance, “employee name” would be the name of a field throughout all databases, so that confusion is avoided when working with the databases in the context of a single enterprise.
  • This wage garnishment example is just one example.
  • The human resources department or the third-party vendor must also process tax information, such as data entered into W-2 forms, taxes paid to multiple government agencies, and much more.
  • the illustrative embodiments recognize and take into account that even when this data comes in the form of electronic files displayable on a computer, a human user must take an inordinate and undesirable amount of time to enter the correct information into the unified database of the human resources department or the third-party vendor.
  • the illustrative embodiments provide for methods and devices that address these issues and provide a means for enabling computers to automatically enter information into a unified database from heterogeneous documents.
  • the illustrative embodiments take advantage of OCR technology, but also utilize a database of common terms to identify candidates for entries into a field of a unified database.
  • the illustrative embodiments automatically populate fields of interest, and then display the populated fields so that a user can verify the entries.
  • the computer can automatically verify the entries into the fields to confirm that they relate to an employee.
  • the computer can verify that “John Doe” is a valid entry by confirming that “John Doe” actually is an employee recorded in the unified database.
  • The illustrative embodiments further recognize and take into account the user interface problem of operating multiple windows of different software products: one to view the documents and another to perform data entry. Switching between windows is inconvenient and wastes time during data entry.
  • the illustrative embodiments also provide a means for displaying a single window which allows for selection of an image file for processing, displays the image file, and presents fields for entering data into the unified database.
  • the illustrative embodiments address these and other issues by providing for methods and devices for enabling computers to automatically enter information into a unified database from heterogeneous documents.
  • attention is now turned to the figures.
  • FIG. 1 illustrates a sample screenshot of a user interface for software configured to receive image files and auto populate specific fields in a unified database, in accordance with an illustrative embodiment.
  • Screenshot 100 is displayed on a tangible display device, such as display 414 of FIG. 4, and is generated by a processor, such as processor unit 404 of FIG. 4, executing program code designed to render screenshot 100 and provide functionality for at least some of the images shown.
  • This program code may be implemented as software, firmware, or both.
  • Screenshot 100 shows two primary areas, area 102 and area 104.
  • An “area”, as used herein, is a portion of a display on a device that shows part of the screenshot.
  • Area 102 is used to display information related to the document or documents to be processed.
  • Area 104 is used to display information useful for entering information into the unified database.
  • Attention is first turned to area 102.
  • Instructions 106, instructions 108, and/or select files 110 are provided to prompt a user to access the files from which data is to be processed.
  • Title 112 may be provided to remind the user of which types of files are to be processed.
  • Although this illustrative embodiment describes a method for presenting a display from which a user retrieves the desired image files, the illustrative embodiments also contemplate automatically presenting a user with image files for processing.
  • The illustrative embodiments further contemplate automatically selecting and processing image files so that no user is involved in the process of converting heterogeneous image files into entries in a unified database.
  • Area 104 is also displayed in screenshot 100.
  • Title 114 indicates to a user the nature of what is displayed in area 104, which in this case is the set of details of the agency notice displayed in area 102 that are to be entered into fields in area 104 for subsequent entry into the unified database. Ultimately, the purpose of this data entry is to help the enterprise properly comply with the requirements of a specific agency notice it has received.
  • The use of area 104 is described with respect to FIG. 2.
  • FIG. 2 illustrates another sample screenshot of a user interface for software configured to receive image files and auto populate specific fields in a unified database, in accordance with an illustrative embodiment.
  • Screenshot 200 is related to screenshot 100 in that screenshot 200 is taken after an image file has been uploaded and is being displayed in area 102.
  • Screenshot 200 shows an example of how a heterogeneous image file can be processed and its relevant information transmitted to fields in area 104 for entry into a unified database.
  • Screenshot 200 is displayed on a tangible display device, such as display 414 of FIG. 4, as a result of a processor, such as processor unit 404 of FIG. 4, executing a program embodied either as program code on a non-transitory computer-recordable storage medium or as firmware.
  • Whether the program is implemented as program code on a non-transitory computer-recordable storage medium or as firmware, the term “program” is used herein, though the term “program” excludes purely signal-based media.
  • Area 102 shows image 202 of agency notice 204.
  • In this illustrative embodiment, agency notice 204 relates to tax information that, by law, must be processed by the enterprise.
  • The program loads the image file of agency notice 204 and performs optical character recognition (OCR) on the file.
  • the program compares text extracted from the file based on the OCR to a plurality of terms stored in a database in order to characterize the text.
  • The extracted text can be compared not only by text matching, but also by analyzing the location from which the text was lifted and by patterns of text.
  • For example, the program can determine that the name of the “company” in this particular agency notice is “Automatic Data Processing” based on the location of this term in agency notice 204, as well as the recognizable pattern of a sender's address bar near the top of the page. Additionally, the term “ADP” is associated with the company.
  • However, such a comparison is not necessary.
  • The user can simply read the page and enter the term “Automatic Data Processing” or possibly “ADP” in field 116, which is the “company name” to be entered into the unified database.
  • The user can likewise fill out the other fields in area 104.
  • Alternatively, sample answers are automatically generated and automatically copied into the relevant fields in area 104.
  • For example, field 116 will be auto-populated with the term “Automatic Data Processing” or perhaps “ADP”.
  • The remaining fields and button selections will likewise be auto-populated and auto-selected.
  • A user then reviews the automatically supplied entries in the fields shown in area 104.
  • The user can then submit the entries, which are transferred to the unified database for further processing and appropriate action.
  • The user can also make adjustments to the field entries and button selections prior to submission of the data.
  • In another variation, submission is automatic, and no user is required at all. In this case, all processing takes place out of sight of a user, with data automatically being input into the unified database.
  • However, this particular illustrative embodiment becomes less useful as the documents being processed become more heterogeneous. For example, when tax documents are received from a wide variety of companies in a wide variety of formats, the likelihood of errors in the automatic population of the fields of interest increases. When the probability of such errors increases, adding a human reviewer to the process can increase the accuracy of the data transfer process.
  • the illustrative embodiments provide an integrated technology for reviewing heterogeneous image documents for text and entering this text data appropriately into a unified database.
  • In at least one illustrative embodiment, the fields are auto-populated, thereby substantially increasing the speed of such data processing.
  • The illustrative embodiments enable computers to automatically enter information into a unified database from heterogeneous documents, thereby accomplishing a technical effect.
  • Another technical effect of the illustrative embodiments is enabling an improved user interface for human users so that human users may more efficiently use a computer to accomplish desired data entry tasks.
  • The illustrative embodiments are implemented solely in a computer, are intrinsically a part of the operation of the computer, and relate only to improving computer functionality and presentation. Thus, the illustrative embodiments cannot be accomplished by a human being, but only by a computer improved using the techniques described herein.
  • FIG. 1 and FIG. 2 do not necessarily limit other illustrative embodiments or the claims. Many variations are possible, based on many different types of documents, fields of interest for data entry, or other enterprise goals.
  • the illustrative embodiments may also be extended.
  • the illustrative embodiments contemplate automatically processing multiple image documents simultaneously.
  • For example, the illustrative embodiments contemplate collating information for entry into fields that request information such as how many times a given item is referenced across multiple documents.
  • The illustrative embodiments also contemplate processing either homogeneous or heterogeneous file types and formats, such as but not limited to .png, .pdf, .jpg, .jif, and many other file types.
  • the illustrative embodiments are not necessarily limited to the examples given above.
  • FIG. 3 illustrates a flowchart of a method for receiving image files and auto populating specific fields in a unified database, in accordance with an illustrative embodiment.
  • Method 300 may be implemented using a data processing system, such as data processing system 400 of FIG. 4.
  • Method 300 is a variation of the methods described above with respect to FIG. 1 and FIG. 2.
  • Method 300 is only performable by a computer and accomplishes the technical effects described above with respect to FIG. 2.
  • Method 300 may be characterized as a method of enabling a computer to automatically enter information into a unified database from heterogeneous documents.
  • Method 300 includes receiving, at a processor, an image file (operation 302). Method 300 also includes displaying, by the processor, the image file in a first area of a window rendered on a tangible display device (operation 304). Method 300 also includes displaying, by the processor, fields for data entry in a second area of the window (operation 306).
  • Method 300 also includes performing, by the processor, optical character recognition on the image file (operation 308).
  • Method 300 also includes identifying, by the processor, at least one parameter of text in the image file (operation 310).
  • This parameter or parameters may take many different forms, as described above.
  • For example, a parameter may be the text itself for text matching, a location of the text in the image file, surrounding text for pattern-recognition matching, pre-stored codes, words, or phrases, a color used in the image file, the image file type, and potentially many others.
  • the purpose of the parameter or parameters is to enable the computer to recognize appropriate text from potentially many different heterogeneous image files for entry into one or more specific fields for ultimate entry into a unified database.
  • Method 300 also includes comparing, by the processor, the at least one parameter of the text to at least one of a plurality of stored parameters (operation 312).
  • Method 300 also includes sorting, by the processor, the text according to the at least one of the plurality of stored parameters into a plurality of categories, wherein sorted text is formed (operation 314). Sorting into a plurality of categories specifically relates to determining which alphanumeric text sequences should be applied to which fields in the second area of the display window. For example, the phrase “Automatic Data Processing” can be recognized as belonging to the category of “company name” and thus assigned to a field accordingly.
  • Method 300 also includes auto-populating and displaying, by the processor, the fields in the second area of the window based on the sorted text (operation 316).
  • Method 300 may be varied by including different operations, or by including additional operations, or by potentially using fewer operations. Some of these additional illustrative embodiments follow, and are shown in FIG. 3 as boxes surrounded by dotted lines to indicate that they are optional additional steps taken with respect to method 300.
  • For example, method 300 may also include submitting the fields as entries into a unified database (operation 318).
  • Method 300 may also include receiving, prior to submitting, user input from a user input device indicating that the fields are correct (operation 320).
  • a user could possibly edit the entries in the fields prior to submission.
  • Method 300 may also include automatically taking an action, based on the entries in the unified database, required by an order stated in a document from which the image file was made (operation 322).
  • An example of such an action would be, responsive to receiving a court order, withholding wages from an employee's paycheck and paying the withheld wages to a designated payee.
  • Another example would be to populate a paystub and transmit the paystub to an employee or others authorized to receive the paystub.
  • Many different actions are possible, and such actions are not necessarily limited to a human resources context.
  • the first area and the second area are displayed side by side in the window, whereby use of multiple display windows is avoided.
  • the image comprises a plurality of images taken from a plurality of heterogeneous image files, and wherein auto-populating is performed for different sets of fields for each one of the plurality of heterogeneous image files.
  • displaying the image file and displaying the fields is performed on a web browser of a local computer, and wherein receiving, performing, identifying, comparing, sorting, and auto-populating are performed by a remote server as software as a service.
  • the computer-implemented method is performed on a single local computer.
  • Data processing system 400 in FIG. 4 is an example of a data processing system that may be used to implement the illustrative embodiments, such as screenshot 100 of FIG. 1, screenshot 200 of FIG. 2, method 300 of FIG. 3, or any other module or system or process disclosed herein.
  • Data processing system 400 includes communications fabric 402, which provides communications between processor unit 404, memory 406, persistent storage 408, communications unit 410, input/output (I/O) unit 412, and display 414.
  • Processor unit 404 serves to execute instructions for software that may be loaded into memory 406.
  • This software may be an associative memory, content addressable memory, or software for implementing the processes described elsewhere herein.
  • For example, software loaded into memory 406 may be software for executing method 300 of FIG. 3.
  • Processor unit 404 may be a number of processors, a multi-processor core, or some other type of processor, depending on the particular implementation.
  • a number, as used herein with reference to an item, means one or more items.
  • processor unit 404 may be implemented using a number of heterogeneous processor systems in which a main processor is present with secondary processors on a single chip.
  • processor unit 404 may be a symmetric multi-processor system containing multiple processors of the same type.
  • Memory 406 and persistent storage 408 are examples of storage devices 416.
  • a storage device is any piece of hardware that is capable of storing information, such as, for example, without limitation, data, program code in functional form, and/or other suitable information either on a temporary basis and/or a permanent basis.
  • Storage devices 416 may also be referred to as computer-readable storage devices in these examples.
  • Memory 406, in these examples, may be, for example, a random-access memory or any other suitable volatile or non-volatile storage device.
  • Persistent storage 408 may take various forms, depending on the particular implementation.
  • Persistent storage 408 may contain one or more components or devices.
  • For example, persistent storage 408 may be a hard drive, a flash memory, a rewritable optical disk, a rewritable magnetic tape, or some combination of the above.
  • The media used by persistent storage 408 also may be removable.
  • For example, a removable hard drive may be used for persistent storage 408.
  • Communications unit 410, in these examples, provides for communications with other data processing systems or devices.
  • communications unit 410 is a network interface card.
  • Communications unit 410 may provide communications through the use of either or both physical and wireless communications links.
  • Input/output (I/O) unit 412 allows for input and output of data with other devices that may be connected to data processing system 400.
  • input/output (I/O) unit 412 may provide a connection for user input through a keyboard, a mouse, and/or some other suitable input device. Further, input/output (I/O) unit 412 may send output to a printer.
  • Display 414 provides a mechanism to display information to a user.
  • Instructions for the operating system, applications, and/or programs may be located in storage devices 416, which are in communication with processor unit 404 through communications fabric 402.
  • The instructions are in a functional form on persistent storage 408. These instructions may be loaded into memory 406 for execution by processor unit 404.
  • The processes of the different embodiments may be performed by processor unit 404 using computer-implemented instructions, which may be located in a memory, such as memory 406.
  • These instructions are referred to as program code, computer-usable program code, or computer-readable program code that may be read and executed by a processor in processor unit 404.
  • The program code in the different embodiments may be embodied on different physical or computer-readable storage media, such as memory 406 or persistent storage 408.
  • Program code 418 is located in a functional form on computer-readable media 420 that is selectively removable and may be loaded onto or transferred to data processing system 400 for execution by processor unit 404.
  • Program code 418 and computer-readable media 420 form computer program product 422 in these examples.
  • Computer-readable media 420 may be computer-readable storage media 424 or computer-readable signal media 426.
  • Computer-readable storage media 424 may include, for example, an optical or magnetic disk that is inserted or placed into a drive or other device that is part of persistent storage 408 for transfer onto a storage device, such as a hard drive, that is part of persistent storage 408.
  • Computer-readable storage media 424 also may take the form of a persistent storage, such as a hard drive, a thumb drive, or a flash memory, that is connected to data processing system 400. In some instances, computer-readable storage media 424 may not be removable from data processing system 400.
  • Alternatively, program code 418 may be transferred to data processing system 400 using computer-readable signal media 426.
  • Computer-readable signal media 426 may be, for example, a propagated data signal containing program code 418.
  • Computer-readable signal media 426 may be an electromagnetic signal, an optical signal, and/or any other suitable type of signal. These signals may be transmitted over communications links, such as wireless communications links, optical fiber cable, coaxial cable, a wire, and/or any other suitable type of communications link.
  • the communications link and/or the connection may be physical or wireless in the illustrative examples.
  • Program code 418 may be downloaded over a network to persistent storage 408 from another device or data processing system through computer-readable signal media 426 for use within data processing system 400.
  • For instance, program code stored in a computer-readable storage medium in a server data processing system may be downloaded over a network from the server to data processing system 400.
  • The data processing system providing program code 418 may be a server computer, a client computer, or some other device capable of storing and transmitting program code 418.
  • the different components illustrated for data processing system 400 are not meant to provide architectural limitations to the manner in which different embodiments may be implemented.
  • The different illustrative embodiments may be implemented in a data processing system including components in addition to or in place of those illustrated for data processing system 400.
  • Other components shown in FIG. 4 can be varied from the illustrative examples shown.
  • the different embodiments may be implemented using any hardware device or system capable of running program code.
  • the data processing system may include organic components integrated with inorganic components and/or may be comprised entirely of organic components excluding a human being.
  • a storage device may be comprised of an organic semiconductor.
  • processor unit 404 may take the form of a hardware unit that has circuits that are manufactured or configured for a particular use. This type of hardware may perform operations without needing program code to be loaded into a memory from a storage device to be configured to perform the operations.
  • When processor unit 404 takes the form of a hardware unit, processor unit 404 may be a circuit system, an application specific integrated circuit (ASIC), a programmable logic device, or some other suitable type of hardware configured to perform a number of operations.
  • With a programmable logic device, the device is configured to perform the number of operations. The device may be reconfigured at a later time or may be permanently configured to perform the number of operations.
  • Examples of programmable logic devices include, for example, a programmable logic array, programmable array logic, a field programmable logic array, a field programmable gate array, and other suitable hardware devices.
  • program code 418 may be omitted because the processes for the different embodiments are implemented in a hardware unit.
  • processor unit 404 may be implemented using a combination of processors found in computers and hardware units.
  • Processor unit 404 may have a number of hardware units and a number of processors that are configured to run program code 418. With this depicted example, some of the processes may be implemented in the number of hardware units, while other processes may be implemented in the number of processors.
  • A storage device in data processing system 400 is any hardware apparatus that may store data.
  • Memory 406, persistent storage 408, and computer-readable media 420 are examples of storage devices in a tangible form.
  • A bus system may be used to implement communications fabric 402 and may be comprised of one or more buses, such as a system bus or an input/output bus.
  • The bus system may be implemented using any suitable type of architecture that provides for a transfer of data between different components or devices attached to the bus system.
  • A communications unit may include one or more devices used to transmit and receive data, such as a modem or a network adapter.
  • A memory may be, for example, memory 406, or a cache, such as found in an interface and memory controller hub that may be present in communications fabric 402.
  • The different illustrative embodiments can take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment containing both hardware and software elements.
  • Some embodiments are implemented in software, which includes but is not limited to forms such as firmware, resident software, and microcode.
  • A computer-usable or computer-readable medium can generally be any tangible apparatus that can contain, store, communicate, propagate, or transport the program for use by or in connection with the instruction execution system, apparatus, or device.
  • The computer-usable or computer-readable medium can be, for example, without limitation, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, or a propagation medium.
  • Examples of a computer-readable medium include a semiconductor or solid state memory, magnetic tape, a removable computer diskette, a random access memory (RAM), a read-only memory (ROM), a rigid magnetic disk, and an optical disk.
  • Optical disks may include compact disk-read only memory (CD-ROM), compact disk-read/write (CD-R/W), and DVD.
  • A computer-usable or computer-readable medium may contain or store computer-readable or computer-usable program code such that, when that program code is executed on a computer, its execution causes the computer to transmit other computer-readable or computer-usable program code over a communications link.
  • This communications link may use a medium that is, for example, without limitation, physical or wireless.
  • A data processing system suitable for storing and/or executing computer-readable or computer-usable program code will include one or more processors coupled directly or indirectly to memory elements through a communications fabric, such as a system bus.
  • The memory elements may include local memory employed during actual execution of the program code, bulk storage, and cache memories, which provide temporary storage of at least some computer-readable or computer-usable program code to reduce the number of times code must be retrieved from bulk storage during execution.
  • I/O devices can be coupled to the system either directly or through intervening I/O controllers. These devices may include, for example, without limitation, keyboards, touch screen displays, and pointing devices. Different communications adapters may also be coupled to the system to enable the data processing system to become coupled to other data processing systems or remote printers or storage devices through intervening private or public networks. Modems and network adapters are just a few non-limiting examples of the currently available types of communications adapters.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Multimedia (AREA)
  • General Health & Medical Sciences (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Business, Economics & Management (AREA)
  • General Business, Economics & Management (AREA)
  • Data Mining & Analysis (AREA)
  • Character Discrimination (AREA)

Abstract

Enabling a computer to automatically enter information into a unified database from heterogeneous documents. An image file is received. The image file is displayed in a first area of a window rendered on a tangible display device. Fields for data entry are displayed in a second area of the window. Optical character recognition is performed on the image file. At least one parameter of text is identified in the image file. The at least one parameter of the text is compared to at least one of a plurality of stored parameters. The text is sorted according to the at least one of the plurality of stored parameters into a plurality of categories, wherein sorted text is formed. The fields are auto-populated and displayed in the second area of the window based on the sorted text.

Description

BACKGROUND INFORMATION
1. Field
The present disclosure relates to methods and devices for enabling computers to automatically enter information into a unified database from heterogeneous documents.
2. Background
Optical character recognition (OCR) of scanned or other electronic documents has been used to aid users in extracting text data from electronic picture files, portable document format (PDF) files, or other types of files which can be used to display text information on a computer screen. In some cases, the resulting text information can be copied and pasted into other documents or manually transferred as input into other software programs. However, the inability to interpret such information has prevented computers from automatically performing optical character recognition and then automatically transferring such information into desired specific data fields for entry into a unified database.
SUMMARY
The illustrative embodiments provide for a computer-implemented method of enabling a computer to automatically enter information into a unified database from heterogeneous documents. The computer-implemented method includes receiving, at a processor, an image file. The computer-implemented method also includes displaying, by the processor, the image file in a first area of a window rendered on a tangible display device. The computer-implemented method also includes displaying, by the processor, fields for data entry in a second area of the window. The computer-implemented method also includes performing, by the processor, optical character recognition on the image file. The computer-implemented method also includes identifying, by the processor, at least one parameter of text in the image file. The computer-implemented method also includes comparing, by the processor, the at least one parameter of the text to at least one of a plurality of stored parameters. The computer-implemented method also includes sorting, by the processor, the text according to the at least one of the plurality of stored parameters into a plurality of categories, wherein sorted text is formed. The computer-implemented method also includes auto-populating and displaying, by the processor, the fields in the second area of the window based on the sorted text.
The illustrative embodiments also contemplate a non-transitory computer-recordable storage medium storing program code, which when executed by a processor, performs the above method. The illustrative embodiments also contemplate a computer including a processor and a non-transitory computer-recordable storage medium storing program code, which when executed by the processor, performs the above method.
BRIEF DESCRIPTION OF THE DRAWINGS
The novel features believed characteristic of the illustrative embodiments are set forth in the appended claims. The illustrative embodiments, however, as well as a preferred mode of use, further objectives and features thereof, will best be understood by reference to the following detailed description of an illustrative embodiment of the present disclosure when read in conjunction with the accompanying drawings, wherein:
FIG. 1 illustrates a sample screenshot of a user interface for software configured to receive image files and auto populate specific fields in a unified database, in accordance with an illustrative embodiment;
FIG. 2 illustrates another sample screenshot of a user interface for software configured to receive image files and auto populate specific fields in a unified database, in accordance with an illustrative embodiment;
FIG. 3 illustrates a flowchart of a method for receiving image files and auto populating specific fields in a unified database, in accordance with an illustrative embodiment; and
FIG. 4 illustrates a data processing system, in accordance with an illustrative embodiment.
DETAILED DESCRIPTION
The illustrative embodiments recognize and take into account that the inability to interpret the meaning of text identified by optical character recognition (OCR) has prevented computers from automatically performing OCR and then automatically transferring such information into desired specific data fields for entry into a unified database. Thus, for example, when a business desires to enter information from heterogeneous sources into a unified database, traditionally a human user must read the document, possibly in a window on the computer screen, and then manually enter the relevant information into pre-determined fields for entry into the unified database.
As used herein, the term “unified database” is defined as one or more databases, whether relational databases, content addressable databases, or other types of databases, which together are directed towards a common enterprise and use a common set of identifiers. For example, several databases, when taken together, could contain employee records, tax information, and other information, all of which use a common system of identifiers. In a unified database, for instance, the term “employee name” would be the name of a field throughout all databases so that confusion is avoided when working with the databases in the context of a single enterprise.
In a more specific example, consider a human resources department in a large business that employs tens of thousands of employees, or alternatively a third-party vendor hired to process these types of human resources transactions. In the ordinary course of business, the human resources department will receive wage garnishment orders for some of its employees. However, these wage garnishment orders come from disparate courts, jurisdictions, and lawyers, and are presented in many different formats. Nevertheless, all contain key information that is to be entered into the company's unified database. For example, the human resources department will record the name of the employee, the amount of the garnishment, tax identification information, the payee, and other information needed to withhold money from the employee's paycheck and transfer that money to the payee designated in the court order.
However, while such information may be universal to all such orders, the manner in which the information is presented is anything but universal. For example, take something as simple as the payor's name. The terms “name”, “identifier”, “ID”, “payor”, “defendant”, and potentially many other terms may be used as the key word that indicates that the text that follows is the name of the person subject to the garnishment. Thus, even though an electronic document has been processed by OCR, the computer cannot simply match terms and correctly enter the name “John Doe” into a field named “employee name” for the unified database with which the computer communicates.
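Plain term matching fails because the label varies from sender to sender; a table of common terms that maps every known label to one unified-database field is one way to bridge that gap. The Python sketch below assumes OCR output arrives as `Label: value` lines; the table contents and the function names are hypothetical illustrations, not the patented implementation.

```python
import re

# Hypothetical table of common terms: each unified-database field lists the
# labels that different senders might use for it.
LABEL_SYNONYMS = {
    "employee name": ["name", "identifier", "id", "payor", "defendant"],
}


def canonical_field(label):
    """Return the unified-database field that a document label maps to, or None."""
    key = label.strip().rstrip(":").lower()
    for field, synonyms in LABEL_SYNONYMS.items():
        if key == field or key in synonyms:
            return field
    return None


def extract_labeled_values(ocr_text):
    """Map 'Label: value' lines of OCR text onto canonical field names."""
    fields = {}
    for line in ocr_text.splitlines():
        match = re.match(r"\s*([^:]+):\s*(.+)", line)
        if not match:
            continue  # not a label/value line
        field = canonical_field(match.group(1))
        if field and field not in fields:  # keep the first hit per field
            fields[field] = match.group(2).strip()
    return fields
```

With this table, `Defendant: John Doe` and `Payor: John Doe` both land in the “employee name” field.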
This wage garnishment example is just one example. The human resources department or the third-party vendor also must process tax information, such as data entered into W-2s, taxes paid to multiple government agencies, and many other items. The illustrative embodiments recognize and take into account that even when this data comes in the form of electronic files displayable on a computer, a human user must spend an inordinate and undesirable amount of time entering the correct information into the unified database of the human resources department or the third-party vendor.
Thus, the illustrative embodiments provide for methods and devices that address these issues and provide a means for enabling computers to automatically enter information into a unified database from heterogeneous documents. The illustrative embodiments take advantage of OCR technology, but also utilize a database of common terms to identify candidates for entries into a field of a unified database. The illustrative embodiments automatically populate fields of interest, and then display the populated fields so that a user can verify the entries. In other illustrative embodiments, the computer can automatically verify the entries into the fields to confirm that they relate to an employee. For example, if the employee name “John Doe” is automatically populated into the “employee name” field, then the computer can verify that “John Doe” is a valid entry by confirming that “John Doe” actually is an employee recorded in the unified database.
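The verification step can be sketched, under assumptions, as a lookup against the unified database. Here an in-memory SQLite database stands in for it; the `employees` table, its column names, and `verify_employee_name` are hypothetical.

```python
import sqlite3


def verify_employee_name(conn, name):
    """Return True if `name` matches an employee recorded in the database."""
    row = conn.execute(
        "SELECT 1 FROM employees WHERE employee_name = ? LIMIT 1", (name,)
    ).fetchone()
    return row is not None


# In-memory stand-in for the unified database (hypothetical schema).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE employees (employee_id INTEGER PRIMARY KEY, employee_name TEXT)"
)
conn.execute("INSERT INTO employees (employee_name) VALUES (?)", ("John Doe",))
```

Because names are not unique, a real check would likely also compare another identifier, such as the tax identification information mentioned above.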
The illustrative embodiments further recognize and take into account the user interface problem of operating multiple windows of different software products: one to view the documents, and another to perform data entry. Switching between windows is inconvenient and wastes time during data entry. Thus, the illustrative embodiments also provide a means for displaying a single window which allows for selection of an image file for processing, displays the image file, and presents fields for entering data into the unified database.
The illustrative embodiments address these and other issues by providing for methods and devices for enabling computers to automatically enter information into a unified database from heterogeneous documents. In particular, attention is now turned to the figures.
FIG. 1 illustrates a sample screenshot of a user interface for software configured to receive image files and auto populate specific fields in a unified database, in accordance with an illustrative embodiment. Screenshot 100 is displayed on a tangible display device, such as display 414 of FIG. 4, and is generated by a processor, such as processor unit 404 of FIG. 4, executing program code designed to render screenshot 100 and provide functionality for at least some of the images shown. This program code may be implemented as software, firmware, or both.
Screenshot 100 shows two primary areas, area 102 and area 104. An “area”, as used herein, is a portion of a display on a device that shows part of the screenshot. Area 102 is used to display information related to the document or documents to be processed. Area 104 is used to display information useful for entering information into the unified database.
Attention is first turned to area 102. In area 102, instructions 106, instructions 108, and/or select files 110 are provided to prompt a user to access the files from which data is to be processed. Title 112 may be provided to remind the user as to which types of files are to be processed. Note that while this illustrative embodiment describes a method for presenting a display for a user to retrieve desired image files, the illustrative embodiments also contemplate automatically presenting a user with image files for processing. The illustrative embodiments further contemplate automatically selecting and processing image files such that a user is not involved in the process of converting heterogeneous image files into entries into a unified database.
Continuing with the example of FIG. 1, area 104 is also displayed on screenshot 100. Title 114 indicates to a user the nature of what is displayed in area 104, which in this case are the details of the agency notice displayed in area 102 that are to be entered in fields in area 104 for subsequent entry into the unified database. Ultimately, the purpose of this data entry is to assist the enterprise in properly complying with the requirements of a specifically received agency notice. The use of area 104 is described with respect to FIG. 2.
FIG. 2 illustrates another sample screenshot of a user interface for software configured to receive image files and auto populate specific fields in a unified database, in accordance with an illustrative embodiment. Screenshot 200 is related to screenshot 100 in that screenshot 200 is taken after an image file has been uploaded and is being displayed in area 102. Screenshot 200 shows an example of how a heterogeneous image file can be processed and its relevant information transmitted to fields in area 104 for entry into a unified database. Like screenshot 100, screenshot 200 is displayed on a tangible display device, such as display 414 of FIG. 4, as a result of a processor, such as processor unit 404 of FIG. 4, executing a program embodied either as program code on a non-transitory computer-recordable storage medium, or as firmware. Whether implemented as program code on a non-transitory computer-recordable storage medium or as firmware, the term “program” shall be used, though the term “program” excludes purely signal-based media.
Again, area 102 shows image 202 of agency notice 204. Agency notice 204, in this illustrative embodiment, relates to tax information that by law must be processed by the enterprise. In an illustrative embodiment, the program loads the image file of agency notice 204 and performs optical character recognition (OCR) on the file. The program then compares text extracted from the file based on the OCR to a plurality of terms stored in a database in order to characterize the text. The extracted text can be compared not only by text matching, but also by analyzing the location from which the text was lifted and by matching patterns of text. Thus, for example, the program can determine that the name of the “company” in this particular agency notice is “Automatic Data Processing” based on the location of this term in agency notice 204 as well as the recognizable pattern of a sender's address bar near the top of the page. Additionally, the term “ADP” is associated with the company.
In one illustrative embodiment, such a comparison is not necessary. The user can simply read the page and enter the term “Automatic Data Processing” or possibly “ADP” in field 116, which is the “company name” to be entered into the unified database. The user can likewise fill out other fields in area 104.
However, preferably, sample answers are automatically generated and automatically copied into the relevant fields in area 104. Thus, for example, field 116 will be auto-populated with the term “Automatic Data Processing” or perhaps “ADP”. The remaining fields and button selections will likewise be auto-populated and auto-selected.
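The comparison described above, which combines text matching, the location of the text on the page, and a recognizable sender's-address pattern, might be sketched as follows. The `OcrLine` structure, the alias table, the 15% header band, and the digit-based address check are all hypothetical simplifications.

```python
from dataclasses import dataclass


@dataclass
class OcrLine:
    text: str
    top: float  # vertical position as a fraction of page height (0.0 = top edge)


# Hypothetical table of stored company names and their associated short forms.
COMPANY_ALIASES = {"Automatic Data Processing": ["ADP"]}


def find_company(lines, header_fraction=0.15):
    """Guess the sending company from a sender's address block near the page top.

    Three signals are combined: the text matches a stored name or alias, the
    text lies in the top band of the page, and an address-like line follows it.
    Lines are assumed to be in reading order.
    """
    header = [line for line in lines if line.top <= header_fraction]
    for i, line in enumerate(header):
        candidate = line.text.strip()
        for company, aliases in COMPANY_ALIASES.items():
            if candidate == company or candidate in aliases:
                # Hypothetical pattern check: one of the next two header lines
                # starts with a digit, as a street address usually does.
                following = header[i + 1 : i + 3]
                if any(f.text.strip()[:1].isdigit() for f in following):
                    return company  # report the full stored name, not the alias
    return None
```

The returned value is what would be copied into field 116; a name found lower on the page, or one not followed by an address, is not treated as the sender.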
In one illustrative embodiment, a user will review the automatically supplied entries in the fields shown in area 104. The user can then submit the entries, which are then transferred to the unified database for further processing and appropriate action. Alternatively, the user can make adjustments to the field entries and button selections prior to submission of the data.
In another illustrative embodiment, submission is automatic, and a user is not required at all. In this case, all processing takes place out of sight of a user, with data automatically being input into the unified database. However, this particular illustrative embodiment becomes less useful as the documents being processed become more heterogeneous. For example, when tax documents are received from a wide variety of companies in a wide variety of different formats, the likelihood of errors in automatic population of the fields of interest increases. When the probability of such errors increases, adding a human reviewer to the process can increase the accuracy of the data transfer process.
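One way to choose between automatic submission and human review is to gate on a per-field confidence. The sketch below assumes the OCR and matching stages emit such a score for each field; the score itself, the 0.9 threshold, and `route_entries` are assumptions, not something the embodiments specify.

```python
def route_entries(fields, confidences, threshold=0.9):
    """Decide whether populated fields can be submitted without a human reviewer.

    `confidences` maps field names to hypothetical scores in [0, 1]; a field with
    no score is treated as 0.0 and therefore always goes to review.
    Returns ("auto_submit", []) or ("human_review", [low-confidence fields]).
    """
    low = sorted(name for name in fields if confidences.get(name, 0.0) < threshold)
    return ("human_review", low) if low else ("auto_submit", [])
```

Listing the low-confidence fields lets a reviewer focus on the entries most likely to be wrong rather than rechecking the whole form.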
One advantage of the illustrative embodiments is that a user does not have to switch between different windows of different software programs while using the program. Thus, the illustrative embodiments provide an integrated technology for reviewing heterogeneous image documents for text and entering this text data appropriately into a unified database. In some illustrative embodiments, the fields are auto-populated, thereby substantially increasing the speed of such data processing.
In this manner, the illustrative embodiments enable computers to automatically enter information into a unified database from heterogeneous documents, thereby accomplishing a technical effect. Another technical effect of the illustrative embodiments is enabling an improved user interface for human users so that human users may more efficiently use a computer to accomplish desired data entry tasks. The illustrative embodiments are implemented solely in a computer, are intrinsically a part of the operation of a computer, and relate only to improving computer functionality and presentation. Thus, the illustrative embodiments cannot be accomplished by a human being, but rather only by a computer improved using the techniques described herein.
The examples provided in FIG. 1 and FIG. 2 do not necessarily limit other illustrative embodiments or the claims. Many variations are possible, based on many different types of documents, fields of interest for data entry, or other enterprise goals. The illustrative embodiments may also be extended. For example, the illustrative embodiments contemplate automatically processing multiple image documents simultaneously. The illustrative embodiments contemplate collating information for entry into fields which request information regarding, for example, how many times a given item is referenced across multiple documents. The illustrative embodiments also contemplate processing either homogeneous or heterogeneous file types and formats, such as but not limited to .png, .pdf, .jpg, .jif, and many other file types. Thus, again, the illustrative embodiments are not necessarily limited to the examples given above.
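Collating how many times an item is referenced across several documents processed at the same time could look like this minimal sketch. It assumes OCR has already produced plain text for each document; the function names and the use of a thread pool are illustrative choices, not part of the described embodiments.

```python
from concurrent.futures import ThreadPoolExecutor


def count_item(text, item):
    """Count case-insensitive, non-overlapping occurrences of `item` in one document."""
    return text.lower().count(item.lower())


def count_across_documents(texts, item, max_workers=4):
    """Process documents concurrently; return per-document counts and their total."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map preserves input order, so counts line up with `texts`.
        per_document = list(pool.map(lambda t: count_item(t, item), texts))
    return per_document, sum(per_document)
```

The total would populate a field such as “number of references”, while the per-document counts could help a reviewer trace where each reference came from.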
FIG. 3 illustrates a flowchart of a method for receiving image files and auto populating specific fields in a unified database, in accordance with an illustrative embodiment. Method 300 may be implemented using a data processing system, such as data processing system 400 of FIG. 4. Method 300 is a variation of the methods described above with respect to FIG. 1 and FIG. 2. Method 300 is only performable by a computer and accomplishes the technical effects described above with respect to FIG. 2. Method 300 may be characterized as a method of enabling a computer to automatically enter information into a unified database from heterogeneous documents.
Method 300 includes receiving, at a processor, an image file (operation 302). Method 300 also includes displaying, by the processor, the image file in a first area of a window rendered on a tangible display device (operation 304). Method 300 also includes displaying, by the processor, fields for data entry in a second area of the window (operation 306).
Method 300 also includes performing, by the processor, optical character recognition on the image file (operation 308). Method 300 also includes identifying, by the processor, at least one parameter of text in the image file (operation 310). This parameter or parameters may take many different forms, as described above. For example, a parameter may be the text itself for text matching, a location of the text in the image file, surrounding text for pattern recognition matching, pre-stored codes, words, or phrases, color used in the image file, image file type, and potentially many others. The purpose of the parameter or parameters is to enable the computer to recognize appropriate text from potentially many different heterogeneous image files for entry into one or more specific fields for ultimate entry into a unified database.
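As a sketch of operation 310, the following derives a few of the parameters listed above (the text itself, a coarse location on the page, the surrounding text, and the file type) from OCR output. The `(text, top)` input format, the three-band page split, and the `TextParameters` class are assumptions; color and the other listed parameters are omitted.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TextParameters:
    text: str       # the recognized text itself, for text matching
    region: str     # coarse location on the page: "top", "middle", or "bottom"
    preceding: str  # text immediately before, for pattern matching
    file_type: str  # e.g. ".pdf" or ".png"


def identify_parameters(ocr_lines, file_type):
    """Derive parameters for each OCR line given as (text, top), with top in [0, 1]."""
    params = []
    for i, (text, top) in enumerate(ocr_lines):
        region = "top" if top < 1 / 3 else "middle" if top < 2 / 3 else "bottom"
        preceding = ocr_lines[i - 1][0] if i > 0 else ""
        params.append(TextParameters(text, region, preceding, file_type))
    return params
```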
Method 300 also includes comparing, by the processor, the at least one parameter of the text to at least one of a plurality of stored parameters (operation 312). Method 300 also includes sorting, by the processor, the text according to the at least one of the plurality of stored parameters into a plurality of categories, wherein sorted text is formed (operation 314). Sorting into a plurality of categories is specifically related to determining which alphanumeric text sequences should be applied to which fields in the second area of the display window. For example, the phrase “Automatic Data Processing” can be recognized as belonging to the category of “company name” and thus assigned to a field accordingly. Thus, method 300 also includes auto-populating and displaying, by the processor, the fields in the second area of the window based on the sorted text (operation 316).
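Operations 312 through 316 can be sketched as matching each candidate against stored parameters, grouping matches by category, and filling each field from its category. The `STORED_PARAMETERS` table, the `(text, region, preceding_label)` input shape, and the first-candidate fill policy are hypothetical.

```python
# Hypothetical stored parameters: for each category (a field in the second
# area of the window), the preceding labels and page region that indicate it.
STORED_PARAMETERS = {
    "company name": {"labels": {"from", "sender"}, "region": "top"},
    "employee name": {"labels": {"name", "payor", "defendant"}, "region": None},
}


def sort_text(candidates):
    """Sort (text, region, preceding_label) triples into categories (operations 312-314)."""
    sorted_text = {category: [] for category in STORED_PARAMETERS}
    for text, region, preceding in candidates:
        label = preceding.strip().rstrip(":").lower()
        for category, rule in STORED_PARAMETERS.items():
            region_ok = rule["region"] is None or rule["region"] == region
            if label in rule["labels"] and region_ok:
                sorted_text[category].append(text)
    return sorted_text


def auto_populate(sorted_text):
    """Fill each field with its first candidate; unmatched fields stay blank (operation 316)."""
    return {field: values[0] if values else "" for field, values in sorted_text.items()}
```

Leaving unmatched fields blank, rather than guessing, keeps gaps visible to the reviewing user in the second area of the window.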
Method 300 may be varied by including different operations, or by including additional operations, or by potentially using fewer operations. Some of these additional illustrative embodiments follow, and are shown in FIG. 3 as boxes surrounded by dotted lines to indicate that they are optional additional steps taken with respect to method 300.
In one illustrative embodiment, method 300 may also include submitting the fields as entries into a unified database (operation 318). In addition to this operation, method 300 may also include receiving, prior to submitting, user input from a user input device indicating that the fields are correct (operation 320). A user may also edit the entries in the fields prior to submission. As an alternative to operation 320, method 300 may also include automatically taking an action, based on the entries in the unified database, required by an order stated in a document from which the image file was made (operation 322). An example of such an action would be, responsive to receiving a court order, withholding wages from an employee's paycheck and paying the withheld wages to a designated payee. Another example would be to populate a paystub and transmit the paystub to an employee or others authorized to receive the paystub. Many different actions are possible, and such actions are not necessarily limited to a human resources context.
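A minimal sketch of operations 318 through 322 follows, using an in-memory SQLite database as a stand-in for the unified database. The `garnishments` table, the field names, and the rule of withholding the recorded amount capped at gross pay are hypothetical simplifications; they do not describe how garnishment law or the embodiments actually compute withholding.

```python
import sqlite3

# In-memory stand-in for the unified database; the table layout is hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE garnishments (employee_name TEXT, amount REAL, payee TEXT)")


def submit_fields(conn, fields, user_confirmed):
    """Operation 318 gated on operation 320: store the fields only once confirmed."""
    if not user_confirmed:
        return False
    conn.execute(
        "INSERT INTO garnishments (employee_name, amount, payee) VALUES (?, ?, ?)",
        (fields["employee name"], fields["amount"], fields["payee"]),
    )
    conn.commit()
    return True


def withhold(conn, employee_name, gross_pay):
    """Sketch of operation 322: returns (net_pay, payee, withheld_amount).

    Capping the withholding at gross pay is a hypothetical simplification.
    """
    row = conn.execute(
        "SELECT amount, payee FROM garnishments WHERE employee_name = ?",
        (employee_name,),
    ).fetchone()
    if row is None:
        return gross_pay, None, 0.0  # no recorded order: nothing withheld
    amount, payee = row
    withheld = min(amount, gross_pay)
    return gross_pay - withheld, payee, withheld
```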
Other variations are possible. For example, in one illustrative embodiment the first area and the second area are displayed side by side in the window, whereby use of multiple display windows is avoided. In another illustrative embodiment, the image file comprises a plurality of images taken from a plurality of heterogeneous image files, and auto-populating is performed for a different set of fields for each one of the plurality of heterogeneous image files.
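Auto-populating a different set of fields for each heterogeneous file can be sketched with a registry that maps each document type to its field set. The document types, field names, and `populate_per_file` are illustrative assumptions; how a file's type is detected is outside this sketch.

```python
# Hypothetical field sets for two kinds of heterogeneous documents.
FIELD_SETS = {
    "garnishment order": ["employee name", "amount", "payee"],
    "tax notice": ["company name", "tax id", "tax period"],
}


def populate_per_file(files):
    """Build a separate field set for each (file_name, doc_type, sorted_text) triple.

    Fields the sorted text does not cover are left blank for the user to fill in;
    sorted text outside the document type's field set is ignored.
    """
    results = {}
    for file_name, doc_type, sorted_text in files:
        results[file_name] = {
            field: sorted_text.get(field, "") for field in FIELD_SETS[doc_type]
        }
    return results
```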
In still another illustrative embodiment, displaying the image file and displaying the fields are performed in a web browser of a local computer, while receiving, performing, identifying, comparing, sorting, and auto-populating are performed by a remote server as software as a service. In yet another illustrative embodiment, the computer-implemented method is performed on a single local computer.
Still other variations are possible. Thus, the illustrative embodiments described with respect to FIG. 3 do not necessarily limit the claimed inventions or the other examples described herein.
Turning now to FIG. 4, an illustration of a data processing system is depicted in accordance with an illustrative embodiment. Data processing system 400 in FIG. 4 is an example of a data processing system that may be used to implement the illustrative embodiments, such as screenshot 100 of FIG. 1, screenshot 200 of FIG. 2, method 300 of FIG. 3, or any other module or system or process disclosed herein. In this illustrative example, data processing system 400 includes communications fabric 402, which provides communications between processor unit 404, memory 406, persistent storage 408, communications unit 410, input/output (I/O) unit 412, and display 414.
Processor unit 404 serves to execute instructions for software that may be loaded into memory 406. This software may be an associative memory, content addressable memory, or software for implementing the processes described elsewhere herein. Thus, for example, software loaded into memory 406 may be software for executing method 300 of FIG. 3. Processor unit 404 may be a number of processors, a multi-processor core, or some other type of processor, depending on the particular implementation. A number, as used herein with reference to an item, means one or more items. Further, processor unit 404 may be implemented using a number of heterogeneous processor systems in which a main processor is present with secondary processors on a single chip. As another illustrative example, processor unit 404 may be a symmetric multi-processor system containing multiple processors of the same type.
Memory 406 and persistent storage 408 are examples of storage devices 416. A storage device is any piece of hardware that is capable of storing information, such as, for example, without limitation, data, program code in functional form, and/or other suitable information either on a temporary basis and/or a permanent basis. Storage devices 416 may also be referred to as computer-readable storage devices in these examples. Memory 406, in these examples, may be, for example, a random-access memory or any other suitable volatile or non-volatile storage device. Persistent storage 408 may take various forms, depending on the particular implementation.
Persistent storage 408 may contain one or more components or devices. For example, persistent storage 408 may be a hard drive, a flash memory, a rewritable optical disk, a rewritable magnetic tape, or some combination of the above. The media used by persistent storage 408 also may be removable. For example, a removable hard drive may be used for persistent storage 408.
Communications unit 410, in these examples, provides for communications with other data processing systems or devices. In these examples, communications unit 410 is a network interface card. Communications unit 410 may provide communications through the use of either or both physical and wireless communications links.
Input/output (I/O) unit 412 allows for input and output of data with other devices that may be connected to data processing system 400. For example, input/output (I/O) unit 412 may provide a connection for user input through a keyboard, a mouse, and/or some other suitable input device. Further, input/output (I/O) unit 412 may send output to a printer. Display 414 provides a mechanism to display information to a user.
Instructions for the operating system, applications, and/or programs may be located in storage devices 416, which are in communication with processor unit 404 through communications fabric 402. In these illustrative examples, the instructions are in a functional form on persistent storage 408. These instructions may be loaded into memory 406 for execution by processor unit 404. The processes of the different embodiments may be performed by processor unit 404 using computer implemented instructions, which may be located in a memory, such as memory 406.
These instructions are referred to as program code, computer-usable program code, or computer-readable program code that may be read and executed by a processor in processor unit 404. The program code in the different embodiments may be embodied on different physical or computer-readable storage media, such as memory 406 or persistent storage 408.
Program code 418 is located in a functional form on computer-readable media 420 that is selectively removable and may be loaded onto or transferred to data processing system 400 for execution by processor unit 404. Program code 418 and computer-readable media 420 form computer program product 422 in these examples. In one example, computer-readable media 420 may be computer-readable storage media 424 or computer-readable signal media 426. Computer-readable storage media 424 may include, for example, an optical or magnetic disk that is inserted or placed into a drive or other device that is part of persistent storage 408 for transfer onto a storage device, such as a hard drive, that is part of persistent storage 408. Computer-readable storage media 424 also may take the form of a persistent storage, such as a hard drive, a thumb drive, or a flash memory, that is connected to data processing system 400. In some instances, computer-readable storage media 424 may not be removable from data processing system 400.
Alternatively, program code 418 may be transferred to data processing system 400 using computer-readable signal media 426. Computer-readable signal media 426 may be, for example, a propagated data signal containing program code 418. For example, computer-readable signal media 426 may be an electromagnetic signal, an optical signal, and/or any other suitable type of signal. These signals may be transmitted over communications links, such as wireless communications links, optical fiber cable, coaxial cable, a wire, and/or any other suitable type of communications link. In other words, the communications link and/or the connection may be physical or wireless in the illustrative examples.
In some illustrative embodiments, program code 418 may be downloaded over a network to persistent storage 408 from another device or data processing system through computer-readable signal media 426 for use within data processing system 400. For instance, program code stored in a computer-readable storage medium in a server data processing system may be downloaded over a network from the server to data processing system 400. The data processing system providing program code 418 may be a server computer, a client computer, or some other device capable of storing and transmitting program code 418.
The different components illustrated for data processing system 400 are not meant to provide architectural limitations to the manner in which different embodiments may be implemented. The different illustrative embodiments may be implemented in a data processing system including components in addition to or in place of those illustrated for data processing system 400. Other components shown in FIG. 4 can be varied from the illustrative examples shown. The different embodiments may be implemented using any hardware device or system capable of running program code. As one example, the data processing system may include organic components integrated with inorganic components and/or may be composed entirely of organic components excluding a human being. For example, a storage device may be composed of an organic semiconductor.
In another illustrative example, processor unit 404 may take the form of a hardware unit that has circuits that are manufactured or configured for a particular use. This type of hardware may perform operations without needing program code to be loaded into a memory from a storage device to be configured to perform the operations.
For example, when processor unit 404 takes the form of a hardware unit, processor unit 404 may be a circuit system, an application specific integrated circuit (ASIC), a programmable logic device, or some other suitable type of hardware configured to perform a number of operations. With a programmable logic device, the device is configured to perform the number of operations. The device may be reconfigured at a later time or may be permanently configured to perform the number of operations. Examples of programmable logic devices include, for example, a programmable logic array, programmable array logic, a field programmable logic array, a field programmable gate array, and other suitable hardware devices. With this type of implementation, program code 418 may be omitted because the processes for the different embodiments are implemented in a hardware unit.
In still another illustrative example, processor unit 404 may be implemented using a combination of processors found in computers and hardware units. Processor unit 404 may have a number of hardware units and a number of processors that are configured to run program code 418. With this depicted example, some of the processes may be implemented in the number of hardware units, while other processes may be implemented in the number of processors.
As another example, a storage device in data processing system 400 is any hardware apparatus that may store data. Memory 406, persistent storage 408, and computer-readable media 420 are examples of storage devices in a tangible form.
In another example, a bus system may be used to implement communications fabric 402 and may comprise one or more buses, such as a system bus or an input/output bus. Of course, the bus system may be implemented using any suitable type of architecture that provides for a transfer of data between different components or devices attached to the bus system. Additionally, a communications unit may include one or more devices used to transmit and receive data, such as a modem or a network adapter. Further, a memory may be, for example, memory 406, or a cache, such as found in an interface and memory controller hub that may be present in communications fabric 402.
The different illustrative embodiments can take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment containing both hardware and software elements. Some embodiments are implemented in software, which includes but is not limited to forms such as, for example, firmware, resident software, and microcode.
Furthermore, the different embodiments can take the form of a computer program product accessible from a computer-usable or computer-readable medium providing program code for use by or in connection with a computer or any device or system that executes instructions. For the purposes of this disclosure, a computer-usable or computer-readable medium can generally be any tangible apparatus that can contain, store, communicate, propagate, or transport the program for use by or in connection with the instruction execution system, apparatus, or device.
The computer-usable or computer-readable medium can be, for example, without limitation an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, or a propagation medium. Non-limiting examples of a computer-readable medium include a semiconductor or solid state memory, magnetic tape, a removable computer diskette, a random access memory (RAM), a read-only memory (ROM), a rigid magnetic disk, and an optical disk. Optical disks may include compact disk-read only memory (CD-ROM), compact disk-read/write (CD-R/W), and DVD.
Further, a computer-usable or computer-readable medium may contain or store a computer-readable or computer-usable program code such that when the computer-readable or computer-usable program code is executed on a computer, the execution of this computer-readable or computer-usable program code causes the computer to transmit another computer-readable or computer-usable program code over a communications link. This communications link may use a medium that is, for example without limitation, physical or wireless.
A data processing system suitable for storing and/or executing computer-readable or computer-usable program code will include one or more processors coupled directly or indirectly to memory elements through a communications fabric, such as a system bus. The memory elements may include local memory employed during actual execution of the program code, bulk storage, and cache memories which provide temporary storage of at least some computer-readable or computer-usable program code to reduce the number of times code must be retrieved from bulk storage during execution of the code.
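The role of cache memory described above can be modeled in software to show why it reduces bulk-storage reads. The following is a minimal Python model under stated assumptions, not a hardware design: `BulkStorage` and `LRUCache` are hypothetical names, and least-recently-used eviction is an assumed replacement policy, since the text does not name one. The model counts how many reads actually reach bulk storage while a short loop of instructions executes repeatedly.

```python
from collections import OrderedDict


class BulkStorage:
    """Hypothetical stand-in for slow bulk storage that counts its reads."""

    def __init__(self, blocks):
        self.blocks = blocks
        self.reads = 0

    def read(self, address):
        self.reads += 1
        return self.blocks[address]


class LRUCache:
    """Hypothetical cache holding recently used blocks (LRU policy assumed)."""

    def __init__(self, storage, capacity):
        self.storage = storage
        self.capacity = capacity
        self.entries = OrderedDict()

    def read(self, address):
        if address in self.entries:
            # Hit: serve from the cache and mark as most recently used.
            self.entries.move_to_end(address)
            return self.entries[address]
        # Miss: fetch from bulk storage and keep a copy.
        value = self.storage.read(address)
        self.entries[address] = value
        if len(self.entries) > self.capacity:
            # Evict the least recently used entry.
            self.entries.popitem(last=False)
        return value


if __name__ == "__main__":
    storage = BulkStorage({0: "LOAD", 1: "ADD", 2: "STORE"})
    cache = LRUCache(storage, capacity=2)
    for address in [0, 1, 0, 1, 0, 1]:  # a tight loop over two instructions
        cache.read(address)
    print("bulk storage reads:", storage.reads)
```

In this model, executing the two-instruction loop three times touches bulk storage only twice, once per distinct instruction; every later access is served from the cache.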
Input/output or I/O devices can be coupled to the system either directly or through intervening I/O controllers. These devices may include, for example, without limitation, keyboards, touch screen displays, and pointing devices. Different communications adapters may also be coupled to the system to enable the data processing system to become coupled to other data processing systems or remote printers or storage devices through intervening private or public networks. Modems and network adapters are non-limiting examples of the currently available types of communications adapters.
The description of the different illustrative embodiments has been presented for purposes of illustration and description, and is not intended to be exhaustive or limited to the embodiments in the form disclosed. Many modifications and variations will be apparent to those of ordinary skill in the art. Further, different illustrative embodiments may provide different features as compared to other illustrative embodiments. The embodiment or embodiments selected are chosen and described in order to best explain the principles of the embodiments, the practical application, and to enable others of ordinary skill in the art to understand the disclosure for various embodiments with various modifications as are suited to the particular use contemplated.

Claims (20)

What is claimed is:
1. A method comprising:
automatically entering, by a processor, information into a unified database from heterogeneous documents by a computer:
receiving, by the processor, an image file, the image file including text;
displaying, by the processor, the image file in a first area of a window, wherein the window is rendered on a tangible display device;
displaying, by the processor, fields for data entry in a second area of the window;
performing, by the processor, optical character recognition on the image file;
identifying, by the processor, at least one parameter of the text in the image file;
comparing, by the processor, the at least one parameter of the text to employee information stored in the unified database, the employee information comprising a plurality of stored parameters about human resources records, payroll records, and tax information;
sorting, by the processor, the text according to the at least one of the plurality of stored parameters into a plurality of categories, wherein sorted text is formed; and
auto-populating and displaying, by the processor, the fields for data entry in the second area of the window based on the sorted text; and
after automatically entering information into the unified database, automatically taking, by the processor, an action:
based on the information entered into the unified database; and
as required by an order stated in a document from which the image file was made, wherein the order is for at least one of withholding wages from an employee's paycheck, or paying withheld wages to a designated payee.
2. The method of claim 1 further comprising:
submitting, by the processor, the fields for data entry as entries into a unified database.
3. The method of claim 2 further comprising:
receiving, by the processor, prior to submitting, user input from a user input device indicating that the fields for data entry are correct.
4. The method of claim 2, wherein the order is a court order.
5. The method of claim 1, wherein the first area and the second area are displayed side by side in the window, whereby use of multiple display windows is avoided.
6. The method of claim 1, wherein the image file comprises a plurality of images taken from a plurality of heterogeneous image files, and wherein auto-populating is performed for different sets of fields for each one of the plurality of heterogeneous image files.
7. The method of claim 1, wherein displaying the image file and displaying the fields for data entry are performed on a web browser of a local computer, and wherein receiving, performing, identifying, comparing, sorting, and auto-populating are performed by a remote server as software as a service.
8. The method of claim 1, wherein the computer is a single local computer.
9. A computer program product, comprising:
a non-transitory computer-recordable storage medium comprising instructions, wherein when executed by a processor, the instructions automatically enter information into a unified database from heterogeneous documents, the instructions comprising:
first program code for receiving an image file, the image file including text;
second program code for displaying the image file in a first area of a window, wherein the window is rendered on a tangible display device;
third program code for displaying fields for data entry in a second area of the window;
fourth program code for performing optical character recognition on the image file;
fifth program code for identifying at least one parameter of the text in the image file;
sixth program code for comparing the at least one parameter of the text to employee information stored in the unified database, the employee information comprising a plurality of stored parameters about human resources records, payroll records, and tax information;
seventh program code for sorting the text according to the at least one of the plurality of stored parameters into a plurality of categories, wherein sorted text is formed;
eighth program code for auto-populating and displaying the fields for data entry in the second area of the window based on the sorted text; and
ninth program code for, after auto-populating, automatically taking an action:
based on information included in the auto-populating; and
as required by an order stated in a document from which the image file was made, wherein the order relates to disposition of wages of an employee.
10. The computer program product of claim 9, wherein the instructions further comprise:
tenth program code for submitting the fields for data entry as entries into a unified database.
11. The computer program product of claim 10, wherein the instructions further comprise:
eleventh program code for receiving, prior to submitting, user input from a user input device indicating that the fields for data entry are correct.
12. The computer program product of claim 9, wherein the order is a court order for at least one of:
withholding wages from a paycheck of the employee; or
paying withheld wages to a designated payee.
13. The computer program product of claim 9, wherein the instructions are configured to display the first area and the second area in a side by side arrangement in the window.
14. The computer program product of claim 9, wherein the image file comprises a plurality of images taken from a plurality of heterogeneous image files, and wherein the auto-populating is executed for different sets of fields for each one of the plurality of heterogeneous image files.
15. A computer comprising:
a processor; and
a non-transitory computer-recordable storage medium comprising instructions, which when executed by the processor, automatically enter information into a unified database from heterogeneous documents, the instructions comprising:
first program code for receiving an image file, the image file including text;
second program code for displaying the image file in a first area of a window rendered on a tangible display device;
third program code for displaying fields for data entry in a second area of the window;
fourth program code for performing optical character recognition on the image file;
fifth program code for identifying at least one parameter of the text in the image file;
sixth program code for comparing the at least one parameter of the text to employee information stored in the unified database, the employee information comprising a plurality of stored parameters about human resources records, payroll records, and tax information;
seventh program code for sorting the text according to the at least one of the plurality of stored parameters into a plurality of categories, wherein sorted text is formed;
eighth program code for auto-populating and displaying the fields for data entry in the second area of the window based on the sorted text; and
ninth program code for, after auto-populating, automatically taking an action:
based on information included in the auto-populating; and
as required by an order stated in a document from which the image file was made, wherein the order is a court order.
16. The computer of claim 15, wherein the instructions further comprise:
tenth program code for submitting the fields for data entry as entries into a unified database.
17. The computer of claim 16, wherein the instructions further comprise:
eleventh program code for receiving, prior to submitting, user input from a user input device indicating that the fields for data entry are correct.
18. The computer of claim 15, wherein the court order is for at least one of:
withholding wages from an employee's paycheck; or
paying withheld wages to a designated payee.
19. The computer of claim 15, wherein the instructions are configured to display the first area and the second area in a side by side arrangement in the window.
20. The computer of claim 15, wherein the image file comprises a plurality of images taken from a plurality of heterogeneous image files, and wherein the auto-populating is executed for different sets of fields for each one of the plurality of heterogeneous image files.
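The steps recited in claims 1 through 3 (identifying parameters in recognized text, comparing them with stored employee records, sorting the text into categories, auto-populating fields, submitting after user confirmation, and then acting on a wage order) can be illustrated with a small program. The following is a minimal, hypothetical Python sketch, not the patented implementation: it starts from text that an OCR step has already produced (no OCR engine is included), and the label patterns, the category mapping (only human resources and payroll categories are shown), the `employees` and `orders` tables, and every function name are assumptions for illustration. An in-memory SQLite database stands in for the unified database, and the action is modeled as withholding an ordered percentage of gross pay for a designated payee.

```python
import re
import sqlite3
from decimal import Decimal, ROUND_HALF_UP

# Hypothetical label patterns for one document type (a wage-withholding
# order). Real documents vary widely; every label here is an assumption.
PATTERNS = {
    "employee_id": r"Employee ID:\s*(\w+)",
    "employee_name": r"Employee Name:\s*(.+)",
    "withholding_percent": r"Withhold:\s*(\d+(?:\.\d+)?)\s*%",
    "payee": r"Pay To:\s*(.+)",
}

# Hypothetical mapping from each parameter to a stored-record category.
CATEGORIES = {
    "employee_id": "human_resources",
    "employee_name": "human_resources",
    "withholding_percent": "payroll",
    "payee": "payroll",
}


def identify_parameters(ocr_text):
    """Extract labelled values from text that an OCR step already produced."""
    found = {}
    for name, pattern in PATTERNS.items():
        match = re.search(pattern, ocr_text)
        if match:
            found[name] = match.group(1).strip()
    return found


def matches_stored_record(conn, params):
    """Compare the extracted ID and name with the stored employee record."""
    row = conn.execute(
        "SELECT name FROM employees WHERE employee_id = ?",
        (params.get("employee_id"),),
    ).fetchone()
    return row is not None and row[0] == params.get("employee_name")


def sort_into_categories(params):
    """Group the extracted values by category."""
    sorted_text = {}
    for name, value in params.items():
        sorted_text.setdefault(CATEGORIES[name], {})[name] = value
    return sorted_text


def auto_populate(sorted_text):
    """Fill the data-entry fields; fields with no extracted value stay blank."""
    fields = {name: "" for name in PATTERNS}
    for values in sorted_text.values():
        fields.update(values)
    return fields


def submit(conn, fields, user_confirmed):
    """Write the fields to the database only after the user confirms them."""
    if not user_confirmed:
        raise ValueError("fields must be confirmed before submission")
    conn.execute(
        "INSERT INTO orders (employee_id, percent, payee) VALUES (?, ?, ?)",
        (fields["employee_id"], fields["withholding_percent"], fields["payee"]),
    )
    conn.commit()


def take_action(fields, gross_pay):
    """Withhold the ordered percentage of gross pay for the designated payee."""
    percent = Decimal(fields["withholding_percent"])
    withheld = (gross_pay * percent / 100).quantize(
        Decimal("0.01"), rounding=ROUND_HALF_UP
    )
    return {"payee": fields["payee"], "withheld": withheld, "net": gross_pay - withheld}


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE employees (employee_id TEXT PRIMARY KEY, name TEXT)")
    conn.execute("CREATE TABLE orders (employee_id TEXT, percent TEXT, payee TEXT)")
    conn.execute("INSERT INTO employees VALUES ('E123', 'Jane Doe')")
    text = "Employee ID: E123\nEmployee Name: Jane Doe\nWithhold: 10%\nPay To: County Clerk"
    params = identify_parameters(text)
    if matches_stored_record(conn, params):
        fields = auto_populate(sort_into_categories(params))
        submit(conn, fields, user_confirmed=True)
        print(take_action(fields, Decimal("2000.00")))
```

Run as a script, the example matches the extracted ID and name against the stored record, records the order after confirmation, and withholds 200.00 from a 2000.00 gross paycheck, leaving 1800.00 net. A document of a different type would need its own pattern set and field list, which is one plausible way to read the per-document field sets of claims 6, 14, and 20.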

Priority Applications (1)

Application Number Priority Date Filing Date Title
US15/821,682 US10817656B2 (en) 2017-11-22 2017-11-22 Methods and devices for enabling computers to automatically enter information into a unified database from heterogeneous documents

Publications (2)

Publication Number Publication Date
US20190155887A1 US20190155887A1 (en) 2019-05-23
US10817656B2 true US10817656B2 (en) 2020-10-27

Family

ID=66533069

Family Applications (1)

Application Number Title Priority Date Filing Date
US15/821,682 Active US10817656B2 (en) 2017-11-22 2017-11-22 Methods and devices for enabling computers to automatically enter information into a unified database from heterogeneous documents

Country Status (1)

Country Link
US (1) US10817656B2 (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110555372A (en) * 2019-07-22 2019-12-10 深圳壹账通智能科技有限公司 Data entry method, device, equipment and storage medium
US20220076208A1 (en) * 2020-09-04 2022-03-10 Scopeasy Construction Software Limited Methods and systems for processing training records and documents of employees

Patent Citations (22)

Publication number Priority date Publication date Assignee Title
US6950553B1 (en) 2000-03-23 2005-09-27 Cardiff Software, Inc. Method and system for searching form features for form identification
US6886136B1 (en) 2000-05-05 2005-04-26 International Business Machines Corporation Automatic template and field definition in form processing
US20030112270A1 (en) * 2000-12-22 2003-06-19 Merchant & Gould, P.C. Litigation management system and method
US6898316B2 (en) * 2001-11-09 2005-05-24 Arcsoft, Inc. Multiple image area detection in a digital image
US20030179400A1 (en) * 2002-03-22 2003-09-25 Intellectual Property Resources, Inc. Data capture during print process
US7103198B2 (en) * 2002-05-06 2006-09-05 Newsoft Technology Corporation Method for determining an adjacency relation
US7069240B2 (en) * 2002-10-21 2006-06-27 Raphael Spero System and method for capture, storage and processing of receipts and related data
US20040181749A1 (en) * 2003-01-29 2004-09-16 Microsoft Corporation Method and apparatus for populating electronic forms from scanned documents
US7305129B2 (en) 2003-01-29 2007-12-04 Microsoft Corporation Methods and apparatus for populating electronic forms from scanned documents
US7729928B2 (en) * 2005-02-25 2010-06-01 Virtual Radiologic Corporation Multiple resource planning system
US7974877B2 (en) * 2005-06-23 2011-07-05 Microsoft Corporation Sending and receiving electronic business cards
US20070168382A1 (en) 2006-01-03 2007-07-19 Michael Tillberg Document analysis system for integration of paper records into a searchable electronic database
US20090132605A1 (en) * 2007-04-19 2009-05-21 2C Change A/S Handling of data in a data sharing system
US20090208103A1 (en) * 2007-04-22 2009-08-20 Bo-In Lin Control of optical character recognition (OCR) processes to generate user controllable final output documents
US20090044095A1 (en) * 2007-08-06 2009-02-12 Apple Inc. Automatically populating and/or generating tables using data extracted from files
US9753908B2 (en) * 2007-11-05 2017-09-05 The Neat Company, Inc. Method and system for transferring data from a scanned document into a spreadsheet
US20100138343A1 (en) * 2007-12-31 2010-06-03 Bank Of America Corporation Dynamic hold decisioning
US20120040717A1 (en) * 2010-08-16 2012-02-16 Veechi Corp Mobile Data Gathering System and Method
US20120166206A1 (en) * 2010-12-23 2012-06-28 Case Commons, Inc. Method, computer readable medium, and apparatus for constructing a case management system
US20140219583A1 (en) * 2011-06-07 2014-08-07 Amadeus S.A.S. Personal information display system and associated method
US20170109610A1 (en) * 2013-03-13 2017-04-20 Kofax, Inc. Building classification and extraction models based on electronic forms
US10558880B2 (en) * 2015-11-29 2020-02-11 Vatbox, Ltd. System and method for finding evidencing electronic documents based on unstructured data

Non-Patent Citations (1)

Title
Denoue et al., "FormCracker: Interactive Web-based Form Filling," DocEng2010, Sep. 21-24, 2010, Manchester, United Kingdom, 4 pages.

Also Published As

Publication number Publication date
US20190155887A1 (en) 2019-05-23

Legal Events

Date Code Title Description
AS Assignment

Owner name: ADP, LLC, NEW JERSEY

Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNORS:KUTTY, SANJAY;HONGGUO, AN;VINNAKOTA, SUBHASH C.;AND OTHERS;REEL/FRAME:044203/0108

Effective date: 20171121

FEPP Fee payment procedure

Free format text: ENTITY STATUS SET TO UNDISCOUNTED (ORIGINAL EVENT CODE: BIG.); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY

STPP Information on status: patent application and granting procedure in general

Free format text: NON FINAL ACTION MAILED

STPP Information on status: patent application and granting procedure in general

Free format text: RESPONSE TO NON-FINAL OFFICE ACTION ENTERED AND FORWARDED TO EXAMINER

STPP Information on status: patent application and granting procedure in general

Free format text: FINAL REJECTION MAILED

STPP Information on status: patent application and granting procedure in general

Free format text: RESPONSE AFTER FINAL ACTION FORWARDED TO EXAMINER

STPP Information on status: patent application and granting procedure in general

Free format text: DOCKETED NEW CASE - READY FOR EXAMINATION

STPP Information on status: patent application and granting procedure in general

Free format text: NON FINAL ACTION MAILED

STPP Information on status: patent application and granting procedure in general

Free format text: NOTICE OF ALLOWANCE MAILED -- APPLICATION RECEIVED IN OFFICE OF PUBLICATIONS

FEPP Fee payment procedure

Free format text: PETITION RELATED TO MAINTENANCE FEES GRANTED (ORIGINAL EVENT CODE: PTGR); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY

FEPP Fee payment procedure

Free format text: PETITION RELATED TO MAINTENANCE FEES GRANTED (ORIGINAL EVENT CODE: PTGR); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY

STCF Information on status: patent grant

Free format text: PATENTED CASE

AS Assignment

Owner name: ADP, INC., NEW JERSEY

Free format text: CHANGE OF NAME;ASSIGNOR:ADP, LLC;REEL/FRAME:058959/0729

Effective date: 20200630

MAFP Maintenance fee payment

Free format text: PAYMENT OF MAINTENANCE FEE, 4TH YEAR, LARGE ENTITY (ORIGINAL EVENT CODE: M1551); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY

Year of fee payment: 4