US9165045B2 - Classifying information captured in different formats for search and display - Google Patents

Classifying information captured in different formats for search and display

Info

Publication number
US9165045B2
Authority
US
United States
Prior art keywords
medical documents
information
data
format
common
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Expired - Fee Related
Application number
US14/054,316
Other versions
US20140046931A1 (en)
Inventor
Megan Mok
R. David Holvey
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
PeopleChart Corp
Original Assignee
PeopleChart Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by PeopleChart Corp
Priority to US14/054,316
Publication of US20140046931A1
Application granted
Publication of US9165045B2
Legal status: Expired - Fee Related (current)
Anticipated expiration

Classifications

    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16H HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H10/00 ICT specially adapted for the handling or processing of patient-related medical or healthcare data
    • G16H10/60 ICT specially adapted for the handling or processing of patient-related medical or healthcare data for patient-specific data, e.g. for electronic patient records
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/25 Integrating or interfacing systems involving database management systems
    • G06F17/30557
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/248 Presentation of query results
    • G06F17/30554
    • G06F19/322
    • G06F19/3443
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16H HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H50/00 ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics
    • G16H50/70 ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics for mining of medical data, e.g. analysing previous cases of other patients

Definitions

  • Particular embodiments generally relate to a document management system.
  • information is likely sourced from different data providers that create and store information in their own formats, such as in paper or electronically in a computer system.
  • patient information is migrating toward an increasing use of electronic systems that can store and organize information within a structured database framework as defined by data tables, fields, and values. Mapping or describing information into named fields and defining the relationship of these fields with each other inside a database structure enables the computer system to identify specific data, distinguish one data field from another, and perform analytical tasks such as queries, calculations, and algorithmic functions.
  • a flexible data structure also allows for compilation of information to be organized in different topics and presented in different reports. Structure can range from very simple to very complex.
  • Particular embodiments generally relate to the process of combining and organizing information that was originally created in disparate formats.
  • One type of data is created and saved in paper format; examples include handwritten notes, text typed on paper, a body of an email message, and word-processing documents.
  • the content of a paper document is converted to an electronic image.
  • the conversion may use scanning technology or other recognition technology.
  • the content of an electronic image cannot be recognized or understood by the computer as being any different from another electronic image.
  • the image data is considered unstructured data.
  • Another type of data format is a stream of data transmitted in electronic format, where the data values are identified and separated by delimiters and recognized by the computer system as those pertaining to specific data fields (machine-readable data).
  • This format can also pertain to data values already defined and stored in a database.
  • a third type is self-entered forms that capture data typed or manually selected from a pull-down list of data choices by the user.
  • This latter format is similar to the machine-readable format in that the data fields are already defined in the database. Both of these data formats may be considered structured data.
  • a plurality of documents are received from different data providers, such as from healthcare providers, where the medical documents can be captured in paper, electronically transmitted in parsed, delimited format such as data stream from a diagnostic center to a hospital, or self-entered data such as the physician or patient typing information into a computer system.
  • Other examples include audio recordings, video clips, the body of an email message, and word-processing documents.
  • For the purpose of illustrating the difference between structured and unstructured data, specifically paper-derived data, particular embodiments focus on scanned images (from the paper documents) and machine-readable data that is identified or delimited into recognizable database fields. It will be acknowledged that structured and unstructured data may be represented in other forms. These two very different formats are then converted to a common format.
  • the different types of data formats are converted to a common format, which is then stored in the database with appropriate indices.
  • the image data is tagged (indexed) with various descriptors selected from categories that are the equivalent of data fields, where eventually, these categories can be further rolled up or mapped into broadly defined or higher-level categories such as by topics or sections of a report.
  • the electronically transmitted data, which is already pre-defined by data fields, can be compiled and presented as a rendered report similar to that of a paper document. Together, paper and electronically transmitted data co-exist in a common format that is identified by similar data labels and fields, and is recognizable and distinguishable by a computer system. Having a common format enables the simultaneous searching of different data formats and the presentation of search results in a single, organized view.
  • the search is performed with the use of images. For example, a user may choose to search by one or more search categories and the images created from either paper documents or structured data may be returned as results. Moreover, the resulting images from either paper or structured formats may now be displayed together in the same organization schema, folder, or webpage view.
  • a common format is defined as the most constraining format between the two formats.
  • this may be an image format, which is an unstructured format.
  • the second documents in structured format are converted to unstructured (image) data format.
  • a schema is a method for organizing a plurality of data categories and super-categories along which unstructured data (document images) and structured data (electronically delimited data) can be classified.
  • the schema includes a list of categories presented in an organized sequence or in order of importance. It provides a directory or a way to organize or group various data classifications that can range from very narrow to broad, for example, by the details of data fields to broadly defined topics, report sections, or report types.
  • the schema enables presentation of both paper images and electronically transmitted data in a single view that is easy to understand, searchable, and selectable or re-classifiable (into a folder or report).
  • Data classification may be included in the content of both types of data formats, such as author name, creation date, medical organization that provided the document, name of the patient about whom information is created, diagnosis, or medical specialization to which the content refers.
  • These indices or text descriptors offer a way to classify the content on the paper image document and at the same time are applicable to the data fields associated with the electronically transmitted data. Further separation or roll-up of categories for describing or tagging the specific content found in both types of document formats is determined or predetermined by the classifications that make sense or that are commonly found in both types of document formats. These sub-categories are then grouped and rolled up to an organizing schema or principle for organizing documents. An image document may be tagged with one or more indices if the paper image contains content that is determined to match one or more categories.
  • FIG. 1 depicts a simplified system for consolidating and organizing documents according to one embodiment.
  • FIG. 2 depicts an example of converting unstructured data and structured data into a common format according to one embodiment.
  • FIG. 3 depicts an example of structured data delimited by vertical lines that represent the separation of different data fields.
  • FIG. 4 depicts a table that specifies sub-categories for report groups according to one embodiment.
  • FIG. 5 shows an example of the use of indices for tagging or identifying image data that can be stored as data categories in the database according to one embodiment.
  • FIG. 6 shows an interface that can be used to index images according to one embodiment.
  • FIG. 7 shows an interface that can be used to search for medical records according to one embodiment.
  • FIG. 8A depicts an example of an interface of search results from reviewing different formats of data provided by different sources according to one embodiment.
  • FIG. 8B shows an example of an image-based report generated using structured data according to one embodiment.
  • FIG. 8C shows an example of a view of the structured data displayed across time on a graph or chart according to one embodiment.
  • FIG. 9 depicts a simplified flowchart of a method for indexing documents according to one embodiment.
  • FIG. 1 depicts a simplified system 100 for consolidating and organizing documents according to one embodiment.
  • System 100 includes a document manager 102 , one or more clients 104 , and a database 106 . It will be understood that certain elements of system 100 have not been shown, such as networks, other computing devices, etc.
  • Document manager 102 is configured to receive documents created in different formats.
  • medical records may be received from different medical providers.
  • a personal health record may be managed by a patient.
  • a patient may receive medical care from multiple medical providers who in turn create documents about the episode or portion of care that they had delivered or provided the patient.
  • a patient may consolidate the records in different formats received from a multitude of healthcare providers inside of one system, where the system enables the medical information to be searched, queried, reported, and viewed from a web-based system or from one place.
  • the PHR model provides for comprehensive and portable medical records that are directed by the patient.
  • Client 104 may include a computing device that is used by a user.
  • the user may want to access different documents in their personal health record. For example, a user may query for documents for a different disease. These documents may have been generated by different medical providers and thus may have been created in different formats.
  • Particular embodiments allow the user to display and search for documents that originated in different formats in a more integrated way inside a single computer system or across a multitude of systems.
  • the documents may be received in different formats because different medical providers generate the documents in different ways or media. For example, different medical providers may use different systems that create and generate documents in different formats. For example, depending on the type of care and standard of documentation, different documents may be generated in different formats, including physician's notes in paper-based format, x-rays in digital format, medication history transmitted in delimited format, or messages dictated onto word processing documents. Other formats may be contemplated.
  • a first format is an unstructured data format.
  • Unstructured data is information that cannot be organized into a structure of database tables with descriptive columns and rows of records. For example, in the case of a scanned image of a paper document, the information is not annotated (tagged). In other words, the information is not identifiable or discernible to a computer system.
  • unstructured data may be image-based data, such as a bit map object.
  • textual objects may be unstructured data, such as word processing documents, e-mails, etc.
  • the characteristic of unstructured data is that content displayed on the document cannot easily be read and analyzed by a machine. For example, in its original state, the content of an image of unstructured data cannot be recognized or understood by the computer as being any different from content of another image.
  • Structured data is in a form where the information can be easily manipulated to generate different reports and can be easily searched. Structured data has an enforced composition to the different types of data in the data structure and this allows for querying and reporting against the data types.
  • the structured data can be manipulated easily to generate different types of documents.
  • With unstructured data such as an image, the information captured by the image, such as the doctor's name or handwritten or typed notes on the document, is not stored or identified as data fields of a database. As a result, the image is not searchable (i.e., a search for documents with the doctor's name would not yield the correct image of the document if the image is not identified in the database).
  • Particular embodiments take the documents stored in different formats and determine a common format in which to store the documents. For example, the format that is most constraining among the formats being used, in terms of the ability to name, sort, and parse data by variables, may be determined. For example, if the most constraining format is the image format, then both types of documents may be converted to the image format. However, if the most constraining format is structured data, then documents may be stored in the structured data format.
  • the common format is an image-based format, which is in unstructured state.
  • Particular embodiments may refer to documents in the common format as electronic images for discussion purposes. However, it will be understood that other formats may be used as the common format even though images are used for discussion purposes.
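The selection of the most constraining format can be sketched as taking a minimum over a ranking of formats. This is only an illustration under stated assumptions: the `Format` enum, its two members, and their ranking values are hypothetical and are not defined in the patent text.

```python
from enum import IntEnum


class Format(IntEnum):
    """Hypothetical ranking: a lower value means a more constraining format."""
    IMAGE = 0        # unstructured: scanned pages, bitmaps
    STRUCTURED = 1   # delimited fields or database rows


def common_format(formats):
    """Return the most constraining format among the documents to be stored."""
    formats = list(formats)
    if not formats:
        raise ValueError("no documents to index")
    return min(formats)


# Mixed input falls back to the image format; purely structured input does not.
print(common_format([Format.STRUCTURED, Format.IMAGE]).name)
print(common_format([Format.STRUCTURED]).name)
```

Ranking formats on a single axis is a simplification; the text notes that documents may mix structured and unstructured aspects to different degrees.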
  • Data received that is in a structured data format may be converted to an unstructured state. For example, an electronic image of a report may be generated from the structured data.
  • This may be counterintuitive in that most users desire that data is stored in the structured way because of the flexibility and power in manipulating the structured data.
  • However, particular embodiments aim to allow a user to search and sort documents that may have originated in different formats. By converting to a common format, the users can simultaneously search, sort, and view documents that may have originated in the different formats even if some advantages of using structured data are lost.
  • a schema is used to organize and index document images.
  • the user can search for documents that were originally presented in varied formats.
  • Although two formats, unstructured and structured, are discussed, it will be recognized that different degrees may be contemplated.
  • some documents may have aspects of both unstructured and structured, such as an electronic form that includes data fields and images.
  • structured data in the form of data fields and unstructured data in the form of scanned images (of paper documents) may have been separately stored in a state where it is not convenient or possible for a user to search through both formats at the same time. Rather, a single search of the structured data may have been performed and then a separate search of the unstructured data may have been performed. Also, when a user wanted to display documents, the documents for structured data and unstructured data were usually not displayed together in a single view, rather in different tabs or in different web pages. However, particular embodiments do allow for the consolidation and display of documents that were originally created in different formats.
  • the schema is used to organize images of both structured and unstructured data in common categories, such that they can be searched and displayed together.
  • FIG. 2 depicts an example of converting unstructured data (image documents) and structured data into a common format according to one embodiment.
  • unstructured data 202 and structured data 204 are being processed by a document converter 206 .
  • Document converter 206 is configured to convert unstructured documents 202 and structured data 204 into a common format and to organize them under a common schema.
  • structured data 204 may be received as a stream of text data with each field and its respective value separated by a delimiter, such as a vertical bar (pipe), as depicted in FIG. 3 .
  • the streamed data may be organized by multiple different fields, with each field's values eventually mapped into a data structure.
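As a minimal sketch of how a pipe-delimited record might be mapped into named fields: the field names and the sample record below are hypothetical, since the actual layout shown in FIG. 3 is not reproduced in this text.

```python
def parse_delimited(record, field_names, delimiter="|"):
    """Split one delimited record and map each value to its field name."""
    values = record.split(delimiter)
    if len(values) != len(field_names):
        raise ValueError(
            f"expected {len(field_names)} fields, got {len(values)}"
        )
    return dict(zip(field_names, (value.strip() for value in values)))


# Hypothetical field layout for a lab result stream.
FIELDS = ["patient_id", "test_name", "result", "units", "collected_on"]

row = parse_delimited("P001|Glucose|95|mg/dL|2009-03-14", FIELDS)
print(row["test_name"], row["result"], row["units"])
```

Once each value carries a field name, the record can be queried or stored in database columns, which is what distinguishes it from the image data discussed above.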
  • Unstructured documents 202 may be any type of unstructured data that is received. For example, images of paper documents may be received or paper documents may be received and scanned into images. In this case, document converter 206 converts unstructured documents 202 into images. If unstructured documents 202 are already in image format, then a conversion is not performed. To convert structured data 204 to images, structured data 204 may be retrieved from different fields of the database. An image report is generated from an aggregation of multiple data values retrieved from the database. The report shows an image of data values on a page. As an image in its basic form, its content is no longer recognizable by a computer.
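The conversion of structured values into a report image might look roughly like the sketch below. It assumes a stand-in for rasterization: the page is flattened to opaque bytes rather than drawn to a real bitmap, which would need an imaging library. The `ReportImage` class, its field names, and the sample values are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ReportImage:
    image_id: str
    pixels: bytes          # opaque page content; field boundaries are gone
    source_record_id: str  # hypothetical link back to the structured rows


def render_report(image_id, record_id, title, rows):
    """Lay out (label, value) rows as one page and flatten it to bytes."""
    width = max(len(label) for label, _ in rows)
    lines = [title, "-" * len(title)]
    lines += [f"{label.ljust(width)}  {value}" for label, value in rows]
    page = "\n".join(lines)
    # Stand-in for rasterization: a real system would draw the page to a bitmap.
    return ReportImage(image_id, page.encode("utf-8"), record_id)


img = render_report(
    "img-001",
    "rec-42",
    "Lab Panel",
    [("Glucose", "95 mg/dL"), ("Sodium", "140 mmol/L")],
)
print(img.pixels.decode("utf-8"))
```

Keeping a reference to the source record is an assumption made here so that the structured data can be retrieved later for a given image.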
  • a schema 208 is used to organize and categorize the image data.
  • the schema is an organizational schema or framework that includes categories by which the images may be organized. Schema 208 may be determined based on expected content that may be included in the images. For example, medical records may include specific information, such as a doctor's name, address, diagnosis, or prescription; these and other categories usually found in medical documents are included in schema 208 . Accordingly, an effective organizational schema may be determined in which to classify the images.
  • Schema 208 is applied to indexing or tagging the document by the various categories. For example, some categories include an author's name, author date, type of page (page categories), and any sub-categories that may be determined.
  • If indexing is for a category that includes a doctor's name or ID, any image that includes that doctor's name or ID is indexed with that category. For example, an image identifier may be tagged with that category for the doctor's name or ID.
  • the schema is applied to all images and indices 210 are then generated. The indices may be stored with the images in database 106 .
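One way to picture indices 210 is as an inverted index from each category to the identifiers of the images tagged with it. This is a sketch under assumptions: the image identifiers, the doctor's name, and the in-memory dictionary layout are all illustrative, not taken from the patent.

```python
from collections import defaultdict


def build_indices(tagged_images):
    """Map each schema category to the set of image IDs tagged with it."""
    indices = defaultdict(set)
    for image_id, categories in tagged_images.items():
        for category in categories:
            indices[category].add(image_id)
    return dict(indices)


# Hypothetical tags; one image came from paper, the other from structured data.
tagged = {
    "img-001": {"Doctor: Lee", "Physician Notes"},   # scanned paper note
    "img-002": {"Doctor: Lee", "Hospitalization"},   # rendered report image
}

indices = build_indices(tagged)
print(sorted(indices["Doctor: Lee"]))
```

Because both images carry the same category tag, a lookup by that category returns them together regardless of their original format.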
  • structured and unstructured data co-exist in a common format that is identified by similar data labels and fields, and is recognizable and distinguishable by a computer system.
  • This allows searching of images for both document formats simultaneously. For example, a user may search for one of the categories and images created from either unstructured or structured data may be returned. Also, images for both formats of documents may now be displayed simultaneously in the same organization schema, folder, or webpage (view).
  • the schema may organize the data in report groups, which are higher level categories. Inside the report groups are section headers that are sub-categories.
  • FIG. 4 discloses an interface that specifies report groups according to one embodiment.
  • a table 400 specifies report groups 401 that have been created for the schema.
  • the report groups 401 may be categories where groups of documents received by a patient can be categorized. For example, in a database, the report group Advanced Health Care Directive is shown as one of the report group categories.
  • a code can be used to identify each report group found in the schema similar to a node on a tree diagram. The code may also be used to determine how to display the report groups. For example, a lower code may cause a report group to be displayed before a higher coded report group.
  • a column 402 shows the name of various sub-categories that are mapped into a report group 401 .
  • the sub-categories are determined based on various data that may be received in each report group. For example, different medical providers may provide different documents to a patient (i.e., in a different format). The documents from different medical providers, however, may be categorized into one of the sub-categories. Any of the report groups ( 401 ) and sub-categories ( 402 ) can be indices 210 .
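The report groups, their codes, and their sub-categories can be sketched as a small data structure. The numeric codes and the assignment of sub-categories to groups below are assumptions for illustration; only the group and category names echo examples mentioned in the text.

```python
from dataclasses import dataclass, field


@dataclass
class ReportGroup:
    code: int                      # lower codes are displayed first
    name: str
    sub_categories: list = field(default_factory=list)


# Hypothetical codes and groupings.
schema = [
    ReportGroup(30, "Hospital and Surgery", ["Hospitalization"]),
    ReportGroup(10, "Advanced Health Care Directive"),
    ReportGroup(20, "Physician Notes", ["Cardiology", "Behavioral Health"]),
]


def display_order(groups):
    """Return report group names sorted by ascending code."""
    return [group.name for group in sorted(groups, key=lambda g: g.code)]


def group_for(groups, sub_category):
    """Return the name of the report group a sub-category rolls up to."""
    for group in groups:
        if sub_category in group.sub_categories:
            return group.name
    return None


print(display_order(schema))
print(group_for(schema, "Cardiology"))
```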
  • Schema 208 is then used to index images.
  • unstructured data may be given some structure to allow for searching and displaying of images.
  • Although structured data 204 was already in a format that could be searched and displayed, to integrate unstructured documents 202 and structured data 204 , structured data 204 is converted to the image-based format, which is a more constraining format. That is, an image inherently does not have any structured data to it.
  • a common schema 208 is applied to index the images from unstructured documents 202 and structured data 204 to allow integrated searching of both.
  • Indices 210 may be stored in a database as field names.
  • FIG. 5 shows an example of a database table that can be used to store indices 210 according to one embodiment.
  • an image 500 includes content 502 .
  • Content 502 - 1 through 502 - 4 may be a document name, Author, Date and Doctor's notes.
  • This content may be tagged with indices.
  • the document ID may be stored as a row in a table 510 .
  • An identifier 512 may be stored to identify image 500 .
  • Indices 514 are provided in the columns of table 510 .
  • Table 510 may be populated with content from the image or may be organized by category descriptors.
  • For image 500 , the fields of the table are filled with data based on the content of the image. For example, for index 514 - 1 , an image's name is inserted into the corresponding data field. Also, the Document name, Author, Author Date, Document Type, and Source may be inserted into the other corresponding fields for indices 514 - 1 to 514 - 5 .
  • Table 510 may also include category descriptors that are used to organize the image. For example, the image may fall into different categories based on the content of the image, where the image originated, what medical condition the image is diagnosing, etc. Table 510 may insert information for categories for image 500 , such as the document may be associated with the doctor's notes category and the image is tagged in that category. Other categories may or may not be tagged depending on image 500 .
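A table like table 510 could be sketched with an in-memory SQLite database. The column names follow the indices named in the text; storing category descriptors in a separate `image_category` table is an assumption made here for simplicity (the text describes them as part of table 510), and all inserted values are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# One row per image; columns correspond to the indices named in the text.
conn.execute("""
    CREATE TABLE image_index (
        image_id      TEXT PRIMARY KEY,
        document_name TEXT,
        author        TEXT,
        author_date   TEXT,
        document_type TEXT,
        source        TEXT
    )""")

# Assumption: category descriptors kept in a separate table, one row per tag.
conn.execute("""
    CREATE TABLE image_category (
        image_id TEXT REFERENCES image_index(image_id),
        category TEXT,
        PRIMARY KEY (image_id, category)
    )""")

conn.execute(
    "INSERT INTO image_index VALUES (?, ?, ?, ?, ?, ?)",
    ("img-500", "Office Visit", "Dr. Lee", "2009-03-14",
     "Doctor's Notes", "Clinic A"),
)
conn.execute(
    "INSERT INTO image_category VALUES (?, ?)",
    ("img-500", "Doctor's Notes"),
)

rows = conn.execute(
    "SELECT i.image_id, i.author FROM image_index i "
    "JOIN image_category c ON c.image_id = i.image_id "
    "WHERE c.category = ?",
    ("Doctor's Notes",),
).fetchall()
print(rows)
```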
  • FIG. 6 shows an interface 600 that can be used to index images according to one embodiment.
  • An image 602 is shown that is being indexed.
  • An index section 604 is used to commit and apply attributes or descriptions of the image document 602 in the form of indices.
  • entry boxes 606 are used to receive information that can be used to index image 602 .
  • a name 608 and source 610 are used to identify the doctor by name and also the source from which image 602 is received.
  • a category 612 is used to categorize image 602 .
  • the categories may be used to index image 602 based on the report groups and sub-categories that were described with respect to FIG. 4 .
  • image 602 is indexed.
  • Image 602 may be indexed manually or automatically.
  • index section 604 may be used to provide a template for automatically indexing other images.
  • other images can be automatically indexed using the template.
  • similar documents such as images from the same doctor may be automatically indexed.
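Template-based automatic indexing might be sketched as copying the reusable index values from one manually indexed image onto similar images. Which fields count as reusable is an assumption here, as are the field names and sample values.

```python
def make_template(indexed_entry, reusable_fields=("author", "source", "category")):
    """Keep only the index values expected to repeat across similar documents."""
    return {
        key: indexed_entry[key]
        for key in reusable_fields
        if key in indexed_entry
    }


def auto_index(new_images, template):
    """Apply the template to each image; per-image values take precedence."""
    return [{**template, **image} for image in new_images]


# Hypothetical entry produced by manually indexing one image.
first = {
    "image_id": "img-602",
    "author": "Dr. Lee",
    "source": "Clinic A",
    "category": "Physician Notes",
    "author_date": "2009-03-14",
}

template = make_template(first)
batch = auto_index(
    [{"image_id": "img-603", "author_date": "2009-04-02"}],
    template,
)
print(batch[0])
```

Values such as the author date are left out of the template because they are likely to differ from one document to the next.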
  • FIG. 7 shows an interface 700 that can be used to search for medical records according to one embodiment.
  • different categories 702 may be used to search for documents.
  • the categories immunizations, medications and allergies, behavioral health, cardiac electrophysiology, cardiac electroscopy, and cardiology have been selected.
  • all images that have been indexed with these categories may be retrieved from database 106 .
  • searches may be performed over images that originated from unstructured documents 202 and structured data 204 . Separate searches do not need to be performed for the two types of documents.
  • the schema may be organized by different report groups. For example, different categories of the schema are included in a report group. That is, for a report group Medication and Allergies, the data may be further tagged by specific medical specialties reflecting different disease categories 704 , such as Allergy and Immunology, Anesthesiology, Audiology, Behavioral Health, etc. Any documents tagged with these sub-categories may be searched for and retrieved if the report group hospitalization is used.
  • the organizational schema thus provides some structure with how the images are organized.
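A category search over the shared indices can be sketched as a union of index lookups, with a selected report group expanded to its sub-categories. Treating multiple selected categories as a union (any match) is an assumption, since the text does not say whether matches must satisfy all categories; the identifiers and groupings are hypothetical.

```python
def search(indices, selected, report_groups=None):
    """Return sorted IDs of images tagged with any selected category.

    A selected report group is expanded to its sub-categories.
    """
    report_groups = report_groups or {}
    categories = set()
    for name in selected:
        categories.add(name)
        categories.update(report_groups.get(name, ()))
    hits = set()
    for category in categories:
        hits |= indices.get(category, set())
    return sorted(hits)


# Hypothetical indices; images originate from both paper and structured data.
indices = {
    "Cardiology": {"img-001", "img-002"},
    "Immunizations": {"img-003"},
    "Behavioral Health": {"img-004"},
}
origin = {
    "img-001": "paper",
    "img-002": "structured",
    "img-003": "paper",
    "img-004": "structured",
}
report_groups = {"Physician Notes": ["Cardiology", "Behavioral Health"]}

for image_id in search(indices, ["Cardiology", "Immunizations"]):
    print(image_id, origin[image_id])
```

A single call returns images of both origins, so no separate search per format is needed.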
  • FIG. 8A depicts an example of an interface 800 including search results according to one embodiment.
  • the images may be images that were generated from documents of different formats.
  • physician notes 804 may be images of paper-based notes.
  • an image 806 may be an image of structured data relating to a record of a hospitalization.
  • an image 808 is an image of notes for the hospitalization.
  • a user can see different images for different documents under the same report group. For example, all physician notes are categorized together and all images for hospital and surgery are categorized together.
  • electronic hospitalization notes would have been displayed in a different category than paper-based hospitalization notes.
  • a preview panel 810 shows images of documents.
  • physician notes 804 are shown, which are mostly composed of handwritten notes and paper-based images. A user may select the different physician notes and have them be displayed.
  • images of documents that originated in different formats may also be included.
  • In preview panel 812 or 804 , a document originally populated by structured data as its content is displayed as an image report alongside an image of another document that originally was created on paper.
  • Interface 800 can be used to view structured and unstructured data and access the benefits of structured data stored in its defined way for greater data manipulation.
  • a link 814 (e.g., the link to View Trend Data)
  • structured data is retrieved and can be displayed in a timeline or graphical way.
  • a report image may be rendered or generated from the structured data that corresponds to one of the images.
  • FIG. 8B shows an example of an image of a report generated using structured data according to one embodiment. As shown, a test panel is shown. This image is unstructured data in that content found in the image cannot be distinguished by a computer system.
  • the schema was used to index the image and it has been retrieved in response to the query received from interface 700 of FIG. 7 .
  • When link 814 is selected, a view of the structured data is displayed across time on a graph or chart in FIG. 8C as compared with the rendered snapshot of the data provided by the report image.
  • structured data that is associated with the image of FIG. 8B is retrieved.
  • the structured data is then used to generate a report as shown.
  • the report may differ from the image if different analytics are desired. Alternatively, the report may show the same information as the image, but not in an image format. This may allow further manipulation of the report, such as keyword searching, editing, etc.
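Retrieving the structured data behind a report image and ordering it across time could be sketched as follows. The mapping from image to patient, the row layout, and all values are hypothetical; the text only states that structured data associated with the image is retrieved.

```python
from datetime import date

# Hypothetical link from a report image to the structured rows behind it.
IMAGE_TO_PATIENT = {"img-806": "P001"}

# Hypothetical structured rows, deliberately out of date order.
LAB_ROWS = [
    {"patient_id": "P001", "test_name": "Glucose", "result": "101",
     "collected_on": "2009-06-02"},
    {"patient_id": "P001", "test_name": "Glucose", "result": "95",
     "collected_on": "2009-03-14"},
    {"patient_id": "P001", "test_name": "Sodium", "result": "140",
     "collected_on": "2009-03-14"},
]


def trend_for_image(image_id, test_name):
    """Return (date, value) points for one test, oldest first."""
    patient_id = IMAGE_TO_PATIENT[image_id]
    points = [
        (date.fromisoformat(row["collected_on"]), float(row["result"]))
        for row in LAB_ROWS
        if row["patient_id"] == patient_id and row["test_name"] == test_name
    ]
    return sorted(points)


for day, value in trend_for_image("img-806", "Glucose"):
    print(day.isoformat(), value)
```

The sorted points are the kind of series that a timeline or chart view could plot, while the report image remains a fixed snapshot.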
  • FIG. 9 depicts a simplified flowchart of a method for indexing different data formats according to one embodiment.
  • Step 902 determines a common format.
  • a common format is derived from reviewing the most constrained of formats.
  • the different formats of structured and unstructured data are reviewed, and the most constraining format is selected to become the common format for both types of data. For example, if the only documents to be indexed are structured data, then the common format may be the structured data format. However, if images are to be indexed, then the most constraining common-denominator format is the image-based format.
  • the formats of documents may be analyzed and the common format is determined automatically.
  • Step 904 determines the indices to be used for tagging the structured and unstructured-based images.
  • Indices are chosen after reviewing the organizing principle under which a common set of descriptors can be identified to tag and organize the images such that they can all be searched and sorted together.
  • the structured data is tagged by indices and rolled up into image reports, from which an organizing schema emerges that can apply to both structured and unstructured-based images.
  • the structured data is parsed into images that are most relevant to the categories of the organizing schema. For example, if a doctor's name is included in the structured data and used to create the image, the image may be indexed with a tag for the doctor's name.
  • Step 906 compiles or separates structured data into individual image reports that can be described by the indices.
  • the way that structured data is parsed and compiled into individual images is determined by both the nature of the content and the roll-up categories of the common schema from which to apply the organization across all resulting images. For example, the content is analyzed and a report that is considered to represent the data in the most useful manner is determined based on different factors, such as user preferences, conversion rules, etc. Also, the content of the image may be determined based on different categories that could be applied.
  • Step 908 reviews unstructured data for tagging and indexing.
  • Step 910 uses the schema 208 to index the unstructured data with the same indices as those for images generated from the structured data.
  • the challenge is to apply the right tags or indices for describing the content of an image that is not recognizable or identifiable by a computer system.
  • Particular embodiments provide certain techniques that may be used to index the images. For example, optical character recognition may be performed on the image to determine information from the content of the image. Also, an operator or user may review the image and enter the information. Other methods of extracting information from the unstructured data may be performed. When the information is extracted, it may be matched with categories in schema 208 . For example, if a doctor's name is recognized in an image, the image may be tagged with the doctor's name as an index.
  • Step 912 compiles unstructured data identified by indices into individual image reports for roll-up to common schema (e.g., into report groups as described).
  • Step 914 stores images of unstructured data and structured data in a file folder. Each image is uniquely described by various data tags or values from a set of indices. The images for the structured data and unstructured data may be stored in the same folder. In step 916 , the indices used to identify, describe, or tag each of the images are stored in a database.
  • Step 918 applies web links for the ability to view the original format of the data of either structured or unstructured data.
  • the links allow for the traversing from the images back to the robustness of structured data.
  • a user can pull up an image and if the user decides to access the structured data that was used to create the image, a link may be used to retrieve the structured data.
  • views of data in different formats may be generated, organized, and identified in a database through the use of indices.
  • a common schema is applied for further roll-up or classification of the documents after the documents have been converted to a common format.
  • the documents may be organized in a way that allows for searching and sorting of images created from documents of different formats. This also allows documents from different formats to be displayed on a webpage in an integrated way.
  • Because the technique may convert different formats of data to the most constraining format as the common format, which may cause structured data to be converted into image data, a user can now search through all documents identified in a category simultaneously instead of searching through different formats of documents separately.
  • any paper-based documents, electronic documents, self-entered documents, or any other documents created can be searched and displayed.
  • Any suitable programming language can be used to implement the routines of particular embodiments, including C, C++, Java, assembly language, etc.
  • Different programming techniques can be employed such as procedural or object oriented.
  • the routines can execute on a single processing device or multiple processors. Although the steps, operations, or computations may be presented in a specific order, this order may be changed in different particular embodiments. In some particular embodiments, multiple steps treated as sequential in this specification can be performed at the same time.
  • Particular embodiments may be implemented in a computer-readable storage medium for use by or in connection with the instruction execution system, apparatus, system, or device.
  • Particular embodiments can be implemented in the form of control logic in software or hardware or a combination of both.
  • the control logic when executed by one or more processors, may be operable to perform that which is described in particular embodiments.
  • Particular embodiments may be implemented by using a programmed general purpose digital computer, or by using application specific integrated circuits, programmable logic devices, or field programmable gate arrays. Optical, chemical, biological, quantum, or nanoengineered systems, components, and mechanisms may also be used.
  • the functions of particular embodiments can be achieved by any means as is known in the art.
  • Distributed, networked systems, components, and/or circuits can be used.
  • Communication, or transfer, of data may be wired, wireless, or by any other means.
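As a non-limiting illustration, the indexing flow of FIG. 9 described above may be sketched as follows. This is a minimal sketch under stated assumptions, not the implementation of any particular embodiment: the `ImageReport` class, the `index_documents` function, the `/structured/...` link scheme, and the sample values are all hypothetical names introduced for illustration. Extraction of a doctor's name from a scanned page (by optical character recognition or by an operator) is assumed to have already happened.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class ImageReport:
    image_id: str
    origin: str                          # "structured" or "unstructured"
    indices: dict = field(default_factory=dict)
    source_link: Optional[str] = None    # step 918: link back to the original data


def index_documents(structured_rows, scanned_pages):
    """Walk steps 904 to 918 over two small in-memory inputs."""
    image_folder = {}   # stands in for the file folder of step 914
    index_table = []    # stands in for the database of step 916

    # Steps 904-906: compile each structured row into an image report and tag it.
    for n, row in enumerate(structured_rows):
        report = ImageReport(
            image_id=f"s{n}",
            origin="structured",
            indices={"doctor": row["doctor"]},
            source_link=f"/structured/{row['record_id']}",  # hypothetical URL scheme
        )
        image_folder[report.image_id] = report
        index_table.append((report.image_id, report.indices))

    # Steps 908-912: tag each scanned page with the same index names.
    for n, page in enumerate(scanned_pages):
        report = ImageReport(
            image_id=f"u{n}",
            origin="unstructured",
            indices={"doctor": page["doctor"]},  # value from OCR or an operator
        )
        image_folder[report.image_id] = report
        index_table.append((report.image_id, report.indices))

    return image_folder, index_table


folder, table = index_documents(
    structured_rows=[{"record_id": 17, "doctor": "Dr. Smith"}],
    scanned_pages=[{"doctor": "Dr. Smith"}],
)
```

In this sketch both kinds of image share the same index name, so a single lookup on `doctor` covers images of either origin, while only the image built from structured data carries a link back to its source.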

Landscapes

  • Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Theoretical Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Public Health (AREA)
  • Medical Informatics (AREA)
  • Epidemiology (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • Primary Health Care (AREA)
  • Biomedical Technology (AREA)
  • Pathology (AREA)
  • Computational Linguistics (AREA)
  • Medical Treatment And Welfare Office Work (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

In one embodiment, a method receives a plurality of documents. The documents may be received from different medical providers. Also, the documents may be medical record documents generated or captured in a first format and a second format. The first format may be an unstructured data format and the second format may be a structured data format. The first and second documents are then converted to a common format. For example, a common format may emerge as the most restrictive or constrained denominator of the first format and the second format. A schema is determined that provides an organizational structure with categories that can be used to index the content of the first and second documents while they are being converted to the common format. The schema and indexing enable the different formats of documents to be combined and organized simultaneously into a single view for a comprehensive review.

Description

CROSS REFERENCE TO RELATED APPLICATIONS
The present disclosure is a continuation of U.S. patent application Ser. No. 13/562,191, entitled “Classifying Information Captured in Different Formats for Search and Display in an Image-Based Format,” filed Jul. 30, 2012, issued as U.S. Pat. No. 8,572,021 on Oct. 29, 2013, which is a continuation of U.S. patent application Ser. No. 12/399,894, entitled “Combining Medical Information Captured in Structured and Unstructured Data Formats for Use or Display in a User Application, Interface, or View,” filed Mar. 6, 2009, issued as U.S. Pat. No. 8,250,026 on Aug. 21, 2012, both of which are incorporated by reference in their entirety for all purposes.
BACKGROUND
Particular embodiments generally relate to a document management system.
In document management, information is likely sourced from different data providers that create and store information in their own formats, such as on paper or electronically in a computer system. In the medical field, patient information is migrating toward an increasing use of electronic systems that can store and organize information within a structured database framework as defined by data tables, fields, and values. Mapping or describing information into named fields and defining the relationship of these fields with each other inside a database structure enables the computer system to identify specific data, distinguish one data field from another, and perform analytical tasks such as queries, calculations, and algorithmic functions. A flexible data structure also allows information to be compiled under different topics and presented in different reports. Structure can range from very simple to very complex.
In the medical field, a majority of physicians still create and keep patient information on paper, such as doctor's notes, faxed lab reports, and hand-written prescriptions. To convert paper to an electronic format, the paper document is scanned into an electronic image. In its raw and original state, the difference in content captured by one electronic image cannot be read and is not recognizable by a computer system as being distinctly different from the content of another electronic image. Without the help of descriptive definitions, electronic images are not distinguishable by the computer and are therefore limited in analytical usefulness. For example, a computer cannot differentiate the results of a lab report image from those of another lab report image or from the content of a prescription image for that matter.
These two formats, paper images and machine-readable information stored in a database, are usually not compiled together in a manner that would enable a system to search through both formats simultaneously. Rather, a system may store these different formats in separate directories or file folders and may display the information in separate views or in separate web pages of a web-based system. When patient information is presented in paper images or in machine-readable format but cannot be compiled together in a way that can be organized for simultaneous searching, sorting, and analysis, the usefulness of the patient's information is restricted. Storing disparate information in a system that does not allow for simultaneous query and organization is a missed opportunity in health care to leverage a more complete set of available information as a basis for making decisions and, in some cases, may lead to clinical oversights and medical errors.
SUMMARY
Particular embodiments generally relate to the process of combining and organizing information that was originally created in disparate formats. One type of data is created and saved in paper format; examples include handwritten notes, text typed on paper, a body of an email message, and word-processing documents.
The content of a paper document is converted to an electronic image. The conversion may use scanning technology or other recognition technology. In its original state, the content of an electronic image cannot be recognized or understood by the computer as being any different from another electronic image. The image data is considered unstructured data.
Another type of data format is a stream of data transmitted in electronic format, where the data values are identified and separated by delimiters and recognized by the computer system as those pertaining to specific data fields (machine-readable data). This format can also pertain to data values already defined and stored in a database.
A third type is self-entered forms that capture data typed or manually selected from a pull-down list of data choices by the user. This latter format is similar to the machine-readable format in that the data fields are already defined in the database. Both of these data formats may be considered structured data.
In one embodiment, a plurality of documents are received from different data providers, such as from healthcare providers, where the medical documents can be captured in paper, electronically transmitted in parsed, delimited format such as data stream from a diagnostic center to a hospital, or self-entered data such as the physician or patient typing information into a computer system. Other examples include audio recordings, video clips, the body of an email message, and word-processing documents.
For the purpose of illustrating the difference between structured and unstructured data, specifically paper-derived data, particular embodiments focus on scanned images (from the paper documents) and machine-readable data that is identified or delimited into recognizable database fields. It will be acknowledged that structured and unstructured data may be represented in other forms. These two very different formats are then converted to a common format.
The different types of data formats are converted to a common format, which is then stored in the database with appropriate indices. The image data is tagged (indexed) with various descriptors selected from categories that are the equivalent of data fields, where eventually, these categories can be further rolled up or mapped into broadly defined or higher-level categories such as by topics or sections of a report. The electronically transmitted data, which is already pre-defined by data fields, can be compiled and presented into a rendered report similar to that of a paper document. Together, paper and electronically-transmitted data co-exist in a common format that is identified by similar data labels and fields, and is recognizable and distinguishable by a computer system. Having a common format enables the simultaneous searching of different data formats and the presentation of searched results in a single, organized view. The search is performed with the use of images. For example, a user may choose to search by one or more search categories and the images created from either paper documents or structured data may be returned as results. Moreover, the resulting images from either paper or structured formats may now be displayed together in the same organization schema, folder, or webpage view.
In one embodiment, a common format is defined as the most constraining format between the two formats. In one example, this may be an image format, which is an unstructured format. Thus, the second documents in structured format are converted to unstructured (image) data format.
A schema is a method for organizing a plurality of data categories and super-categories along which unstructured data (document images) and structured data (electronically delimited data) can be classified. The schema includes a list of categories presented in an organized sequence or in order of importance. It provides a directory or a way to organize or group various data classifications that can range from very narrow to broad, for example, by the details of data fields to broadly defined topics, report sections, or report types. The schema enables presentation of both paper images and electronically transmitted data in a single view that is easy to understand, searchable, and selectable or re-classifiable (into a folder or report).
Data classifications may be included in the content of both types of data formats, such as author name, creation date, medical organization that provided the document, name of the patient about whom the information is created, diagnosis, or medical specialization that the content references. These indices or text descriptors offer a way to classify the content of the paper image document and at the same time are applicable to the data fields associated with the electronically transmitted data. Further separation or roll-up of categories for describing or tagging the specific content found in both types of document formats is determined or predetermined by the classifications that make sense or that are commonly found in both types of document formats. These sub-categories are then grouped and rolled up to an organizing schema or principle for organizing documents. An image document may be tagged with one or more indices if the paper image contains content that is determined to match one or more categories.
A further understanding of the nature and the advantages of particular embodiments disclosed herein may be realized by reference to the remaining portions of the specification and the attached drawings.
BRIEF DESCRIPTION OF THE DRAWINGS
FIG. 1 depicts a simplified system for consolidating and organizing documents according to one embodiment.
FIG. 2 depicts an example of converting unstructured data and structured data into a common format according to one embodiment.
FIG. 3 depicts an example of structured data delimited by vertical lines that represent the separation of different data fields.
FIG. 4 depicts a table that specifies sub-categories for report groups according to one embodiment.
FIG. 5 shows an example of the use of indices for tagging or identifying image data that can be stored as data categories in the database according to one embodiment.
FIG. 6 shows an interface that can be used to index images according to one embodiment.
FIG. 7 shows an interface that can be used to search for medical records according to one embodiment.
FIG. 8A depicts an example of an interface of search results from reviewing different formats of data provided by different sources according to one embodiment.
FIG. 8B shows an example of an image-based report generated using structured data according to one embodiment.
FIG. 8C shows an example of a view of the structured data displayed across time on a graph or chart according to one embodiment.
FIG. 9 depicts a simplified flowchart of a method for indexing documents according to one embodiment.
DETAILED DESCRIPTION OF EMBODIMENTS
FIG. 1 depicts a simplified system 100 for consolidating and organizing documents according to one embodiment. System 100 includes a document manager 102, one or more clients 104, and a database 106. It will be understood that certain elements of system 100 have not been shown, such as networks, other computing devices, etc.
Document manager 102 is configured to receive documents created in different formats. For example, medical records may be received from different medical providers. In one example, a personal health record (PHR) may be managed by a patient. For example, a patient may receive medical care from multiple medical providers who in turn create documents about the episode or portion of care that they had delivered or provided the patient. However, with a PHR system, a patient may consolidate the records in different formats received from a multitude of healthcare providers inside of one system, where the system enables the medical information to be searched, queried, reported, and viewed from a web-based system or from one place. The PHR model provides for comprehensive and portable medical records that are directed by the patient.
Client 104 may include a computing device that is used by a user. The user may want to access different documents in their personal health record. For example, a user may query for documents for a different disease. These documents may have been generated by different medical providers and thus may have been created in different formats. Particular embodiments allow the user to display and search for documents that originated in different formats in a more integrated way inside a single computer system or across a multitude of systems.
The documents may be received in different formats because different medical providers generate the documents in different ways or media. For example, different medical providers may use different systems that create and generate documents in different formats. For example, depending on the type of care and standard of documentation, different documents may be generated in different formats, including physician's notes in paper-based format, x-rays in digital format, medication history transmitted in delimited format, or messages dictated onto word processing documents. Other formats may be contemplated.
A first format is an unstructured data format. Unstructured data is information that cannot be organized into a structure of database tables with descriptive columns and rows of records. For example, in the case of a scanned image of a paper document, the information is not annotated (tagged). In other words, the information is not identifiable or discernible to a computer system. For example, unstructured data may be image-based data, such as a bit map object. Also, textual objects may be unstructured data, such as word processing documents, e-mails, etc. The characteristic of unstructured data is that content displayed on the document cannot easily be read and analyzed by a machine. For example, in its original state, the content of an image of unstructured data cannot be recognized or understood by the computer as being any different from the content of another image.
Structured data is in a form where the information can be easily manipulated to generate different reports and can be easily searched. Structured data has an enforced composition to the different types of data in the data structure and this allows for querying and reporting against the data types. The structured data can be manipulated easily to generate different types of documents. In contrast, unstructured data, such as an image, is stored as an image. In its raw form, the information captured by the image, such as the doctor's name, handwritten or typed notes on the document, is not stored or identified as data fields of a database. Thus, the image is not searchable (i.e., a search for documents with the doctor's name would not yield the correct image of the document if the image is not identified in the database).
Particular embodiments take the documents stored in different formats and determine a common format in which to store the documents. For example, the format that is most constraining among the formats being used, in terms of the ability to name, sort, and parse data by variables, may be determined. For example, if the most constraining format is the image format, then both types of documents may be converted to the image format. However, if the most constraining format is structured data, then documents may be stored in the structured data format.
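By way of example only, selecting the most constraining format may be sketched as picking the lowest-ranked format among those present. The `CONSTRAINT_RANK` table and the `common_format` function are hypothetical names assumed for illustration; the ranking values themselves are an assumption and are not specified by this disclosure.

```python
# Hypothetical ranking: a lower number means less ability to name, sort,
# and parse data by variables, i.e. a more constraining format.
CONSTRAINT_RANK = {"image": 0, "structured": 1}


def common_format(formats):
    """Return the most constraining format among those being used."""
    return min(set(formats), key=lambda fmt: CONSTRAINT_RANK[fmt])
```

With this ranking, `common_format(["structured", "image"])` selects the image format, while a collection containing only structured data keeps the structured format.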
In one embodiment, the common format is an image-based format, which is in an unstructured state. Particular embodiments may refer to documents in the common format as electronic images for discussion purposes. However, it will be understood that other formats may be used as the common format even though images are used for discussion purposes.
Data received that is in a structured data format may be converted to an unstructured state. For example, an electronic image of a report may be generated from the structured data. This may be counterintuitive in that most users desire that data be stored in a structured way because of the flexibility and power in manipulating the structured data. However, particular embodiments allow a user to search and sort documents that may have originated in different formats. By converting to a common format, the users can simultaneously search, sort, and view documents that may have originated in the different formats even if some advantages of using structured data are lost.
To account for some of the advantages lost by storing structured data in the common format, a schema is used to organize and index document images. By storing the documents in the image-based format and indexing the images using a common schema, the user can search for documents that were originally presented in varied formats. Although two formats, unstructured and structured, are discussed, it will be recognized that different degrees may be contemplated. For example, some documents may have aspects of both unstructured and structured, such as an electronic form that includes data fields and images.
Conventionally, structured data in the form of data fields and unstructured data in the form of scanned images (of paper documents) may have been separately stored in a state where it is not convenient or possible for a user to search through both formats at the same time. Rather, a single search of the structured data may have been performed and then a separate search of the unstructured data may have been performed. Also, when a user wanted to display documents, the documents for structured data and unstructured data were usually not displayed together in a single view, rather in different tabs or in different web pages. However, particular embodiments do allow for the consolidation and display of documents that were originally created in different formats. The schema is used to organize images of both structured and unstructured data in common categories, such that they can be searched and displayed together.
FIG. 2 depicts an example of converting unstructured data (image documents) and structured data into a common format according to one embodiment. As shown, unstructured data 202 and structured data 204 are being processed by a document converter 206. Document converter 206 is configured to convert unstructured documents 202 and structured data 204 into a common format and to organize them under a common schema.
In one embodiment, structured data 204 may be received as a stream of text data with each field and its respective value separated by a delimiter, such as a vertical line (pipe character), as depicted in FIG. 3. The streamed data may be organized by multiple different fields, with each field's values eventually mapped into a data structure.
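As a non-limiting illustration, mapping one delimited record into named fields may be sketched as follows. The field order in `FIELD_NAMES`, the `parse_record` function, and the sample record are assumptions made for illustration; the actual layout of FIG. 3 is not reproduced here.

```python
# Hypothetical field order for one record of the delimited stream.
FIELD_NAMES = ["patient", "doctor", "date", "medication"]


def parse_record(line, field_names=FIELD_NAMES, delimiter="|"):
    """Split one delimited record and map each value to its field name."""
    values = [value.strip() for value in line.strip().split(delimiter)]
    if len(values) != len(field_names):
        raise ValueError(
            f"expected {len(field_names)} fields, got {len(values)}"
        )
    return dict(zip(field_names, values))


record = parse_record("Jane Doe|Dr. Smith|2009-03-06|Amoxicillin")
```

Rejecting records with an unexpected field count is a design choice of this sketch; it keeps a shifted delimiter from silently placing a value under the wrong field name.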
Unstructured documents 202 may be any type of unstructured data that is received. For example, images of paper documents may be received or paper documents may be received and scanned into images. In this case, document converter 206 converts unstructured documents 202 into images. If unstructured documents 202 are already in image format, then a conversion is not performed. To convert structured data 204 to images, structured data 204 may be retrieved from different fields of the database. An image report is generated from an aggregation of multiple data values retrieved from the database. The report shows an image of data values on a page. As an image in its basic form, its content is no longer recognizable by a computer.
To provide some structure to the common format, a schema 208 is used to organize and categorize the image data. The schema is an organizational schema or framework that includes categories by which the images may be organized. Schema 208 may be determined based on expected content that may be included in the images. For example, medical records may include specific information, such as a doctor's name, address, diagnosis, or prescription. These and other categories that are usually found in medical documents are included in schema 208. Accordingly, an effective organizational schema may be determined in which to classify the images.
Schema 208 is applied to index or tag the documents by the various categories. For example, some categories include an author's name, author date, type of page (page categories), and any sub-categories that may be determined. In one example of indexing, for a category that includes a doctor's name or ID, any image that includes that doctor's name or ID is indexed with that category. For example, an image identifier may be tagged with that category for the doctor's name or ID. The schema is applied to all images and indices 210 are then generated. The indices may be stored with the images in database 106.
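By way of example only, applying a schema to generate indices may be sketched as building a mapping from each category value to the image identifiers tagged with it. The category names in `SCHEMA_CATEGORIES`, the `build_indices` function, and the sample values are hypothetical and assumed for illustration; the content of each image is assumed to have been extracted already.

```python
from collections import defaultdict

# Hypothetical category names standing in for schema 208.
SCHEMA_CATEGORIES = ("author_name", "author_date", "page_type")


def build_indices(extracted_by_image):
    """Map each (category, value) pair to the set of image IDs tagged with it."""
    indices = defaultdict(set)
    for image_id, extracted in extracted_by_image.items():
        for category in SCHEMA_CATEGORIES:
            value = extracted.get(category)
            if value is not None:
                indices[(category, value)].add(image_id)
    return dict(indices)


indices = build_indices({
    "img-1": {"author_name": "Dr. Smith", "page_type": "Physician Notes"},
    "img-2": {"author_name": "Dr. Smith", "author_date": "2009-03-06"},
})
```

Here both images are reachable under the same author entry regardless of which format they originated in.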
After indexing, structured and unstructured data co-exist in a common format that is identified by similar data labels and fields, and is recognizable and distinguishable by a computer system. This allows searching of images for both document formats simultaneously. For example, a user may search for one of the categories and images created from either unstructured or structured data may be returned. Also, images for both formats of documents may now be displayed simultaneously in the same organization schema, folder, or webpage (view).
In one embodiment, the schema may organize the data in report groups, which are higher level categories. Inside the report groups are section headers that are sub-categories. FIG. 4 discloses an interface that specifies report groups according to one embodiment. A table 400 specifies report groups 401 that have been created for the schema. The report groups 401 may be categories where groups of documents received by a patient can be categorized. For example, in a database, the report group Advanced Health Care Directive is shown as one of the report group categories. A code can be used to identify each report group found in the schema similar to a node on a tree diagram. The code may also be used to determine how to display the report groups. For example, a lower code may cause a report group to be displayed before a higher coded report group.
According to one embodiment, sub-categories are specified within each report group. A column 402 shows the names of various sub-categories that are mapped into a report group 401. The sub-categories are determined based on various data that may be received in each report group. For example, different medical providers may provide different documents to a patient (i.e., in a different format). The documents from different medical providers, however, may be categorized into one of the sub-categories. Any of the report groups (401) and sub-categories (402) can be indices 210.
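As a non-limiting illustration, report groups, their codes, and their sub-categories may be sketched as a small list of records. The numeric codes and the `display_order` and `report_group_for` functions are hypothetical and assumed for illustration; only the group and sub-category names are taken from this description.

```python
# Codes are hypothetical; a lower code is displayed before a higher one.
REPORT_GROUPS = [
    {"code": 20, "name": "Medication and Allergies",
     "sub_categories": ["Allergy and Immunology", "Anesthesiology"]},
    {"code": 10, "name": "Advanced Health Care Directive",
     "sub_categories": []},
]


def display_order(groups):
    """Return report group names sorted by ascending code."""
    return [group["name"] for group in sorted(groups, key=lambda g: g["code"])]


def report_group_for(sub_category, groups):
    """Roll a sub-category up to the report group that contains it."""
    for group in groups:
        if sub_category in group["sub_categories"]:
            return group["name"]
    return None
```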
Schema 208 is then used to index images. Using indices 210, unstructured data may be given some structure to allow for searching and displaying of images. Although structured data 204 was already in a format that could be searched and displayed, to integrate unstructured documents 202 and structured data 204, structured data 204 is converted to the image-based format, which is a more constraining format. That is, an image inherently does not have any structured data to it. However, a common schema 208 is applied to index the images from unstructured documents 202 and structured data 204 to allow integrated searching of both.
Indices 210 may be stored in a database as field names. FIG. 5 shows an example of a database table that can be used to store indices 210 according to one embodiment. As shown, an image 500 includes content 502. Content 502-1 to 502-4 may be a document name, author, date, and doctor's notes.
This content may be tagged with indices. For example, the document ID may be stored as a row in a table 510. An identifier 512 may be stored to identify image 500. Indices 514 are provided in the columns of table 510. Table 510 may be populated with content from the image or may be organized by category descriptors. For image 500, the fields of the table are filled with data based on the content of the image. For example, for index 514-1, an image's name is inserted into the corresponding data field. Also, the Document name, Author, Author Date, Document Type, and Source may be inserted into the other corresponding fields for indices 514-1 to 514-5.
Table 510 may also include category descriptors that are used to organize the image. For example, the image may fall into different categories based on the content of the image, where the image originated, what medical condition the image is diagnosing, etc. Table 510 may insert information for categories for image 500, such as the document may be associated with the doctor's notes category and the image is tagged in that category. Other categories may or may not be tagged depending on image 500.
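By way of example only, a table along the lines of table 510 may be sketched with an in-memory SQLite database. The table name `image_index`, the column names, and the sample row are assumptions made for illustration and are not the layout shown in FIG. 5.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# One row per image; the index columns and the category descriptor are hypothetical.
conn.execute(
    """
    CREATE TABLE image_index (
        image_id      TEXT PRIMARY KEY,
        document_name TEXT,
        author        TEXT,
        author_date   TEXT,
        document_type TEXT,
        source        TEXT,
        category      TEXT
    )
    """
)
conn.execute(
    "INSERT INTO image_index VALUES (?, ?, ?, ?, ?, ?, ?)",
    ("img-500", "Visit summary", "Dr. Smith", "2009-03-06",
     "Scanned page", "Clinic A", "Doctor's notes"),
)

# The image itself stays opaque; only its index row is queried.
rows = conn.execute(
    "SELECT image_id, author FROM image_index WHERE category = ?",
    ("Doctor's notes",),
).fetchall()
```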
FIG. 6 shows an interface 600 that can be used to index images according to one embodiment. An image 602 is shown that is being indexed. An index section 604 is used to commit and apply attributes or descriptions of the image document 602 in the form of indices. For example, entry boxes 606 are used to receive information that can be used to index image 602. For example, a name 608 and a source 610 are used to identify the doctor by name and the source from which image 602 was received.
A category 612 is used to categorize image 602. The categories may be used to index image 602 based on the report groups and sub-categories that were described with respect to FIG. 4. When the information is input in index section 604, image 602 is indexed.
Image 602 may be indexed manually or automatically. For example, index section 604 may be used to provide a template for automatically indexing other images. For example, once an image is indexed using interface 600, then other images can be automatically indexed using the template. In one example, similar documents, such as images from the same doctor may be automatically indexed.
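The template-based automatic indexing described above might be sketched as follows. This is an illustration under stated assumptions, not the method of any particular embodiment: the `IndexTemplate` class, the `auto_index` function, the rule of matching on the image's source, and all sample values are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class IndexTemplate:
    """Index values captured once through an interface such as interface 600."""
    match_source: str                       # source whose images the template applies to
    values: dict = field(default_factory=dict)


def auto_index(image: dict, templates: list) -> dict:
    """Return index values for an image, using the first template whose source matches."""
    for template in templates:
        if image.get("source") == template.match_source:
            # Build a new dict so the template itself is never modified.
            return {"image_id": image["image_id"], **template.values}
    # No template applies: leave the image for manual indexing.
    return {"image_id": image["image_id"]}


template = IndexTemplate(
    match_source="Example Clinic",
    values={"name": "Dr. Example", "source": "Example Clinic",
            "category": "Physician Notes"},
)
indexed = auto_index({"image_id": "img-2", "source": "Example Clinic"}, [template])
unmatched = auto_index({"image_id": "img-3", "source": "Other Lab"}, [template])
```

Images that match no template are returned with only their identifier, so they can still be routed to an operator for manual entry.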
After indexing, the images may be searched and displayed. FIG. 7 shows an interface 700 that can be used to search for medical records according to one embodiment. As shown, different categories 702 may be used to search for documents. The categories immunizations, medications and allergies, behavioral health, cardiac electrophysiology, cardiac electroscopy, and cardiology have been selected. For this search, all images that have been indexed with these categories may be retrieved from database 106. By using interface 700, searches may be performed over images that originated from unstructured documents 202 and structured data 204. Separate searches do not need to be performed for the two types of documents.
The schema may be organized by different report groups. For example, different categories of the schema are included in a report group. That is, for a report group Medication and Allergies, the data may be further tagged by specific medical specialties reflecting different disease categories 704, such as Allergy and Immunology, Anesthesiology, Audiology, Behavioral Health, etc. Any documents tagged with these sub-categories may be searched for and retrieved if the report group hospitalization is used. The organizational schema thus provides some structure to how the images are organized.
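The roll-up of sub-categories into report groups can be illustrated with a short Python sketch. The dictionary contents, record fields, and function name below are hypothetical; the grouping shown is an assumption made for the example and does not reproduce the actual schema of FIGS. 4A and 4B.

```python
# Hypothetical schema fragment: each report group rolls up several sub-categories.
REPORT_GROUPS = {
    "Medication and Allergies": {"Allergy and Immunology", "Anesthesiology"},
    "Hospitalization": {"Hospital and Surgery"},
}

# Hypothetical index records; "origin" notes the format each image came from.
INDEXED_IMAGES = [
    {"image_id": "img-1", "category": "Allergy and Immunology", "origin": "paper"},
    {"image_id": "img-2", "category": "Hospital and Surgery", "origin": "structured"},
    {"image_id": "img-3", "category": "Hospital and Surgery", "origin": "paper"},
]


def search_by_report_group(group: str) -> list:
    """Return IDs of all images tagged with any sub-category of the report group."""
    sub_categories = REPORT_GROUPS.get(group, set())
    return [rec["image_id"] for rec in INDEXED_IMAGES
            if rec["category"] in sub_categories]


hits = search_by_report_group("Hospitalization")
```

In this sketch a single search returns images that originated on paper and images rendered from structured data, because both carry the same category tag.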
FIG. 8A depicts an example of an interface 800 including search results according to one embodiment. As shown in panel 802, links to different images are provided. The images may be images that were generated from documents of different formats. For example, physician notes 804 may be images of paper-based notes. Also, an image 806 may be an image of structured data relating to a record of a hospitalization. Also, an image 808 is an image of notes for the hospitalization. Thus, a user can see different images for different documents under the same report group. For example, all physician notes are categorized together and all images for hospital and surgery are categorized together. Conventionally, electronic hospitalization notes would have been displayed in a different category than paper-based hospitalization notes. By converting the documents to a common format and then indexing them, such as indexing the images with the category “Hospitalization”, a search for hospitalization brings up images for documents that originated in different formats.
A preview panel 810 shows images of documents. For example, physician notes 804 are shown, which are mostly composed of handwritten notes and paper-based images. A user may select the different physician notes and have them be displayed. Although not shown, images of documents that originated in different formats may also be included. For example, in preview panel 812 or 804, a document originally populated by structured data as its content is displayed as an image report alongside an image of another document that originally was created on paper. In the medication and allergies report group, there are different types of documents as depicted by different icons as shown in 804.
Interface 800 can be used to view structured and unstructured data and access the benefits of structured data stored in its defined way for greater data manipulation. For example, a link 814 (e.g., the link to View Trend Data) is included in interface 800 to allow a user to access the structured data behind the image. Once link 814 is selected, structured data is retrieved and can be displayed in a timeline or graphical way. For example, a report image may be rendered or generated from the structured data that corresponds to one of the images. FIG. 8B shows an example of an image of a report generated using structured data according to one embodiment. As shown, a test panel is displayed. This image is unstructured data in that content found in the image cannot be distinguished by a computer system. However, the schema was used to index the image and it has been retrieved in response to the query received from interface 700 of FIG. 7. When link 814 is selected, a view of the structured data is displayed across time on a graph or chart in FIG. 8C as compared with the rendered snapshot of the data provided by the report image. In this case, structured data that is associated with the image of FIG. 8B is retrieved. The structured data is then used to generate a report as shown. The report may be different from the image if different analytics are desired. Alternatively, the report may show the same information as the image, but not in an image format. This may allow further manipulation of the report, such as keyword searching, editing, etc.
FIG. 9 depicts a simplified flowchart of a method for indexing different data formats according to one embodiment. Step 902 determines a common format. Depending on the format of the unstructured data to be indexed (for example, paper image documents), a different medium or common format may be selected that would be more appropriate for tagging and organizing the content. In one example, a common format is derived from reviewing the most constrained of formats. In one embodiment, the different formats of structured and unstructured data are reviewed and the most constraining format is selected to become the common format for both types of data. For example, if the only documents to be indexed are structured data, then the common format may be the structured data format. However, if images are to be indexed, then the most constraining common-denominator format is the image-based format. In one embodiment, the formats of documents may be analyzed and the common format is determined automatically.
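One way the automatic determination in step 902 could work is sketched below in Python. The numeric ranking, the intermediate "text" format, and the function name are assumptions introduced for illustration; the description states only that the most constraining format present becomes the common format.

```python
# Hypothetical ranking: a higher number means a more constraining format
# (less machine-readable structure). The values are illustrative only.
CONSTRAINT_RANK = {"structured": 0, "text": 1, "image": 2}


def choose_common_format(document_formats):
    """Pick the most constraining format present among the documents to be indexed."""
    formats = set(document_formats)
    if not formats:
        raise ValueError("no documents to index")
    # The format with the highest rank is the common denominator for all documents.
    return max(formats, key=CONSTRAINT_RANK.__getitem__)


only_structured = choose_common_format(["structured", "structured"])
mixed = choose_common_format(["structured", "image", "structured"])
```

With only structured documents the structured format is kept, and as soon as one image is present the image-based format is selected, which mirrors the two examples given in the paragraph above.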
Step 904 determines the indices to be used for tagging the structured and unstructured-based images. Indices are chosen after reviewing the organizing principle to which a common set of descriptors can be identified to tag and organize the images such that they can all be searched and sorted together. For example, the structured data is tagged by indices and rolled up to image reports, along which an organizing schema emerges that can apply to both structured and unstructured-based images. The structured data is parsed into images that are most relevant to the categories of the organizing schema. For example, if a doctor's name is included in the structured data and used to create the image, the image may be indexed with a tag for the doctor's name.
Step 906 compiles or separates structured data into individual image reports that can be described by the indices. The way that structured data is parsed and compiled into individual images is determined by both the nature of the content and the roll-up categories of the common schema from which to apply the organization across all resulting images. For example, the content is analyzed and a report that is considered to represent the data in the most useful manner is determined based on different factors, such as user preferences, conversion rules, etc. Also, the content of the image may be determined based on different categories that could be applied.
Step 908 reviews unstructured data for tagging and indexing. Step 910 uses the schema 208 to index the unstructured data with the same indices as those for images generated from the structured data. The challenge is to apply the right tags or indices for describing the content of an image that is not recognizable or identifiable by a computer system. Particular embodiments provide certain techniques that may be used to index the images. For example, optical character recognition may be performed on the image to determine information from the content of the image. Also, an operator or user may review the image and enter the information. Other methods of extracting information from the unstructured data may be performed. When the information is extracted, it may be matched with categories in schema 208. For example, if a doctor's name is recognized in an image, the image may be tagged with the doctor's name as an index.
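The matching of extracted information to schema categories in steps 908 and 910 might look like the following Python sketch. It assumes that text has already been recovered from the image (for example, by optical character recognition or operator entry) and that the document carries labelled lines; the label patterns, field names, and sample text are hypothetical.

```python
import re

# Hypothetical patterns mapping schema index names to labelled lines
# in text recovered from an image.
FIELD_PATTERNS = {
    "author": re.compile(r"^Author:\s*(.+)$", re.MULTILINE),
    "author_date": re.compile(r"^Date:\s*(\d{4}-\d{2}-\d{2})$", re.MULTILINE),
}


def extract_indices(ocr_text: str) -> dict:
    """Match text recovered from an image against the schema's index fields."""
    indices = {}
    for name, pattern in FIELD_PATTERNS.items():
        match = pattern.search(ocr_text)
        if match:
            indices[name] = match.group(1).strip()
    return indices


sample_text = "Progress note\nAuthor: Dr. Example\nDate: 2009-03-06\nPatient reports ..."
tags = extract_indices(sample_text)
```

Fields that cannot be recognized are simply absent from the result, which leaves room for an operator to supply them manually as the paragraph above describes.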
Step 912 compiles unstructured data identified by indices into individual image reports for roll-up to common schema (e.g., into report groups as described).
Step 914 stores images of unstructured data and structured data in a file folder. Each image is uniquely described by various data tags or values from a set of indices. The images for the structured data and unstructured data may be stored in the same folder. In step 916, the indices used to identify, describe, or tag each of the images are stored in a database.
Step 918 applies web links for the ability to view the original format of the data of either structured or unstructured data. For example, the links allow for the traversing from the images back to the robustness of structured data. A user can pull up an image and if the user decides to access the structured data that was used to create the image, a link may be used to retrieve the structured data.
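Steps 914 through 918 can be sketched together in Python as follows. This is a simplified illustration under several assumptions: a temporary directory stands in for the file folder, an in-memory SQLite table stands in for the database of indices, a file path stands in for the web link, and all names and sample values are hypothetical.

```python
import json
import sqlite3
import tempfile
from pathlib import Path

folder = Path(tempfile.mkdtemp())          # step 914: one folder for all images
db = sqlite3.connect(":memory:")           # step 916: indices kept in a database
db.execute(
    "CREATE TABLE image_index (image_id TEXT PRIMARY KEY, category TEXT, "
    "image_path TEXT, source_link TEXT)"
)


def store(image_id, image_bytes, category, structured=None):
    """Save the image file, record its indices, and link back to any structured data."""
    image_path = folder / f"{image_id}.img"
    image_path.write_bytes(image_bytes)
    link = None
    if structured is not None:
        # Step 918: keep the original structured data reachable from the image.
        source_path = folder / f"{image_id}.json"
        source_path.write_text(json.dumps(structured))
        link = str(source_path)
    db.execute("INSERT INTO image_index VALUES (?, ?, ?, ?)",
               (image_id, category, str(image_path), link))


def structured_data_for(image_id):
    """Follow the link from an image back to its structured data, if any."""
    row = db.execute("SELECT source_link FROM image_index WHERE image_id = ?",
                     (image_id,)).fetchone()
    if row is None or row[0] is None:
        return None
    return json.loads(Path(row[0]).read_text())


store("lab-1", b"rendered report", "Lab Results", {"test": "glucose", "value": 90})
store("note-1", b"scanned page", "Physician Notes")
```

Here the image rendered from structured data keeps a link to its source, while the scanned page has none, so a viewer could offer a link to the underlying data only where it exists.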
Accordingly, views of data in different formats may be generated, organized, and identified in a database through the use of indices. A common schema is applied for further roll-up or classification of the documents after the documents have been converted to a common format. Also, by using a common schema, the documents may be organized in a way that allows for searching and sorting of images created from documents of different formats. This also allows documents from different formats to be displayed on a webpage in an integrated way. Although the technique may convert different formats of data to the most constraining format as the common format, which may cause structured data to be converted into image data, a user can now search through all documents identified in a category simultaneously instead of searching through different formats of documents separately. Thus, if a user wants to see all documents referencing a hospitalization, any paper-based documents, electronic documents, self-entered documents, or any other documents created can be searched and displayed.
Although the description has been described with respect to particular embodiments thereof, these particular embodiments are merely illustrative, and not restrictive. Although medical records are discussed, other documents may be used.
Any suitable programming language can be used to implement the routines of particular embodiments including C, C++, Java, assembly language, etc. Different programming techniques can be employed such as procedural or object oriented. The routines can execute on a single processing device or multiple processors. Although the steps, operations, or computations may be presented in a specific order, this order may be changed in different particular embodiments. In some particular embodiments, multiple steps treated as sequential in this specification can be performed at the same time.
Particular embodiments may be implemented in a computer-readable storage medium for use by or in connection with the instruction execution system, apparatus, system, or device. Particular embodiments can be implemented in the form of control logic in software or hardware or a combination of both. The control logic, when executed by one or more processors, may be operable to perform that which is described in particular embodiments.
Particular embodiments may be implemented by using a programmed general purpose digital computer, or by using application specific integrated circuits, programmable logic devices, or field programmable gate arrays. Optical, chemical, biological, quantum or nanoengineered systems, components and mechanisms may also be used. In general, the functions of particular embodiments can be achieved by any means as is known in the art. Distributed, networked systems, components, and/or circuits can be used. Communication, or transfer, of data may be wired, wireless, or by any other means.
It will also be appreciated that one or more of the elements depicted in the drawings/figures can also be implemented in a more separated or integrated manner, or even removed or rendered as inoperable in certain cases, as is useful in accordance with a particular application. It is also within the spirit and scope to implement a program or code that can be stored in a machine-readable medium to permit a computer to perform any of the methods described above.
As used in the description herein and throughout the claims that follow, “a”, “an”, and “the” include plural references unless the context clearly dictates otherwise. Also, as used in the description herein and throughout the claims that follow, the meaning of “in” includes “in” and “on” unless the context clearly dictates otherwise.
Thus, while particular embodiments have been described herein, latitudes of modification, various changes, and substitutions are intended in the foregoing disclosures, and it will be appreciated that in some instances some features of particular embodiments will be employed without a corresponding use of other features without departing from the scope and spirit as set forth. Therefore, many modifications may be made to adapt a particular situation or material to the essential scope and spirit.

Claims (16)

We claim:
1. A method comprising:
receiving, by a computing device, a plurality of medical documents, the plurality of medical documents including a first set of medical documents in a first format in an image based format and a second set of medical documents in a second format in a structured data format;
determining, by the computing device, a schema based on a review of an organizing principle to which a common set of data fields can be used to classify information from the first set of medical documents and the second set of medical documents;
determining, by the computing device, via extraction, unstructured information in the first set of medical documents relevant to the common set of data fields, wherein the unstructured information is based on information included in images of the first set of medical documents and in a format that is insertable into a table;
classifying, by the computing device, structured information from a first data field in the second set of medical documents and a portion of the unstructured information in the first set of medical documents to a common data field in the common set of data fields, wherein the portion of unstructured information is included in an image of a medical document in the first set of medical documents;
inserting, by the computing device, the structured information and the unstructured information into the table based on the common set of data fields, wherein the common data field in the table includes the structured information from the first data field and the portion of unstructured information;
storing, by the computing device, the table based on the common set of data fields to allow for searching for information in both the first set of medical documents and the second set of medical documents using a query, wherein the query retrieves the structured information from the first data field and the portion of unstructured information from the common data field in the table;
receiving the query;
determining search results for the query using the set of common data fields associated with the schema, wherein the search results include information in the table from a subset of medical documents from the first set of medical documents and the second set of medical documents that are determined to match the query; and
displaying the search results in an interface, wherein the information from the subset of medical documents is displayed in a common data format.
2. The method of claim 1, further comprising:
tagging information in the first set of medical documents according to a first set of data fields;
tagging information in the second set of medical documents according to a second set of data fields; and
inserting the tagged information into the common set of data fields in the table.
3. The method of claim 2, wherein a second data field in the first set of data fields and the first data field in the second set of data fields correspond to the common data field in the set of common data fields.
4. The method of claim 1, wherein:
the common data format is a structured data format.
5. The method of claim 1, wherein:
the common data format is the image based format.
6. The method of claim 1, further comprising:
linking the second set of medical documents with the table; and
allowing a user to retrieve structured data for the second set of medical documents from the table.
7. The method of claim 1, wherein the plurality of medical documents is received from different medical providers.
8. The method of claim 1, wherein determining the unstructured information in the first set of medical documents relevant to the common set of data fields comprises:
determining a template for automatically determining a subset of the unstructured information from images from the first set of medical documents; and
automatically determining the subset of the unstructured information for images based on the template.
9. A non-transitory computer-readable storage medium containing instructions, that when executed, control a computer system to be configured for:
receiving a plurality of medical documents, the plurality of medical documents including a first set of medical documents in a first format in an image based format and a second set of medical documents in a second format in a structured data format;
determining a schema based on a review of an organizing principle to which a common set of data fields can be used to classify information from the first set of medical documents and the second set of medical documents;
determining, via extraction, unstructured information in the first set of medical documents relevant to the common set of data fields, wherein the unstructured information is based on information included in images of the first set of medical documents and in a format that is insertable into a table;
classifying structured information from a first data field in the second set of medical documents and a portion of the unstructured information in the first set of medical documents to a common data field in the common set of data fields, wherein the portion of unstructured information is included in an image of a medical document in the first set of medical documents;
inserting the structured information and the unstructured information into the table based on the common set of data fields, wherein the common data field in the table includes the structured information from the first data field and the portion of unstructured information;
storing the table based on the common set of data fields to allow for searching for information in both the first set of medical documents and the second set of medical documents using a query, wherein the query retrieves the structured information from the first data field and the portion of unstructured information from the common data field in the table;
receiving the query;
determining search results for the query using the set of common data fields associated with the schema, wherein the search results include information in the table from a subset of medical documents from the first set of medical documents and the second set of medical documents that are determined to match the query; and
displaying the search results in an interface, wherein the information from the subset of medical documents is displayed in a common data format.
10. The non-transitory computer-readable storage medium of claim 9, further configured for:
tagging information in the first set of medical documents according to a first set of data fields;
tagging information in the second set of medical documents according to a second set of data fields; and
compiling the tagged information into the common set of data fields.
11. The non-transitory computer-readable storage medium of claim 10, wherein a second data field in the first set of data fields and the first data field in the second set of data fields correspond to the common data field in the set of common data fields.
12. The non-transitory computer-readable storage medium of claim 9, wherein:
the common data format is a structured data format.
13. The non-transitory computer-readable storage medium of claim 9, wherein:
the common data format is the image based format.
14. The non-transitory computer-readable storage medium of claim 9, further configured for:
linking the second set of medical documents with the table; and
allowing a user to retrieve structured data for the second set of medical documents from the table.
15. The non-transitory computer-readable storage medium of claim 9, wherein the plurality of medical documents is received from different medical providers.
16. An apparatus comprising:
one or more computer processors; and
a non-transitory computer-readable storage medium comprising instructions, that when executed, control the one or more computer processors to be configured for:
receiving a plurality of medical documents, the plurality of medical documents including a first set of medical documents in a first format in an image based format and a second set of medical documents in a second format in a structured data format;
determining a schema based on a review of an organizing principle to which a common set of data fields can be used to classify information from the first set of medical documents and the second set of medical documents;
determining, via extraction, unstructured information in the first set of medical documents relevant to the common set of data fields, wherein the unstructured information is based on information included in images of the first set of medical documents and in a format that is insertable into a table;
classifying structured information from a first data field in the second set of medical documents and a portion of the unstructured information in the first set of medical documents to a common data field in the common set of data fields, wherein the portion of unstructured information is included in an image of a medical document in the first set of medical documents;
inserting the structured information and the unstructured information into the table based on the common set of data fields, wherein the common data field in the table includes the structured information from the first data field and the portion of unstructured information;
storing the table based on the common set of data fields to allow for searching for information in both the first set of medical documents and the second set of medical documents using a query, wherein the query retrieves the structured information from the first data field and the portion of unstructured information from the common data field in the table;
receiving the query;
determining search results for the query using the set of common data fields associated with the schema, wherein the search results include information in the table from a subset of medical documents from the first set of medical documents and the second set of medical documents that are determined to match the query; and
displaying the search results in an interface, wherein the information from the subset of medical documents is displayed in a common data format.
US14/054,316 2009-03-06 2013-10-15 Classifying information captured in different formats for search and display Expired - Fee Related US9165045B2 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
US14/054,316 US9165045B2 (en) 2009-03-06 2013-10-15 Classifying information captured in different formats for search and display

Applications Claiming Priority (3)

Application Number Priority Date Filing Date Title
US12/399,894 US8250026B2 (en) 2009-03-06 2009-03-06 Combining medical information captured in structured and unstructured data formats for use or display in a user application, interface, or view
US13/562,191 US8572021B2 (en) 2009-03-06 2012-07-30 Classifying information captured in different formats for search and display in an image-based format
US14/054,316 US9165045B2 (en) 2009-03-06 2013-10-15 Classifying information captured in different formats for search and display

Related Parent Applications (1)

Application Number Title Priority Date Filing Date
US13/562,191 Continuation US8572021B2 (en) 2009-03-06 2012-07-30 Classifying information captured in different formats for search and display in an image-based format

Publications (2)

Publication Number Publication Date
US20140046931A1 (en) 2014-02-13
US9165045B2 (en) 2015-10-20

Family

ID=42679128

Family Applications (3)

Application Number Title Priority Date Filing Date
US12/399,894 Expired - Fee Related US8250026B2 (en) 2009-03-06 2009-03-06 Combining medical information captured in structured and unstructured data formats for use or display in a user application, interface, or view
US13/562,191 Active US8572021B2 (en) 2009-03-06 2012-07-30 Classifying information captured in different formats for search and display in an image-based format
US14/054,316 Expired - Fee Related US9165045B2 (en) 2009-03-06 2013-10-15 Classifying information captured in different formats for search and display

Country Status (1)

Country Link
US (3) US8250026B2 (en)

Cited By (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20150205846A1 (en) * 2014-01-21 2015-07-23 PokitDok, Inc. System and method for dynamic document matching and merging
WO2017106773A1 (en) * 2015-12-19 2017-06-22 Von Drakk Viktor Method and device for correlating multiple tables in a database environment
US10007757B2 (en) 2014-09-17 2018-06-26 PokitDok, Inc. System and method for dynamic schedule aggregation
US10013292B2 (en) 2015-10-15 2018-07-03 PokitDok, Inc. System and method for dynamic metadata persistence and correlation on API transactions
US10102340B2 (en) 2016-06-06 2018-10-16 PokitDok, Inc. System and method for dynamic healthcare insurance claims decision support
US10108954B2 (en) 2016-06-24 2018-10-23 PokitDok, Inc. System and method for cryptographically verified data driven contracts
US10366204B2 (en) 2015-08-03 2019-07-30 Change Healthcare Holdings, Llc System and method for decentralized autonomous healthcare economy platform
US10417379B2 (en) 2015-01-20 2019-09-17 Change Healthcare Holdings, Llc Health lending system and method using probabilistic graph models
US10474792B2 (en) 2015-05-18 2019-11-12 Change Healthcare Holdings, Llc Dynamic topological system and method for efficient claims processing
US10805072B2 (en) 2017-06-12 2020-10-13 Change Healthcare Holdings, Llc System and method for autonomous dynamic person management
US10922299B2 (en) 2018-04-24 2021-02-16 The Von Drakk Corporation Correlating multiple tables in a non-relational database environment
US11126627B2 (en) 2014-01-14 2021-09-21 Change Healthcare Holdings, Llc System and method for dynamic transactional data streaming

Families Citing this family (54)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20090049104A1 (en) * 2005-06-08 2009-02-19 William Pan Method and system for configuring a variety of medical information
US20090204885A1 (en) * 2008-02-13 2009-08-13 Ellsworth Thomas N Automated management and publication of electronic content from mobile nodes
US8250026B2 (en) 2009-03-06 2012-08-21 Peoplechart Corporation Combining medical information captured in structured and unstructured data formats for use or display in a user application, interface, or view
US8583453B2 (en) 2009-10-20 2013-11-12 Universal Research Solutions, Llc Generation and data management of a medical study using instruments in an integrated media and medical system
US9020944B2 (en) * 2009-10-29 2015-04-28 International Business Machines Corporation Systems and methods for organizing documented processes
US20110270843A1 (en) * 2009-11-06 2011-11-03 Mayo Foundation For Medical Education And Research Specialized search engines
US9852384B2 (en) * 2010-02-23 2017-12-26 Microsoft Technology Licensing, Llc Web-based visual representation of a structured data solution
AU2011295936B2 (en) * 2010-09-01 2015-11-26 Google Llc Methods and apparatus to cluster user data
US9032513B2 (en) * 2010-09-01 2015-05-12 Apixio, Inc. Systems and methods for event stream platforms which enable applications
US11544652B2 (en) 2010-09-01 2023-01-03 Apixio, Inc. Systems and methods for enhancing workflow efficiency in a healthcare management system
US11481411B2 (en) 2010-09-01 2022-10-25 Apixio, Inc. Systems and methods for automated generation classifiers
US11694239B2 (en) 2010-09-01 2023-07-04 Apixio, Inc. Method of optimizing patient-related outcomes
US11610653B2 (en) 2010-09-01 2023-03-21 Apixio, Inc. Systems and methods for improved optical character recognition of health records
US20130262144A1 (en) 2010-09-01 2013-10-03 Imran N. Chaudhri Systems and Methods for Patient Retention in Network Through Referral Analytics
US11195213B2 (en) 2010-09-01 2021-12-07 Apixio, Inc. Method of optimizing patient-related outcomes
US20160358278A1 (en) 2010-09-29 2016-12-08 Certify Data Systems, Inc. Electronic medical record exchange system
US8396894B2 (en) * 2010-11-05 2013-03-12 Apple Inc. Integrated repository of structured and unstructured data
KR101269043B1 (en) * 2011-04-15 2013-05-29 (주)메디엔비즈 System and method for real-time providing of medical information, and the recording media storing the program performing the said method
US9251295B2 (en) * 2011-08-31 2016-02-02 International Business Machines Corporation Data filtering using filter icons
US8965933B2 (en) * 2012-01-06 2015-02-24 Apple Inc. Multi-tiered caches in data rendering
CN103390005B (en) * 2012-05-11 2016-05-04 北大方正集团有限公司 A kind of method and system of merge document
US20140006369A1 (en) * 2012-06-28 2014-01-02 Sean Blanchflower Processing structured and unstructured data
US20140059051A1 (en) * 2012-08-22 2014-02-27 Mark William Graves, Jr. Apparatus and system for an integrated research library
US9165406B1 (en) 2012-09-21 2015-10-20 A9.Com, Inc. Providing overlays based on text in a live camera view
WO2014049470A1 (en) * 2012-09-25 2014-04-03 Koninklijke Philips N.V. System and method for processing variant call data
US9507750B2 (en) 2012-10-12 2016-11-29 A9.Com, Inc. Dynamic search partitioning
US9047326B2 (en) * 2012-10-12 2015-06-02 A9.Com, Inc. Index configuration for searchable data in network
AU2013328901B2 (en) * 2012-10-12 2016-07-28 A9.Com, Inc. Index configuration for searchable data in network
CN103838763A (en) * 2012-11-26 2014-06-04 鸿富锦精密工业(深圳)有限公司 Object file generation system and method
US9053085B2 (en) * 2012-12-10 2015-06-09 International Business Machines Corporation Electronic document source ingestion for natural language processing systems
US20170185715A9 (en) * 2013-03-15 2017-06-29 Douglas K. Smith Federated Collaborative Medical Records System Utilizing Cloud Computing Network and Methods
CN105144205B (en) * 2013-04-29 2018-05-08 西门子公司 The apparatus and method of natural language problem are answered using multiple selected knowledge bases
WO2014205254A2 (en) * 2013-06-21 2014-12-24 Virtual Radiologic Corporation Radiology data processing and standardization techniques
EP3039639A4 (en) * 2013-08-30 2017-01-25 3M Innovative Properties Company Method of classifying medical documents
US20150149209A1 (en) * 2013-11-27 2015-05-28 General Electric Company Remote/local reference sharing and resolution
US9588971B2 (en) * 2014-02-03 2017-03-07 Bluebeam Software, Inc. Generating unique document page identifiers from content within a selected page region
US10394882B2 (en) * 2014-02-19 2019-08-27 International Business Machines Corporation Multi-image input and sequenced output based image search
US20150278463A1 (en) * 2014-04-01 2015-10-01 Merge Healthcare Incorporated Systems and methods for pre-authorizing image studies
US20160070860A1 (en) * 2014-09-08 2016-03-10 WebMD Health Corporation Structuring multi-sourced medical information into a collaborative health record
US9613072B2 (en) * 2014-10-29 2017-04-04 Bank Of America Corporation Cross platform data validation utility
US10642876B1 (en) * 2014-12-01 2020-05-05 jSonar Inc. Query processing pipeline for semi-structured and unstructured data
US10275476B2 (en) * 2014-12-22 2019-04-30 Verizon Patent And Licensing Inc. Machine to machine data aggregator
US10733370B2 (en) * 2015-08-18 2020-08-04 Change Healthcare Holdings, Llc Method, apparatus, and computer program product for generating a preview of an electronic document
US10838919B2 (en) * 2015-10-30 2020-11-17 Acxiom Llc Automated interpretation for the layout of structured multi-field files
CN105512265A (en) * 2015-12-04 2016-04-20 浪潮通用软件有限公司 Method and device for displaying data through figure
US20180025113A1 (en) * 2016-07-25 2018-01-25 Salesforce.Com, Inc. Event detail processing at run-time
CN107783950B (en) * 2017-04-11 2021-05-14 平安医疗健康管理股份有限公司 Method and device for processing drug instruction
EP3864522A4 (en) * 2018-10-09 2022-06-29 Idiscovery Solutions, Inc. System and method of data transformation
US11188712B2 (en) * 2019-02-28 2021-11-30 Jpmorgan Chase Bank, N.A. Systems and methods for wholesale client onboarding
US11645344B2 (en) 2019-08-26 2023-05-09 Experian Health, Inc. Entity mapping based on incongruent entity data
CN112732946B (en) * 2019-10-12 2023-04-18 四川医枢科技有限责任公司 Modular data analysis and database establishment method for medical literature
US11410447B2 (en) 2020-06-19 2022-08-09 Bank Of America Corporation Information security assessment translation engine
WO2022046049A1 (en) * 2020-08-26 2022-03-03 University Of Pittsburgh - Of The Commonwealth System Of Higher Education Search engine for concatenating and searching combinations of data files
US20220067105A1 (en) * 2020-08-26 2022-03-03 University Of Pittsburgh - Of The Commonwealth System Of Higher Education Search engine for concatenating and searching combinations of data files

Citations (49)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5664109A (en) 1995-06-07 1997-09-02 E-Systems, Inc. Method for extracting pre-defined data items from medical service records generated by health care providers
US5729730A (en) 1995-03-28 1998-03-17 Dex Information Systems, Inc. Method and apparatus for improved information storage and retrieval system
US6192165B1 (en) 1997-12-30 2001-02-20 Imagetag, Inc. Apparatus and method for digital filing
US6240407B1 (en) 1998-04-29 2001-05-29 International Business Machines Corp. Method and apparatus for creating an index in a database system
US6263330B1 (en) 1998-02-24 2001-07-17 Luc Bessette Method and apparatus for the management of data files
US6338056B1 (en) 1998-12-14 2002-01-08 International Business Machines Corporation Relational database extender that supports user-defined index types and user-defined search
US20020004727A1 (en) 2000-07-03 2002-01-10 Knaus William A. Broadband computer-based networked systems for control and management of medical records
US20020035578A1 (en) 1999-08-27 2002-03-21 Stratigos William N. System and method for integrating paper-based business documents with computer-readable data entered via a computer network
US20020128998A1 (en) * 2001-03-07 2002-09-12 David Kil Automatic data explorer that determines relationships among original and derived fields
US20020138476A1 (en) 2001-03-22 2002-09-26 Fujitsu Limited Document managing apparatus
US20020152245A1 (en) 2001-04-05 2002-10-17 Mccaskey Jeffrey Web publication of newspaper content
US20030033275A1 (en) 2001-08-13 2003-02-13 Alpha Shamim A. Combined database index of unstructured and structured columns
US20030037302A1 (en) 2001-06-24 2003-02-20 Aliaksei Dzienis Systems and methods for automatically converting document file formats
US20030120458A1 (en) * 2001-11-02 2003-06-26 Rao R. Bharat Patient data mining
US20030140044A1 (en) 2002-01-18 2003-07-24 Peoplechart Patient directed system and method for managing medical information
US6674924B2 (en) 1997-12-30 2004-01-06 Steven F. Wright Apparatus and method for dynamically routing documents using dynamic control documents and data streams
US20040044659A1 (en) 2002-05-14 2004-03-04 Douglass Russell Judd Apparatus and method for searching and retrieving structured, semi-structured and unstructured content
US20040049478A1 (en) 2002-09-11 2004-03-11 Intelligent Results Attribute scoring for unstructured content
US20040098664A1 (en) 2002-11-04 2004-05-20 Adelman Derek A. Document processing based on a digital document image input with a confirmatory receipt output
US20040202386A1 (en) 2003-04-11 2004-10-14 Pitney Bowes Incorporated Automatic paper to digital converter and indexer
US6886136B1 (en) 2000-05-05 2005-04-26 International Business Machines Corporation Automatic template and field definition in form processing
US6934698B2 (en) 2000-12-20 2005-08-23 Heart Imaging Technologies Llc Medical image management system
US20060053133A1 (en) * 2004-09-09 2006-03-09 Microsoft Corporation System and method for parsing unstructured data into structured data
US7016963B1 (en) 2001-06-29 2006-03-21 Glow Designs, Llc Content management and transformation system for digital content
US20060294101A1 (en) * 2005-06-24 2006-12-28 Content Analyst Company, Llc Multi-strategy document classification system and method
US20070009158A1 (en) 2005-07-06 2007-01-11 International Business Machines Corporation Paper and electronic recognizable forms
US20070011134A1 (en) 2005-07-05 2007-01-11 Justin Langseth System and method of making unstructured data available to structured data analysis tools
US20070011175A1 (en) 2005-07-05 2007-01-11 Justin Langseth Schema and ETL tools for structured and unstructured data
US20070033229A1 (en) * 2005-08-03 2007-02-08 Ethan Fassett System and method for indexing structured and unstructured audio content
US20070162308A1 (en) 2006-01-11 2007-07-12 Peters James D System and methods for performing distributed transactions
US20070168382A1 (en) 2006-01-03 2007-07-19 Michael Tillberg Document analysis system for integration of paper records into a searchable electronic database
US20070244859A1 (en) * 2006-04-13 2007-10-18 American Chemical Society Method and system for displaying relationship between structured data and unstructured data
US20070282769A1 (en) * 2006-05-10 2007-12-06 Inquira, Inc. Guided navigation system
US20080133455A1 (en) 2006-11-30 2008-06-05 International Business Machines Corporation Method of processing data
US7386462B2 (en) 2001-03-16 2008-06-10 Ge Medical Systems Global Technology Company, Llc Integration of radiology information into an application service provider DICOM image archive and/or web based viewer
US20080154927A1 (en) 2006-12-21 2008-06-26 International Business Machines Corporation Use of federation services and transformation services to perform extract, transform, and load (etl) of unstructured information and associated metadata
US20090164416A1 (en) 2007-12-10 2009-06-25 Aumni Data Inc. Adaptive data classification for data mining
US20090228428A1 (en) * 2008-03-07 2009-09-10 International Business Machines Corporation Solution for augmenting a master data model with relevant data elements extracted from unstructured data sources
US20090248619A1 (en) * 2008-03-31 2009-10-01 International Business Machines Corporation Supporting unified querying over autonomous unstructured and structured databases
US7627588B1 (en) 2001-05-07 2009-12-01 Ixreveal, Inc. System and method for concept based analysis of unstructured data
US7689544B2 (en) 2003-07-23 2010-03-30 Siemens Aktiengesellschaft Automatic indexing of digital image archives for content-based, context-sensitive searching
US7707169B2 (en) 2004-06-10 2010-04-27 Siemens Corporation Specification-based automation methods for medical content extraction, data aggregation and enrichment
US20100114899A1 (en) * 2008-10-07 2010-05-06 Aloke Guha Method and system for business intelligence analytics on unstructured data
US7895219B2 (en) 2005-05-23 2011-02-22 International Business Machines Corporation System and method for guided and assisted structuring of unstructured information
US7949629B2 (en) 2006-10-30 2011-05-24 Noblis, Inc. Method and system for personal information extraction and modeling with fully generalized extraction contexts
US8060376B2 (en) 2004-10-01 2011-11-15 Nomoreclipboard, Llc System and method for collection of community health and administrative data
US8250026B2 (en) 2009-03-06 2012-08-21 Peoplechart Corporation Combining medical information captured in structured and unstructured data formats for use or display in a user application, interface, or view
US8290951B1 (en) * 2008-07-10 2012-10-16 Bank Of America Corporation Unstructured data integration with a data warehouse
US8595245B2 (en) * 2006-07-26 2013-11-26 Xerox Corporation Reference resolution for text enrichment and normalization in mining mixed data

Patent Citations (56)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5729730A (en) 1995-03-28 1998-03-17 Dex Information Systems, Inc. Method and apparatus for improved information storage and retrieval system
US6163775A (en) 1995-03-28 2000-12-19 Enfish, Inc. Method and apparatus configured according to a logical table having cell and attributes containing address segments
US5664109A (en) 1995-06-07 1997-09-02 E-Systems, Inc. Method for extracting pre-defined data items from medical service records generated by health care providers
US6192165B1 (en) 1997-12-30 2001-02-20 Imagetag, Inc. Apparatus and method for digital filing
US6674924B2 (en) 1997-12-30 2004-01-06 Steven F. Wright Apparatus and method for dynamically routing documents using dynamic control documents and data streams
US6263330B1 (en) 1998-02-24 2001-07-17 Luc Bessette Method and apparatus for the management of data files
US6240407B1 (en) 1998-04-29 2001-05-29 International Business Machines Corp. Method and apparatus for creating an index in a database system
US6338056B1 (en) 1998-12-14 2002-01-08 International Business Machines Corporation Relational database extender that supports user-defined index types and user-defined search
US20020035578A1 (en) 1999-08-27 2002-03-21 Stratigos William N. System and method for integrating paper-based business documents with computer-readable data entered via a computer network
US6886136B1 (en) 2000-05-05 2005-04-26 International Business Machines Corporation Automatic template and field definition in form processing
US20020004727A1 (en) 2000-07-03 2002-01-10 Knaus William A. Broadband computer-based networked systems for control and management of medical records
US6934698B2 (en) 2000-12-20 2005-08-23 Heart Imaging Technologies Llc Medical image management system
US20020128998A1 (en) * 2001-03-07 2002-09-12 David Kil Automatic data explorer that determines relationships among original and derived fields
US7386462B2 (en) 2001-03-16 2008-06-10 Ge Medical Systems Global Technology Company, Llc Integration of radiology information into an application service provider DICOM image archive and/or web based viewer
US20020138476A1 (en) 2001-03-22 2002-09-26 Fujitsu Limited Document managing apparatus
US20020152245A1 (en) 2001-04-05 2002-10-17 Mccaskey Jeffrey Web publication of newspaper content
US7627588B1 (en) 2001-05-07 2009-12-01 Ixreveal, Inc. System and method for concept based analysis of unstructured data
US20030037302A1 (en) 2001-06-24 2003-02-20 Aliaksei Dzienis Systems and methods for automatically converting document file formats
US7016963B1 (en) 2001-06-29 2006-03-21 Glow Designs, Llc Content management and transformation system for digital content
US20030033275A1 (en) 2001-08-13 2003-02-13 Alpha Shamim A. Combined database index of unstructured and structured columns
US6980976B2 (en) 2001-08-13 2005-12-27 Oracle International Corp. Combined database index of unstructured and structured columns
US20090259487A1 (en) 2001-11-02 2009-10-15 Siemens Medical Solutions Usa, Inc. Patient Data Mining
US20030120458A1 (en) * 2001-11-02 2003-06-26 Rao R. Bharat Patient data mining
US20030140044A1 (en) 2002-01-18 2003-07-24 Peoplechart Patient directed system and method for managing medical information
US20040044659A1 (en) 2002-05-14 2004-03-04 Douglass Russell Judd Apparatus and method for searching and retrieving structured, semi-structured and unstructured content
US20040049478A1 (en) 2002-09-11 2004-03-11 Intelligent Results Attribute scoring for unstructured content
US20040098664A1 (en) 2002-11-04 2004-05-20 Adelman Derek A. Document processing based on a digital document image input with a confirmatory receipt output
US20040202386A1 (en) 2003-04-11 2004-10-14 Pitney Bowes Incorporated Automatic paper to digital converter and indexer
US7689544B2 (en) 2003-07-23 2010-03-30 Siemens Aktiengesellschaft Automatic indexing of digital image archives for content-based, context-sensitive searching
US7707169B2 (en) 2004-06-10 2010-04-27 Siemens Corporation Specification-based automation methods for medical content extraction, data aggregation and enrichment
US20060053133A1 (en) * 2004-09-09 2006-03-09 Microsoft Corporation System and method for parsing unstructured data into structured data
US8060376B2 (en) 2004-10-01 2011-11-15 Nomoreclipboard, Llc System and method for collection of community health and administrative data
US7895219B2 (en) 2005-05-23 2011-02-22 International Business Machines Corporation System and method for guided and assisted structuring of unstructured information
US20060294101A1 (en) * 2005-06-24 2006-12-28 Content Analyst Company, Llc Multi-strategy document classification system and method
US20070011134A1 (en) 2005-07-05 2007-01-11 Justin Langseth System and method of making unstructured data available to structured data analysis tools
US20070011175A1 (en) 2005-07-05 2007-01-11 Justin Langseth Schema and ETL tools for structured and unstructured data
US7849049B2 (en) 2005-07-05 2010-12-07 Clarabridge, Inc. Schema and ETL tools for structured and unstructured data
US20070009158A1 (en) 2005-07-06 2007-01-11 International Business Machines Corporation Paper and electronic recognizable forms
US20070033229A1 (en) * 2005-08-03 2007-02-08 Ethan Fassett System and method for indexing structured and unstructured audio content
US20070168382A1 (en) 2006-01-03 2007-07-19 Michael Tillberg Document analysis system for integration of paper records into a searchable electronic database
US20070162308A1 (en) 2006-01-11 2007-07-12 Peters James D System and methods for performing distributed transactions
US20070244859A1 (en) * 2006-04-13 2007-10-18 American Chemical Society Method and system for displaying relationship between structured data and unstructured data
US20070282769A1 (en) * 2006-05-10 2007-12-06 Inquira, Inc. Guided navigation system
US8595245B2 (en) * 2006-07-26 2013-11-26 Xerox Corporation Reference resolution for text enrichment and normalization in mining mixed data
US7949629B2 (en) 2006-10-30 2011-05-24 Noblis, Inc. Method and system for personal information extraction and modeling with fully generalized extraction contexts
US20080133455A1 (en) 2006-11-30 2008-06-05 International Business Machines Corporation Method of processing data
US7774301B2 (en) 2006-12-21 2010-08-10 International Business Machines Corporation Use of federation services and transformation services to perform extract, transform, and load (ETL) of unstructured information and associated metadata
US20080154927A1 (en) 2006-12-21 2008-06-26 International Business Machines Corporation Use of federation services and transformation services to perform extract, transform, and load (etl) of unstructured information and associated metadata
US20090164416A1 (en) 2007-12-10 2009-06-25 Aumni Data Inc. Adaptive data classification for data mining
US8140584B2 (en) 2007-12-10 2012-03-20 Aloke Guha Adaptive data classification for data mining
US20090228428A1 (en) * 2008-03-07 2009-09-10 International Business Machines Corporation Solution for augmenting a master data model with relevant data elements extracted from unstructured data sources
US20090248619A1 (en) * 2008-03-31 2009-10-01 International Business Machines Corporation Supporting unified querying over autonomous unstructured and structured databases
US8290951B1 (en) * 2008-07-10 2012-10-16 Bank Of America Corporation Unstructured data integration with a data warehouse
US20100114899A1 (en) * 2008-10-07 2010-05-06 Aloke Guha Method and system for business intelligence analytics on unstructured data
US8250026B2 (en) 2009-03-06 2012-08-21 Peoplechart Corporation Combining medical information captured in structured and unstructured data formats for use or display in a user application, interface, or view
US8572021B2 (en) 2009-03-06 2013-10-29 Peoplechart Corporation Classifying information captured in different formats for search and display in an image-based format

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
Gilbert et al. "The New Data Integration Frontier: Unifying Structured and Unstructured Data," ID No. G00138004, 2006 Gartner, Inc. Publication Date: March 31, 2006.
Pogue, "Windows XP Home Edition: The Missing Manual, 2nd Edition," 2004, pp. 40-41.
U.S. Appl. No. 12/399,894, filed Mar. 6, 2009, titled "Combining Medical Information Captured in Structured and Unstructured Data Formats for Use or Display in a User Application, Interface, or View".

Cited By (16)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11126627B2 (en) 2014-01-14 2021-09-21 Change Healthcare Holdings, Llc System and method for dynamic transactional data streaming
US20150205846A1 (en) * 2014-01-21 2015-07-23 PokitDok, Inc. System and method for dynamic document matching and merging
US10121557B2 (en) * 2014-01-21 2018-11-06 PokitDok, Inc. System and method for dynamic document matching and merging
US10535431B2 (en) 2014-09-17 2020-01-14 Change Healthcare Holdings, Llc System and method for dynamic schedule aggregation
US10007757B2 (en) 2014-09-17 2018-06-26 PokitDok, Inc. System and method for dynamic schedule aggregation
US10417379B2 (en) 2015-01-20 2019-09-17 Change Healthcare Holdings, Llc Health lending system and method using probabilistic graph models
US10474792B2 (en) 2015-05-18 2019-11-12 Change Healthcare Holdings, Llc Dynamic topological system and method for efficient claims processing
US10366204B2 (en) 2015-08-03 2019-07-30 Change Healthcare Holdings, Llc System and method for decentralized autonomous healthcare economy platform
US10013292B2 (en) 2015-10-15 2018-07-03 PokitDok, Inc. System and method for dynamic metadata persistence and correlation on API transactions
WO2017106773A1 (en) * 2015-12-19 2017-06-22 Von Drakk Viktor Method and device for correlating multiple tables in a database environment
AU2016369586B2 (en) * 2015-12-19 2019-03-28 SWVL, Inc. Method and device for correlating multiple tables in a database environment
US10102340B2 (en) 2016-06-06 2018-10-16 PokitDok, Inc. System and method for dynamic healthcare insurance claims decision support
US10108954B2 (en) 2016-06-24 2018-10-23 PokitDok, Inc. System and method for cryptographically verified data driven contracts
US10805072B2 (en) 2017-06-12 2020-10-13 Change Healthcare Holdings, Llc System and method for autonomous dynamic person management
US10922299B2 (en) 2018-04-24 2021-02-16 The Von Drakk Corporation Correlating multiple tables in a non-relational database environment
US11151112B2 (en) 2018-04-24 2021-10-19 The Von Drakk Corporation Correlating multiple tables in a non-relational database environment

Also Published As

Publication number Publication date
US20120290564A1 (en) 2012-11-15
US20140046931A1 (en) 2014-02-13
US20100228721A1 (en) 2010-09-09
US8250026B2 (en) 2012-08-21
US8572021B2 (en) 2013-10-29

Similar Documents

Publication Publication Date Title
US9165045B2 (en) Classifying information captured in different formats for search and display
Rubin et al. iPad: Semantic annotation and markup of radiological images
US20100274584A1 (en) Method and system for presenting and processing multiple text-based medical reports
US20100241657A1 (en) Presentation generator
US8600771B2 (en) Systems and methods for generating a teaching file message
US20160342590A1 (en) Computer-Implemented System And Method For Sorting, Filtering, And Displaying Documents
US9189569B2 (en) Non-transitory computer readable medium, medical record search apparatus, and medical record search method
WO2007120774A2 (en) Method, apparatus and computer-readabele medium to provide customized classification of documents in a file management system
US20150310007A1 (en) Records management system and methods
JP6581087B2 (en) Iterative organization of the medical history section
US20170185718A1 (en) System and Method for Problem List Reconciliation with Care Plan Generation in an Electronic Medical Record
Pinho et al. A multimodal search engine for medical imaging studies
US20080109400A1 (en) Method and device for configuring a variety of medical information
US20110289038A1 (en) Visualization of Data Record Physicality
Pinho et al. Extensible architecture for multimodal information retrieval in medical imaging archives
US20170091886A1 (en) Methods, systems, and computer readable media for optimized case management
US8799326B2 (en) System for managing electronically stored information
Hui et al. HIWAS: enabling technology for analysis of clinical data in XML documents
Wang et al. WikiMed-DE: Constructing a Silver-Standard Dataset for German Biomedical Entity Linking using Wikipedia and Wikidata.
Rosenfeld et al. Current challenges in microbiome metadata collection
Roa-Martínez et al. Digital Image Representation Model Enriched with Semantic Web Technologies: Visual and Non-Visual Information
Schuler et al. An asset management approach to continuous integration of heterogeneous biomedical data
Hazarika et al. DSpace information retrieval system: a study using DICOM metadata standard
Hazarika et al. Developed DICOM standard schema with DSpace
JP4025572B2 (en) Structured document analysis device and method, and storage medium storing structured document analysis program and structured document analysis program

Legal Events

Date Code Title Description
STCF Information on status: patent grant

Free format text: PATENTED CASE

MAFP Maintenance fee payment

Free format text: PAYMENT OF MAINTENANCE FEE, 4TH YR, SMALL ENTITY (ORIGINAL EVENT CODE: M2551); ENTITY STATUS OF PATENT OWNER: SMALL ENTITY

Year of fee payment: 4

FEPP Fee payment procedure

Free format text: MAINTENANCE FEE REMINDER MAILED (ORIGINAL EVENT CODE: REM.); ENTITY STATUS OF PATENT OWNER: SMALL ENTITY

LAPS Lapse for failure to pay maintenance fees

Free format text: PATENT EXPIRED FOR FAILURE TO PAY MAINTENANCE FEES (ORIGINAL EVENT CODE: EXP.); ENTITY STATUS OF PATENT OWNER: SMALL ENTITY

STCH Information on status: patent discontinuation

Free format text: PATENT EXPIRED DUE TO NONPAYMENT OF MAINTENANCE FEES UNDER 37 CFR 1.362

FP Lapsed due to failure to pay maintenance fee

Effective date: 20231020