WO2001031634A1 - Proofreading system and method - Google Patents

Proofreading system and method

Info

Publication number
WO2001031634A1
Authority
WO
WIPO (PCT)
Prior art keywords
data
text
wherein
file
user
Prior art date
Application number
PCT/US2000/029649
Other languages
French (fr)
Inventor
David J. Mishelevich
Ted W. Lampher
Original Assignee
Qenm.Com, Incorporated
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Qenm.Com, Incorporated filed Critical Qenm.Com, Incorporated
Priority to AU12386/01A priority Critical patent/AU1238601A/en
Publication of WO2001031634A1 publication Critical patent/WO2001031634A1/en

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L 15/00 Speech recognition
    • G10L 15/26 Speech to text systems
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 40/00 Handling natural language data
    • G06F 40/20 Natural language analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 40/00 Handling natural language data
    • G06F 40/20 Natural language analysis
    • G06F 40/232 Orthographic correction, e.g. spell checking or vowelisation
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L 15/00 Speech recognition
    • G10L 15/01 Assessment or evaluation of speech recognition systems
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L 15/00 Speech recognition
    • G10L 15/08 Speech classification or search
    • G10L 15/18 Speech classification or search using natural language modelling
    • G10L 15/183 Speech classification or search using natural language modelling using context dependencies, e.g. language models
    • G10L 15/19 Grammatical context, e.g. disambiguation of the recognition hypotheses based on word sequence rules
    • G10L 15/193 Formal grammars, e.g. finite state automata, context free grammars or word networks

Definitions

  • This invention relates to a system and method of proofreading documents.
  • It relates to a system and method of proofreading documents that have been created using oral input and subsequently transcribed into text.
  • The physician typically requires the patient to fill out a questionnaire disclosing the personal information about the patient, the patient's background, medical information and pre-existing medical conditions. During the patient visit and thereafter the physician usually makes notes. As part of the process, the physician makes a determination regarding the patient's medical condition and defines a treatment plan. Physicians often use a dictation and transcribing approach to record this information. However, the requirements upon physicians are becoming increasingly burdensome and tend to interfere with the physician's activities of examining patients. This applies equally to other applications in which data has to be recorded. Where oral data entry means are adopted, either software or a human being has to be employed to transcribe the oral input data. In either case, the transcribed text data is likely to include errors and requires proofreading to render a more accurate textual rendition of the oral input data.
  • Speech recognition software has recently become available for desktop computers, such as Dragon Dictate and IBM VoiceType.
  • Dragon Dictate can interface with different applications, including word processors.
  • It suffers from the disadvantage that there is no audio recording of the dictation stored for replaying during editing.
  • The IBM VoiceType version retains both a sound and a text file.
  • The text from the speech recognition engine is input directly into a proprietary text processor.
  • U.S. Patent 5,799,273 addresses the issue of allowing audio data to be stored corresponding to the edited text by updating the audio file when the text file is edited.
  • None of the prior art provides a facility for assisting a proofreader, such as highlighting suspicious text portions for the benefit of a proofreader, and no facility exists to associate positional information with the text to ensure that data input into a document having a predefined format and layout does not get corrupted during editing by inadvertently ending up in a different section of the document.
  • What is needed is a system for substantially increasing the speed and efficiency of the proofreading of speech-recognized text.
  • The present invention seeks to provide an efficient system and method of verifying and editing a structured document where the data was entered using oral input means.
  • A method for proofreading a document having data that was orally input into the document and subsequently transcribed is provided.
  • One embodiment disclosed is a medical application.
  • The input of the information by a user preferably involves the use of a computer and the use of speech-recognition software to transcribe the oral input data into text.
  • The system and method of the invention preferably involves connecting the user to a proofreader located at a remote site, by means of a Local-Area or Wide-Area computer network.
  • The method may be implemented in a client-server system wherein all or some of the data is sent from the client to the server for evaluation.
  • The data may, for example, include text that is proofread by a person on the server side or is transferred to a third-party proofreader via a server or directly from the user who entered the data.
  • The method may include an accuracy monitoring capability to monitor, and possibly make perceived corrective changes to, the entered data.
  • The method may, for instance, include a dictionary of undesired terms and/or a dictionary of desired terms, compare the entered data to the dictionary of words or phrases, and highlight and/or offer alternative terms or phrases from which to choose, or correct perceived incorrect data.
  • The method typically checks the accuracy of oral data by performing acoustic checks, such as the use of Soundex codes or extended Soundex codes.
  • Extended Soundex coding refers to assigning to the first letter the category number which Soundex assigns to other letters, rather than using the letter itself for the first letter.
  • The extended Soundex codes can be designed to check not only individual words but also multi-word phrases.
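The Soundex scheme described above can be sketched in a few lines. The following is an illustrative simplification (the patent does not specify padding or letter-grouping details, so standard Soundex conventions are assumed): the standard form keeps the first letter and encodes the remaining letters as category digits, while the extended variant replaces the first letter with its digit as well, and multi-word phrases are handled by coding each word in turn.

```python
# Simplified Soundex sketch. Category digits follow the standard grouping;
# the "extended" flag encodes the first letter as a digit too, as described
# above. Padding to four characters is an assumption for illustration.
SOUNDEX_CODES = {ch: d for d, letters in
                 [("1", "BFPV"), ("2", "CGJKQSXZ"), ("3", "DT"),
                  ("4", "L"), ("5", "MN"), ("6", "R")]
                 for ch in letters}

def soundex(word, extended=False):
    word = word.upper()
    head = SOUNDEX_CODES.get(word[0], "0") if extended else word[0]
    tail, prev = [], SOUNDEX_CODES.get(word[0], "")
    for ch in word[1:]:
        d = SOUNDEX_CODES.get(ch, "")
        if d and d != prev:
            tail.append(d)
        prev = d
    return (head + "".join(tail) + "000")[:4]

def phrase_code(phrase, extended=True):
    # Extended codes can also cover multi-word phrases, word by word.
    return "-".join(soundex(w, extended) for w in phrase.split())
```

Here `soundex("meet")` and `soundex("meat")` both yield `M300`, so the two words collide acoustically and would be candidates for proofreader review.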
  • The acoustic checking may include presenting the proofreader with alternative words and/or phrases once words and/or phrases are identified as potentially incorrect.
  • Data entry may take the form of input into a predefined table or form having a certain layout. Different forms may be provided depending on the complexity of the case and the nature of the user.
  • Structural information associated with the data may be retained through the use of a database or, if the data is serialized, by associating tags with the data, thereby retaining the structural information.
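As a sketch of the tagging idea, assuming a simple attribute syntax of our own invention (the patent does not specify a tag format), structural information can travel with each serialized entry and be recovered intact:

```python
import re

# Hypothetical tag format: the element type becomes the tag name and the
# data entry location is carried as an attribute, so the structure survives
# serialization outside the database.
def tag(element_type, location, text):
    return f'<{element_type} loc="{location}">{text}</{element_type}>'

def untag(fragment):
    # Recover the element type, location, and text from a tagged fragment.
    m = re.match(r'<(\w+) loc="([^"]+)">(.*)</\1>$', fragment)
    return {"type": m.group(1), "location": m.group(2), "text": m.group(3)}
```

The round trip preserves both the data and its position in the document, which is what prevents an edited entry from ending up in the wrong section.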
  • The system and method of the invention thus includes means for retaining the document structure during editing by a proofreader.
  • The structure allows error checking by presenting alternatives to potentially erroneous data, wherein the alternatives are tailored to a given region of text.
  • The system and method of the invention includes means for accommodating the proofreading of documents where data was entered by way of predefined portions of data, such as text, hereafter referred to as a macro. This may be done by marking the location of the macro in the document without inserting the data until after proofreading is complete, or by presenting the proofreader with the entire document including macros to permit data to be read in context.
  • The method and system can include scoring of data in which points or another quantitative measuring feature is ascribed to the data.
  • The resultant score may be manipulated, such as by adding the points for a section of the document or for the document as a whole, to provide a quantitative measure, facilitating further action or a decision-making process, such as supporting a given level of billing.
  • The points or other quantitative measure of the data input may serve to measure the qualitative and quantitative value of tasks performed by a user. For instance, a physician examining a patient may be given points based on the extensiveness of the patient evaluation and the complexity of the diagnosis. He may also be given points based on the severity of the patient's illness or the amount of expertise required of the physician. Macros may cover more than one data entry location and can result in the assignment of points or filling in of scores for the associated multiple locations. These scores may be predefined or calculated by the system. During proofreading, editing of data may cause the points to change. Point changes are saved in association with the data by retaining structural information about the data.
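A minimal sketch of this scoring scheme (the element types match the tables described later in the document, but the weight values themselves are invented for illustration):

```python
# Each entry earns base points weighted by a complexity factor for its
# element type; section and document totals are simple sums of entry scores.
COMPLEXITY = {"HPI": 2, "ROS": 1, "PE": 2, "MDM": 3}  # illustrative weights

def entry_points(element_type, base_points=1):
    # Unknown element types default to a weight of 1.
    return base_points * COMPLEXITY.get(element_type, 1)

def document_score(entries):
    # entries: list of (element_type, base_points) tuples
    return sum(entry_points(t, p) for t, p in entries)
```

The same summation can be run per section or per document, giving the quantitative measure that the text suggests could support a billing level.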
  • The method is implemented on a computer system and the system may include functionality for prompting the user in order to maximize the points.
  • The system may determine that the number of points for a particular section can be increased by entering merely one or two additional categories.
  • The system may prompt the user for additional input. This may be achieved by providing a visual representation of the data collected, e.g., using a tree structure.
  • Each piece of information is predefined as being associated with one or more points or other quantitative measure.
  • A single response may include more than one element and have more than one point attributed to it.
  • A proofreading process wherein data is entered orally using speech-recognition software.
  • The process may be implemented on a system that includes a spell checker (for data that was entered using a keyboard or was entered by specifying the individual letters constituting a word) and/or a grammar checker.
  • The system may also include a list of unexpected terms.
  • The system stores the data both as a sound file and as a text file that is created pursuant to conversion by speech-recognition software.
  • The proofreader receives both an audio version of the data and a text version that is synchronized with the audio version.
  • The method envisages the proofreader reviewing the text visually while simultaneously listening to the audio version.
  • The proofreader may, instead of the user, or in addition to the user, have speech-recognition software to transcribe the audio data. Words or phrases of concern may be visually highlighted by the system in the text version.
  • The audio version may also include a sound marker to identify a word or phrase of concern.
  • The system includes transmitting both a sound file and a text file to the proofreader, synchronizing the two files, and including means for highlighting or otherwise emphasizing words or phrases requiring the specific attention of the proofreader.
  • The text file may also include positional information associated with the data to define the location of the text in the document, and facilitate synchronizing the text and the sound data.
  • The text file also includes points information about the data.
  • Changes made to the data by the proofreader are typically sent back to the user together with positional or structural information, and any points changes, to allow the data on the user side to be corrected, and also to teach the speech-recognition software.
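One way to realize the synchronization described above (a sketch only; the patent does not prescribe a data structure) is to carry, for each text token, the audio time range it was transcribed from, plus a flag set by the acoustic checks:

```python
from dataclasses import dataclass

@dataclass
class Token:
    text: str              # word as transcribed
    start_ms: int          # where the word begins in the sound file
    end_ms: int            # where it ends
    suspect: bool = False  # set by acoustic checks, for highlighting

def token_at(tokens, playback_ms):
    # Find the token under the audio cursor so the text display can track
    # playback of the sound file, and vice versa.
    for t in tokens:
        if t.start_ms <= playback_ms < t.end_ms:
            return t
    return None
```

Because each token keeps its own time range, adjusting the playback speed only changes how fast the cursor moves; the text-to-audio mapping itself is unaffected.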
  • The macros may immediately be inserted into the table, form, or other document and may be demarcated visually in the text version when displayed on a visual output device such as a computer monitor.
  • The macro may also be audibly demarcated in the sound file to indicate to a listener the beginning and the end of the macro.
  • The insertion point of a macro may merely be indicated in the text file without actually inserting the data.
  • The proofreader need not get involved with the macro data, e.g., the wording in the case of text data, and is merely provided with an indication as to the point of insertion of the macro in the rest of the document, or the macro text may be displayed but automatically skipped over.
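The two macro-handling options above can be sketched as follows, using a hypothetical `{{name}}` marker syntax (not from the patent): either expand the macro text in place for in-context reading, or leave only a visible insertion-point marker that the proofreader skips over.

```python
import re

# Illustrative macro library; the name and text are invented examples.
MACROS = {"preeswl": "standard pre-ESWL counseling text"}

def render(text, expand=True):
    def repl(m):
        name = m.group(1)
        if expand:
            # Option 1: insert the macro text, to be demarcated in the display.
            return MACROS.get(name, m.group(0))
        # Option 2: mark only the insertion point, without the data.
        return f"[MACRO {name}]"
    return re.sub(r"\{\{(\w+)\}\}", repl, text)
```

With `expand=False` the proofreader sees only `[MACRO preeswl]`, matching the option of indicating the insertion point without inserting the data.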
  • The macros and other data inserted into the document are associated with tags to ensure that the data may be stored in a database while retaining structural and other information about the data.
  • The proofreader may be presented with some or all of the data, while some or all of the structural or other information (e.g., points allocated to the data) may be withheld from the proofreader.
  • Sensitive data may also be withheld from the proofreader or require a password to restrict access.
  • The playback speed is adjustable by the proofreader, both the audio and text versions working in synchrony with each other to ensure that the text follows the playback of the sound file and vice versa.
  • Revenue streams may be obtained from users of the data entry software by providing the optional proofreading service that can, conceivably, be performed at a remote low-labor-cost location.
  • Figure 1 is a schematic representation of a user interface;
  • Figure 2 is a simple schematic representation of a data capture system;
  • Figure 3 illustrates a speech-recognition system;
  • Figure 4 is a schematic overview of the internal architecture of a computer system;
  • Figures 5 and 6 illustrate data entry tables used in the invention;
  • Figure 7 is a serialized representation of some of the data of Figures 5 and 6;
  • Figure 8 is a flow diagram illustrating user input of data and scoring of the data;
  • Figure 9 is a flow diagram illustrating one embodiment of a proofreading process;
  • Figure 10 shows a user interface for a proofreader;
  • Figure 11 is a schematic block diagram showing the use of Soundex codes in the invention;
  • Figure 12 is a table showing words and phrases, corresponding Soundex codes, and structural information in the form of categories with which the words and phrases are associated.
  • Figure 1 shows a simple user interface 100 which can be implemented on any system based on a graphical user interface such as Microsoft Windows, the Apple graphical interface, etc.
  • The top area 102 of the user interface 100 constitutes the user text entry area. This area is typically provided with a predefined document entry format having specific user data entry locations 104. It will be appreciated that a form may have only one data entry location. This may take the form of a table with rows and columns, or a form or template (for example, for a patient encounter with a urology specialist versus a cardiology specialist), or another document having predefined sections.
  • The lower part 106 of the user interface 100 presents the user with data or information retrieved by the system.
  • The user interface 100 may also include buttons or icons 108 or a text entry area within the lower region 106 to establish a connection with an information provider or submit a request or inquiries, such as questions on additional data or requests for samples or a conference call with an expert.
  • The lower area 106 may also include a user selection facility 110 in the form of a drop-down menu for selecting the user's preferred method of information presentation.
  • The interface also incorporates labels 112 which serve to prompt the user for input.
  • Data is entered by means of voice input, in which speech-recognition software converts the sound input into text format.
  • Text information is then displayed in an appropriate data entry location 104 in the form 100.
  • The data that is entered, as well as any structural information or other information associated with the data, is saved in a database 200, as illustrated in Figure 2.
  • The data capture system 202 includes a user interface which receives the user input and facilitates the saving of the information in a database 200.
  • Keywords may be established for use by the user to move around in the document and between the upper and lower portions of the user interface, and to edit any of the data input or information retrieved.
  • Information entered by the user into the upper portion 102, as well as information specifically requested in the lower portion 106 by the user, can be used by the system, for example by a search engine, to locate information relevant to the user's needs.
  • The invention envisages providing a variety of data entry templates, which may be selected by the user to meet his or her needs.
  • The templates may be specific to the area of expertise of the user, e.g., urology, and may further include more or less complex data entry templates, depending on the number of issues that are to be addressed. For instance, a urologist may have several templates to choose from in his field of expertise, depending on whether a particular patient is a first-time patient requiring extensive consideration of background material such as family history, or not.
  • The user is provided with buttons or a drop-down menu, or other means from which to select an appropriate template.
  • Figure 3 illustrates a speech recognition system that includes a personal computer 302 having a keyboard 304 for inputting or editing text, and a mouse 306 for manipulating a cursor.
  • The system further includes a storage medium such as a floppy disc 308 for loading applications into the computer.
  • The applications would include speech recognition software that includes a speech recognition engine, a word processor, and an interface to control the flow of text into the word processor and the flow of updated information from the word processor back into the speech recognition application to serve as part of the learning curve for the speech recognition software.
  • The system further includes a microphone 310, a speaker 312, and an interface 314. Audio signals from the user are sent from the microphone 310 to the interface 314, which converts the audio signal into a digital signal by means of an A-D converter. During playback of the recorded audio signal to the speaker 312, the interface 314 converts the digital signal back to an analogue signal by means of a D-A converter.
  • Figure 4 shows a schematic overview of the internal architecture of a computer.
  • The computer includes a processor 402, a read-only memory 404, a random access memory 406, and a bus 408 that links the various components of the system.
  • The random access memory is used for storing application program systems and data for the speech recognition application. It will be appreciated that any application programs can be stored in ROM or another appropriate storage medium.
  • The data for the speech recognition application comprises a user model 412 which can be updated to improve the accuracy of speech recognition, a language model 414, and a dictionary 416 to which a user can add new words.
  • The user model 412 includes an acoustic model and a contextual model.
  • The acoustic and contextual models are typical components of a speech-recognition system, as known in the art, to assist in converting speech into text.
  • The preferred embodiment of the invention proposes the use of Soundex codes, which involves retrieval of words and/or phrases that sound similar to those orally entered by the user, even though they may have different meanings, e.g., meet and meat.
  • The present invention includes not only the standard dictionary of desired terms 418 but also a dictionary of undesired terms 420.
  • The system can therefore be set up to be user-specific, to include terms found in the user's particular field of art, storing these terms in the dictionary 418, and storing terms that would typically not be found in the user's area of expertise in the dictionary 420.
  • The dictionaries 418, 420 can include categories so that alternatives presented can be matched to the context of the discourse.
  • Each of the dictionaries 418, 420 may comprise more than one separate dictionary tailored to different categories of interest, e.g., anatomical regions, symptoms, etc.
  • The various categories may correspond to sections of the document or may be specific to one or more data entry locations. It will be appreciated that the various models may use the same sets of dictionaries 418 and 420.
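A sketch of the two-dictionary check with per-category term sets (the terms and categories are invented examples, standing in for the desired dictionary 418 and the undesired dictionary 420):

```python
# Desired (418) and undesired (420) dictionaries, keyed by category so the
# check can be tailored to the section of the document being proofread.
DESIRED = {"urology": {"renal", "bladder", "calculus"}}
UNDESIRED = {"urology": {"mitral"}}  # e.g., a cardiology term in a urology note

def flag_terms(words, category):
    """Return (word, reason) pairs for terms the proofreader should review."""
    flags = []
    for w in words:
        if w in UNDESIRED.get(category, set()):
            flags.append((w, "undesired"))
        elif w not in DESIRED.get(category, set()):
            flags.append((w, "not in desired dictionary"))
    return flags
```

Flagged words would then be highlighted for the proofreader, possibly alongside similar-sounding alternatives drawn from the desired dictionary.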
  • A speech recognition interface 422 passes the data to the text processor application 424 to form a document 426 in memory.
  • A display 430 also displays the text of the document 426.
  • The system also includes a non-volatile storage medium in the form of disc storage 432.
  • The disc storage 432 includes a temporary directory used by the speech recognition engine for storing run-time files containing the speech recognition output data.
  • The disc storage 432 also includes a user's directory for storing document files emanating from the word processor and associated link data created by the speech recognition interface.
  • Captured information is stored on the database 200 in the form of objects. These objects thus have information associated with them, such as properties pertaining to the objects and methods associated with the objects.
  • Data that is captured by the user may have structural information and points information, discussed in greater detail below, associated with the data.
  • Associated information can readily be saved in a database.
  • The data and associated information is serialized, as illustrated in Figure 2, to transfer the information retrieved by a data capture application 202 to a location remote from the database, such as for data collection.
  • Data to be placed into the database is, again, deserialized and stored in the appropriate rows or columns of the database table as defined by a database manager.
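The serialize/deserialize round trip into database rows might look like this sketch (JSON as the wire format and the column names are assumptions; the patent leaves both open):

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE elements (patient_id, encounter_id, element_type, data, points)"
)

def to_wire(rows):
    # Serialize captured objects, keeping structural and points information.
    return json.dumps(rows)

def store(payload):
    # Deserialize and place each element into the appropriate columns.
    for r in json.loads(payload):
        conn.execute(
            "INSERT INTO elements VALUES (?, ?, ?, ?, ?)",
            (r["patient_id"], r["encounter_id"],
             r["element_type"], r["data"], r["points"]),
        )

store(to_wire([{"patient_id": 1, "encounter_id": "2000-10-26",
                "element_type": "HPI", "data": "pain quality: dull",
                "points": 2}]))
```

Because the structural tags travel with the data, the database manager can route each element to its proper row and column on arrival.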
  • The computers involved may take any one of a number of forms, e.g., laptops, palmtops, Personal Digital Assistants (PDAs), etc.
  • Reference is made to Figures 5, 6 and 7, in which Figure 5 shows a table 500 comprising rows and columns.
  • Figure 6 illustrates a table 600, including rows and columns.
  • These tables are, to a large extent, illustrated in serialized form in Figure 7.
  • Table 500 identifies, in column 502, the patient identification number (PatientId). Column 504 gives the encounter of the patient, i.e., when the patient was seen, and includes date entries. Column 506 identifies various element types or families of information which, in one embodiment, correspond to sections of the data entry form. Thus, each patient will have these elements associated with him or her. These include PROB, relating to the type of problem associated with the patient, e.g., asthma, hypertension, etc., which are identified in column 508 by identification numbers.
  • Column 506 further includes elements relating to HPI (history of present illness), ROS (review of systems), PFSH (past family and social history), PE (physical exam), CC (chief complaint), MDM (medical decision making), and an element permitting text substitution for a given keyword, identified here as MACRO.
  • Column 510 identifies the order in which information is entered, e.g., CC will be inserted first, then HPI, followed by ROS, etc.
  • Column 512 identifies the complexity of the case, and can be related to the score assigned to a particular task.
  • Column 514 includes elements identifying the area of specialty, e.g., urology, or a general area called BASE.
  • Column 516 adds prompt text corresponding to the internal elements of column 506. By using a separate column for the prompt text, the system can readily be implemented in any human language without having to adjust any of the internal elements.
  • Column 518 elaborates on the type of problem associated with the patient.
  • Table 600 contains entries that are children of the entities in Table 500. Each row of Table 500 corresponds to a separate element type, each element type, e.g., HPI, representing a collection of elements contained in Table 600 that have the same element type. Tables 500 and 600 present information about a number of patients as identified by the PatientId. Tables 500 and 600 represent information about all the patient encounters. All of the elements associated with a particular patient will have the same Patient ID, as identified by column 602 of Figure 6. Among the PatientID records, those representing a given encounter will have the same Encounter ID, as illustrated by column 604.
  • As illustrated in column 606, all of the entries associated with the Element Type "History of Present Illness" (HPI) from Table 500 are given as Element Type HPI.
  • Column 608, entitled "DisplayName", lists various types of HPI by location of the illness, pain quality, pain severity, etc., and provides the text for the prompts to be displayed or to be spoken by the system to elicit specific information about a patient at a particular visit. Thus, the physician is prompted to identify the location of any pain, the pain quality, the pain severity, the pain timing, the duration, and other information about the patient's symptoms. The physician enters the data, which is stored in column 610. The data type, e.g., text, is given in column 612.
  • Column 614 is included to identify an internal code associated with the entered data.
  • A score is assigned whenever data is entered. This may simply be one point per data entry or may be weighted to take account of qualitative factors in addition to quantitative factors. For example, the expertise of the physician, the amount of work involved in establishing and recording the data, and other qualitative and quantitative measures of the data entry can be considered in providing a score for a data entry.
  • This score is recorded in column 616.
  • Column 618 identifies the person recording the data and, in the physician example, this can be either the physician or another medical staff member such as a nurse.
  • The user can do this by verbalizing a predefined keyword, e.g., "enter", hitting a button on a keyboard, or using any other input technique.
  • The user may be given a predefined time period or system-determined time period to enter the data, before the system identifies the data entry for that element as being complete and moves to the next data entry location.
  • The system moves to the next data entry location by prompting the user for data entry at the next location.
  • The time period for data entry is user determined or determined by the system, based, for example, on the user's input rate and rhythm.
  • A macro can cover more than one data entry location and be associated with a score of more than one point.
  • A macro may deal with one or more groups of elements such as HPI, or portions thereof, i.e., cover portions of one or more groups, e.g., pain quality and pain severity of HPI, and part or all of ROS. Where more than one data entry location is accommodated by a macro entry, the system will typically move to at least the next open data entry location for purposes of prompting the user for data entry.
  • The identifier for the macro would appear multiple times in the ElementCollection column 518 of Figure 5.
  • Each appearance would have a different ElementType in the ElementType column 506, one for each group of elements (CC, HPI, ROS, PFSH, PE, etc.) that will have one or more elements in the table in Figure 6.
  • Invoking this collection would pull out rows in Table 600 that have a ParentCollection identifier in column 622 that corresponds to the identifier in column 518.
  • These identifiers could represent different categories such as elements from HPI, ROS, PE, or others.
  • PREESWL is identified as comprising ten elements.
  • In column 606, these include HPI, two entries for PFSH (one for past history and one for family history, as shown in column 608), an entry for ROS, four entries for PE, and two entries for MDM.
  • The various entries are assigned a complexity of 2 and are given the text shown in column 610.
  • The person defining the complexity can decide on the appropriate value depending on qualitative and/or quantitative factors such as the expertise required of the physician, the difficulty in making the determination, the time involved in making the determination, etc.
  • The complexity value given in column 512 of Table 500 is used as a weighting factor in this embodiment.
  • An embodiment could, instead, have merely one set of complexity values in either Table 500 (to define the complexity for an element type) or Table 600 (to define complexity on an element-by-element basis).
  • The system could calculate appropriate complexity values as data is entered by the user. This could be based on various factors, including the time taken to enter the data, the expertise of the user, etc.
  • The system would provide a user input location for identifying the user. This would then be stored in column 618 for data entry rows associated with that session.
  • Figure 7 represents the serialized form of some of the information (not including information about the person recording the data) given in Tables 500 and 600. It shows the corresponding information in the third to eleventh rows, in which the third row corresponds to the associated information (tenth row) in Table 500 and the fourth to eleventh rows in Fig. 7 represent the corresponding information (thirteenth to twenty-eighth row) in Table 600 (except the information about the person recording the data).
  • By treating the input data as objects having certain properties and methods, not only the data entered by the user but also information about that data is retained for purposes of transmission. An example would be "Pain Severity: Five on a scale of ten".
  • The data input document is divided into sections requiring data entry into predefined data entry locations within the sections.
  • The entire data input document, which may take the form of a template or form as illustrated in Figure 1, or may be in the form of a table, may constitute only one section with one or more data entry locations.
  • The data entry locations, defined as 104 in Figure 1, are highlighted or otherwise prompted for user input, as illustrated by Block 802 in Figure 8.
  • The user response 804 can take various forms, including no response at all, as illustrated by Block 806, in which case the user simply moves the cursor or other prompting feature to the next data entry location in the document.
  • The user response may, instead, merely indicate that conditions are normal, as illustrated by Block 808.
  • The physician may specify the abnormality 810.
  • The physician may insert, or specify for insertion, predefined data such as text, which may take the form of a macro, as illustrated by Block 812. This is retrieved by the user using keywords to insert the appropriate data. Instead, the user can be presented with a list of choices from which the user may then select the appropriate data.
  • The response from 804, together with a score associated with the data, is then stored in a local storage mechanism 814, such as the database 200.
  • the user may become aware of data entry locations that he has missed and may move the cursor or other prompt to the specified location. This allows the user to jump around m the document and enter his data.
  • the user is given the ability to indicate at a given pomt that additional text will be dictated later. The additional data may then be entered by the user or a third person at a later time.
  • a portable voice recordmg device such as a digital or analog voice recorder
  • the preferred embodiment allows data entry locations to be orally specified by means of keywords. Any data that is entered, that is associated with the keyword, e.g., data that immediately follows a keyword, can subsequently be downloaded mto the system without losmg structural information and processed
  • the user will adopt a certain data entry sequence that may be momtored by the system and cause the system to automatically prompt the user for data mput at the next system determined location
  • the system may be set up to automatically move on to the next section after a predefined penod of time, or may momtor the user's mput rate and rhythm to find an appropnate time delay before movmg onto the next data mput area.
  • the rhythm can be established by measuring say one-and-a-half-times the average pause between elements of dictated mput and assuming that if the user dictating has not uttered additional dictation by that time, that he or she does not want to provide further information for that data entry location.
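The pause-based prompting described above can be sketched as follows. This is a minimal illustration only; the function name, the `factor` parameter, and the fixed-timeout fallback are assumptions, not taken from the specification.

```python
def prompt_delay(pause_durations, factor=1.5):
    """Return the silence length (in seconds) after which the system
    assumes the dictating user is done with the current data entry
    location, based on a multiple of the user's average pause."""
    if not pause_durations:
        return None  # no dictation history yet; caller falls back to a fixed timeout
    average_pause = sum(pause_durations) / len(pause_durations)
    return factor * average_pause
```

For example, with observed pauses of 1, 2, and 3 seconds, the system would wait 1.5 × 2 = 3 seconds of silence before moving on to the next data entry location.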
  • the user is provided with means for adjusting the prompting rate.
  • a particular user may habitually enter data into only part of a form.
  • the method includes a system in which data entry is monitored to ascertain a pattern, for example, data entered into certain rows and columns, leaving other rows and columns blank.
  • the system may automatically highlight, or otherwise prompt, the user to input data at the next input location in the pattern.
  • the data entry sequence of column 510 in table 500 is adjustable by the system. The most likely next place of entry reflects what has happened in the immediate past. Thus, if an abnormality is recorded in the recent part of the sequence, it is most likely that the next item will be abnormal as well.
  • points or some other scoring is associated with the data input.
  • a data entry location may involve data input which is associated with one or more points, depending on the quantity or quality of the work associated with obtaining such information. For example, in a physician/patient environment, the extensiveness of the patient examination and the complexity of the symptoms and expertise required of the physician would be factors considered in determining the appropriate points to be allocated to the data required for entry into a particular data entry location.
  • the system also has the ability to arithmetically manipulate the scores, for example, by adding the points to determine a total score for a section or for the entire document. In a preferred embodiment, predetermined threshold levels are established.
  • if it is determined by the system that the score for a certain section or for the document as a whole is within a predefined range of the next threshold level for that section or document, it prompts the user for further data input into data entry locations determined by the system to bring the total score up to the next higher level. For example, if a urologist dealing with the symptoms section of his examination accumulates a score of 18 points, and the system determines that a minor additional examination would take his point level up to the next higher score of 20, it could prompt the urologist for the appropriate additional input. In a preferred embodiment, the user is presented with a visual representation or depictions of the data entry locations that have data in them.
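The threshold check described above (e.g., the urologist at 18 points being prompted toward the next level of 20) can be sketched as follows; the function name and the `margin` parameter are illustrative assumptions.

```python
def next_reachable_threshold(section_score, thresholds, margin):
    """If the accumulated score is within `margin` points of the next
    higher threshold, return that threshold so the system can prompt
    the user for the small amount of extra input needed to reach it.
    Return None when no threshold is within reach."""
    for threshold in sorted(thresholds):
        if section_score < threshold <= section_score + margin:
            return threshold
    return None
```

With thresholds of 10, 20, and 30 and a margin of 3, a score of 18 yields 20 (prompt the user), while a score of 12 yields None (the gap is too large to prompt for).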
  • the visual depictions can differ in the amount of entered data they display, or the degree to which individual data entry locations, as opposed to sections or groups of data entry locations, are depicted. The breakdown of points can thus be readily ascertained and appropriate extra data entered to increase the score.
  • the system includes a dictionary of undesired terms and/or a dictionary of desired terms directed toward the type of data entry of the particular user. For example, a urologist will typically use a specific set of terms and phrases in defining patient symptoms.
  • the data input is compared to the dictionary terms, and any data input that falls outside the parameters of the dictionary of desired terms (which includes standard words such as "and” and "the") or within the undesired terms is highlighted. For example, the text is bolded or the color changed or other highlighting feature used, to draw the attention of the user to the suspicious data entry.
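The dictionary comparison described above can be sketched as follows. This is a minimal illustration; the function name, and treating "and"/"the" as the built-in standard words, are assumptions based on the example in the text.

```python
def flag_suspicious(tokens, desired, undesired,
                    standard_words=frozenset({"and", "the"})):
    """Return the tokens that should be highlighted for the user:
    anything found in the dictionary of undesired terms, or anything
    falling outside both the dictionary of desired terms and the
    standard common words."""
    flagged = []
    for token in tokens:
        word = token.lower()
        if word in undesired or (word not in desired
                                 and word not in standard_words):
            flagged.append(token)
    return flagged
```

The caller would then bold, recolor, or otherwise highlight each flagged token to draw the user's attention to the suspicious data entry.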
  • the user may also be presented with alternative options that he or she may then enter or select from.
  • Input by the user may be achieved by any number of known methods, for example, using a keyboard, a touch sensitive screen, making use of character recognition software, etc.
  • data is, however, entered by vocal commands interpreted by speech-recognition software, which converts the oral data from the user into text and stores it as a text file.
  • speech recognition systems are currently on the market for desktop computers, including Dragon Dictate and IBM VoiceType.
  • the oral input is also saved as a sound file to assist in subsequent proofreading, as discussed in greater detail below.
  • a display that displays the data entry table or form and any entered data.
  • the output, whether displayed or printed, will vary.
  • One output is a transcription that looks like a document as it would normally appear (e.g., a medical chart).
  • Another output is comprised of the filled out template elements including or not including the associated scores, perhaps with each new element on a separate line.
  • the user can be provided with an "I accept" button which allows the content of a given encounter/session to be locked.
  • This output may be preceded by a processing phase in which the output of the speech recognition can be refined based on the additional contextual information (namely the whole document).
  • One way to accomplish this would be to record all the decisions as they were made and review them in the light of the overall document by monitoring which contextual dictionaries or lexicons were used during completion of the form, identifying inconsistencies in the group of lexicons, and, by using appropriate substitution lexicons, revising words that were previously entered into sections based on the inconsistent lexicons.
  • As discussed in U.S. Patent 5,799,273, it is advantageous to provide both a sound file and a text file, to assist in subsequent proofreading, and to link the two so that positional changes of characters are monitored and linked, and audio components are updated to allow playback of the sound file in synchronicity with the text file.
  • One way of achieving this is described in the '273 Patent using link means for forming link data between the audio portions and the corresponding character strings.
  • both the text file and the original sound file are transmitted to the proofreader to allow the proofreader to view the text and listen to the oral sound file corresponding to the text.
  • the data entry system be linked by a network, such as an intranet, extranet, or the Internet, to the proofreader, who may be located in a different state or country.
  • the sound and text files are synchronized with one another to allow the proofreader to simultaneously listen to the voice data while viewing the text data.
  • the proofreader is also given the ability to jump forward or backward in the sound file and have the text portion track the sound file in synchronized fashion.
  • the user may scroll up or down in the text file and have the location in the sound file move backward or forward in synchronicity with the text file.
  • This can be achieved in a number of ways, including the use of tags or synchronization markers in the text and sound files, or by including speech-recognition software to monitor the next snippet of information on the sound file and converting this to text, thereafter locating the corresponding portion of the text file by means of a search engine as known in the art.
  • the tags in the sound file can comprise sound signals corresponding to the various data entry locations.
  • the sound signals may be at a frequency above human speech and can be filtered out using a low-pass filter. Instead, a frequency range can be chosen for the signals that is outside the range detectable by the human ear.
  • Another way to synchronize the files would be to place the beginning and perhaps end times for the speech related to a given category into a property of that object. For example, if History of Present Illness\Pain Severity began at 11:23:06 and ended at 11:23:11, those two times would appear as properties of that response in the record and would be available as tags in the text being proofread to allow synchronization with the elapsed time of the dictated speech.
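The time-property synchronization described above can be sketched as follows: each response carries its begin and end times, and the text view looks up the field that spans the current audio playback position. The data layout and field names are illustrative, not taken from the figures.

```python
from bisect import bisect_right

def entry_at(entries, elapsed):
    """entries: list of (start_sec, end_sec, field_name) tuples sorted
    by start time, one per dictated response. Return the field whose
    dictated speech spans `elapsed` seconds of playback, so the text
    view can track the audio position (or None during a gap)."""
    starts = [start for start, _, _ in entries]
    i = bisect_right(starts, elapsed) - 1
    if i >= 0 and entries[i][0] <= elapsed <= entries[i][1]:
        return entries[i][2]
    return None
```

Jumping forward or backward in the sound file then reduces to calling `entry_at` with the new elapsed time and scrolling the text view to the returned field, and scrolling the text works the same way in reverse using the stored start time.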
  • the sound file and text file can be stored in the same table of a database or in linked tables of the same database.
  • the portion of the sound file corresponding to a section of text for one or more data entry locations can be saved in the database with the text data.
  • table 600 could merely be amplified to include a column for the sound file portions.
  • Information about a document, as opposed to the actual data constituting the document, may be incorporated by way of tags as is known and described in SGML, ISO/IEC 8879.
  • tags of a format control language such as XML, TeX or GML
  • abstract format information can be added to a document, thereby allowing the layout of the document to be changed without losing formatting.
  • layout information can be changed, such as changing the column arrangements, without compromising the logical structure.
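As a minimal sketch of the tag-based serialization described above, the following wraps each data entry location in its own XML tag so that structural information (here, an associated score) survives any change of layout. The element and attribute names are illustrative assumptions, not taken from the figures.

```python
import xml.etree.ElementTree as ET

def serialize(fields):
    """fields: list of (tag, text, points) triples, one per data entry
    location. Wrapping each entry in its own tag preserves structural
    information and the associated score independently of layout."""
    root = ET.Element("encounter")
    for tag, text, points in fields:
        element = ET.SubElement(root, tag, points=str(points))
        element.text = text
    return ET.tostring(root, encoding="unicode")
```

Because the structure lives in the tags rather than in the layout, column arrangements or display formatting can change freely without compromising the logical structure of the record.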
  • the user typically has speech-recognition software on his own computer, thereby giving the user visual feedback on his data entry.
  • the proofreader having additional, more powerful speech-recognition software, and using the sound file to create a higher quality text file for purposes of proofreading and correction.
  • Another embodiment contemplates using more powerful hardware and speech-recognition software system to preprocess the dictation prior to its reaching the proofreader.
  • One embodiment of the proofreading process is illustrated in Figure 9, in which the input data stream 902 is optionally preprocessed in Block 904, either on the user side or on the proofreader side, or both, and special words or phrases are marked. These special words or phrases may automatically be corrected by the system, or are corrected by the user or proofreader. Instead, the system may present the user or proofreader with alternatives for replacing the suspicious data. Any preprocessing of Block 904 may involve input from spell checkers (insofar as data was spelled out to the speech-recognition software or was entered by other means that could introduce spelling errors) and grammar checkers monitoring the input into the data file. In Block 906, the text file and sound file 908 are played back in synchronized form.
  • Further input verification and improvement takes place in step 910, in which the speech recognition software, either on the user side or the proofreader side, causes alternatives to words that are potentially erroneous to be displayed.
  • Dictionaries may be set up for the particular user's needs.
  • the dictionaries may include separate one word, two word, three word, etc., dictionaries to propose alternatives for various lengths of phrases.
  • different lengths of text can be highlighted (e.g., reverse video, font style, font color, field background color).
  • Proposed alternatives of different lengths can be presented in a single list, in lists of defined mixtures of lengths (e.g., two lists, one with phrases with three words or fewer and one with phrases of four words or more) or may appear in separate lists, each containing phrases of the same number of words.
  • the proofreader can select among alternative words/phrases or perform other functions such as increasing or decreasing audio playback speed by using voice commands, tactile controls (e.g., computer-game joy sticks, including additional buttons, or other platforms), eye movements or blinks, foot pedals, muscle twitches, and other bionic inputs.
  • the proofreader edits the data by taking note of highlighted sections and by reading the document in context to ensure consistency.
  • one embodiment of the invention proposes the use of experts, such as oversight work stations, to make a final decision.
  • the region of text in issue is marked and returned to the originating organization, namely the user, for appropriate selection or correction. This is indicated by Block 912.
  • the proofreader or oversight work station or user may cause text to be corrected.
  • the corrected text is saved into the database or serialized document as discussed with reference to Figure 2 above.
  • the corrected structured or unstructured file, or only the portions that were changed, is then returned to the user in Block 916 and updates the text in the patient-encounter record.
  • positional or structural information is sent back with the data to permit the appropriate data locations on the user side, e.g., in the user database, to be corrected, and any changes in scoring to be reflected on the user side.
  • the changes in the text may result in different information being put into the data entry locations. This may affect the scoring associated with such data.
  • the user, upon receiving the returned data file, is again prompted for additional data input where the score for a certain section or for the entire document is within a predetermined desired point range. If the user is not inclined to make further changes once it is returned to him by the proofreader, he may at least take note of areas where he could have gained further points for future data entry sessions.
  • An embodiment of the user interface for transcription proofreading is shown in Figure 10.
  • the text being proofread is displayed in text box 1002. Its height and width can be varied using controls 1004 and 1006 for height and 1008 and 1010 for width to adjust the display for individual proofreader preference.
  • Each word or phrase of the text in 1002 can be highlighted, say the word "damning", to draw the attention of the proofreader to the current point of action.
  • the document has predefined regions having lexicons associated with the regions that are specific to the topic covered by the region.
  • the applicable alternatives are determined (at least in terms of priority of presentation) by the categories that the text being analyzed satisfies, if the text being analyzed has been so structured.
  • the applicable alternative(s) at that point of text is displayed in a list box 1012, with the identification number(s) 1014 of the alternative(s) listed to its left.
  • a dialogue box 1016 is presented in which the identification number of the desired selected alternative is input.
  • a non-selection can be indicated by putting in a special number, say 0, or other character.
  • the interface can be set up so the selections are made using a point-and-click mechanism.
  • buttons 1018 are pressed and a dialogue box brought up in which the replacement text can be input.
  • certain parameters are displayed, such as the current word count 1020, the elapsed time 1022, and the current average words per minute 1024 for processed input already proofread.
  • the proofreader may adjust the scan rate using buttons 1026.
  • the proofreader can select among alternative words/phrases or perform other functions such as increasing or decreasing audio playback speed. These selections may be made by using voice commands, tactile controls (e.g., computer-game joy sticks, including additional buttons, or other platforms), eye movements or blinks, foot pedals, muscle twitches, and other bionic inputs.
  • Soundex codes may be used to send phonetically similar codes as alternatives for selection by the user and/or the proofreader. Soundex words and phrases are stored as illustrated in Block 1102. These words emanate from predefined words categorized into certain categories.
  • the Soundex files may be categorized into various sections, such as patient history, cardiovascular section, heart, thorax, and an unspecified section, as illustrated by Block 1104.
  • the categories may be organized hierarchically so that selections can be made based on the desired scope.
  • Data is input into the various categories as illustrated by Block 1106. Resultant categories with their data are then stored in a storage medium 1008.
  • categorized voice-recognition segments may be used to generate and display text alternatives.
  • Figure 12 illustrates a database implementing the Soundex codes.
  • Typical Soundex codes take the form of a letter followed by several numbers, e.g., five numbers, where the letter corresponds to the first letter of a word and the numbers depict the consonants in the word following the first letter. Consonants that sound the same, such as "c" or "k", are given the same number, and double consonants, such as "ll", are treated as a single consonant.
  • extended Soundex codes may be used.
  • column 1202 shows extended Soundex codes made up of 19 numbers that represent the words or phrases in column 1204. Instead of retaining the first letter of the word they depict, these extended Soundex codes codify the first letter of the word in the same manner as the rest of the letters.
  • codes represent the words or phrases and any unused portion of the 19 digits is filled with zeroes.
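The extended Soundex coding described above can be sketched as follows. It assumes the standard Soundex consonant categories; the 19-digit width and zero padding follow the description of column 1202, while the function name and the exact handling of vowels are assumptions.

```python
# Standard Soundex consonant categories; same-sounding consonants
# (e.g., "c" and "k") share a digit.
_DIGIT = {}
for digit, letters in [("1", "bfpv"), ("2", "cgjkqsxz"), ("3", "dt"),
                       ("4", "l"), ("5", "mn"), ("6", "r")]:
    for letter in letters:
        _DIGIT[letter] = digit

def extended_soundex(phrase, width=19):
    """Encode every letter, including the first, by its Soundex
    category; collapse adjacent duplicates (double consonants count
    once); drop vowels and other uncoded characters; and zero-pad
    to `width` digits so whole phrases can be compared phonetically."""
    code = []
    prev = None
    for ch in phrase.lower():
        digit = _DIGIT.get(ch)  # vowels, h, w, spaces map to None
        if digit and digit != prev:
            code.append(digit)
        prev = digit
    return "".join(code)[:width].ljust(width, "0")
```

Because same-sounding consonants share a digit, "cat" and "kat" produce identical codes, which is what lets the system offer phonetically similar words and phrases as alternatives.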
  • the Soundex files of the preferred embodiment are categorized into various sections, which are depicted in Figure 12 by columns 1206, 1208, 1210, 1212, and 1214.
  • Column 1216 shows an alternative implementation of Soundex codes corresponding to the words and phrases in column 1204.
  • the invention has been described with reference to an example involving a physician and patient. It will be appreciated that the invention is equally applicable to any data capture and proofreading environment, not just medical. Furthermore, the reference to speech-recognition software is not to be seen as the only way in which data can be captured. Thus the system and method as defined by the claims of the invention is not to be read as limited to the specific embodiments described in the specification.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Health & Medical Sciences (AREA)
  • Computational Linguistics (AREA)
  • Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • Human Computer Interaction (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Medical Treatment And Welfare Office Work (AREA)

Abstract

A proofreading system and method in which data was orally entered, includes linking the sound file of the oral input data to the text file of the transcribed input data, and presenting the proofreader with audible (312) and visual (302) information. Soundex codes or speech recognition patterns are used to identify potentially incorrect data, and the proofreader is presented with alternatives to choose from. In the case of a structured document having different categories of information, alternatives specific to the particular category are generated. Where predefined portions of data are included, the system identifies such predefined data or automatically skips to the end of such data during proofreading.

Description

PROOFREADING SYSTEM AND METHOD
BACKGROUND OF THE INVENTION
Field Of The Invention
This invention relates to a system and method of proofreading documents. In particular, it relates to a system and method of proofreading documents that have been created using oral input and subsequently transcribed into text.
Description Of The Prior Art
Traditionally, people record information by means of handwritten notes or by entering the information into a computer using standard word processing software. In the medical environment, healthcare professionals such as physicians and nurses record medical information about patients. This will include background about the patient, such as the patient's age and sex, patient history, physical examination, and subsequent information that is developed during discussions with the patient.
The physician typically requires the patient to fill out a questionnaire disclosing the personal information about the patient, the patient's background, medical information, and pre-existing medical conditions. During the patient visit and thereafter, the physician usually makes notes. As part of the process, the physician makes a determination regarding the patient's medical condition and defines a treatment plan. Physicians often use a dictation and transcribing approach to record this information. However, the requirements upon physicians are becoming increasingly burdensome and tend to interfere with the physician's activities of examining patients. This applies equally to other applications in which data has to be recorded. Where oral data entry means are adopted, either software or a human being has to be employed to transcribe the oral input data. In either case, the transcribed text data is likely to include errors and requires proofreading to render a more accurate textual rendition of the oral input data.
Speech recognition software has recently become available for desktop computers, such as Dragon Dictate and IBM VoiceType. Dragon Dictate can interface with different applications, including word processors. However, it suffers from the disadvantage that there is no audio recording of the dictation stored for replaying during editing. While the IBM VoiceType version retains both a sound and a text file, the text from the speech recognition engine is input directly into a proprietary text processor. U.S. Patent 5,799,273 addresses the issue of allowing audio data to be stored corresponding to the edited text by updating the audio file when the text file is edited. None of the prior art, however, provides a facility for assisting a proofreader, such as highlighting suspicious text portions for the benefit of a proofreader, and no facility exists to associate positional information with the text to ensure that data input into a document having a predefined format and layout does not get corrupted during editing by inadvertently ending up in a different section of the document. What is needed is a system for substantially increasing the speed and efficiency of the proofreading of speech-recognition generated documents. In particular, what is needed is a means for increasing the speed and efficiency of proofreading structured speech-generated documents.
The present invention seeks to provide an efficient system and method of verifying and editing a structured document where the data was entered using oral input means.
SUMMARY OF THE INVENTION
According to the invention there is provided a method for proofreading a document having data that was orally input into the document and subsequently transcribed. One embodiment disclosed is a medical application. However, the invention is not so limited. The input of the information by a user preferably involves the use of a computer and the use of speech-recognition software to transcribe the oral input data into text.
The system and method of the invention preferably involves connecting the user to a proofreader located at a remote site, by means of a Local-Area or Wide-Area computer network. The method may be implemented in a client-server system wherein all or some of the data is sent from the client to the server for evaluation. The data may, for example, include text that is proofread by a person on the server side or is transferred to a third-party proofreader via a server or directly from the user who entered the data.
The method may include an accuracy monitoring capability to monitor and, possibly, make perceived corrective changes to the entered data. The method may, for instance, include a dictionary of undesired terms and/or a dictionary of desired terms, which compares the entered data to the dictionary of words or phrases, and highlights and/or offers alternative terms or phrases from which to choose, or corrects perceived incorrect data. The method typically checks the accuracy of oral data by performing acoustic checks, such as the use of Soundex codes or extended Soundex codes. Extended Soundex coding refers to assigning to the first letter the category number which Soundex assigns to other letters, rather than using the letter itself for the first letter.
The extended Soundex codes can be designed to check not only individual words but also multi-word phrases. The acoustic checking may include presenting the proofreader with alternative words and/or phrases once words and/or phrases are identified as potentially incorrect.
Typically the data is put into a document having a predefined structure. Thus, data entry may take the form of input into a predefined table or form having a certain layout. Different forms may be provided depending on the complexity of the case and the nature of the user. Structural information associated with the data may be retained through the use of a database or, if the data is serialized, by associating tags with the data, thereby retaining the structural information.
The system and method of the invention thus includes means for retaining the document structure during editing by a proofreader. The structure allows error checking by presenting alternatives to potentially erroneous data, wherein the alternatives are tailored to a given region of text. The system and method of the invention includes means for accommodating the proofreading of documents where data was entered by way of predefined portions of data, such as text, hereafter referred to as a macro. This may be done by marking the location of the macro in the document without inserting the data until after proofreading is complete, or by presenting the proofreader with the entire document including macros to permit data to be read in context.
The method and system can include scoring of data in which points or another quantitative measuring feature is ascribed to the data. The resultant score may be manipulated, such as by adding the points for a section of the document or for the document as a whole, to provide a quantitative measure, facilitating further action or a decision-making process, such as supporting a given level of billing.
The points or other quantitative measure of the data input may serve to measure the qualitative and quantitative value of tasks performed by a user. For instance, a physician examining a patient may be given points based on the extensiveness of the patient evaluation and the complexity of the diagnosis. He may also be given points based on the severity of the patient's illness or the amount of expertise required of the physician. Macros may cover more than one data entry location and can result in the assignment of points or filling in of scores for the associated multiple locations. These scores may be predefined or calculated by the system. During proofreading, editing of data may cause the points to change. Point changes are saved in association with the data by retaining structural information about the data.
The method is implemented on a computer system, and the system may include functionality for prompting the user in order to maximize the points. Thus, the system may determine that the number of points for a particular section can be increased by entering merely one or two additional categories. By monitoring the accumulated points for a particular section or for the document as a whole, and by determining a distance to the next predetermined threshold level, the system may prompt the user for additional input. This may be achieved by providing a visual representation of the data collected, e.g., using a tree structure.
In order to assign points or other quantitative measures to the entered data, each piece of information is predefined as being associated with one or more points or other quantitative measure. Thus, a single response may include more than one element and have more than one point attributed to it. Further, according to the invention there is provided a proofreading process, wherein data is entered orally using speech-recognition software. The process may be implemented on a system that includes a spell checker (for data that was entered using a keyboard or was entered by specifying the individual letters constituting a word) and/or a grammar checker. The system may also include a list of unexpected terms. This allows data corresponding to an unexpected term to be highlighted, alternatives presented, and/or replaced with an alternative. Where the term or terms are highlighted, a person acting as proofreader will be prompted to verify and, if necessary, correct the data entry.
The system stores the data both as a sound file and as a text file that is created pursuant to conversion by speech-recognition software. The proofreader receives both an audio version of the data and a text version that is synchronized with the audio version. As such, the method envisages the proofreader reviewing the text visually while simultaneously listening to the audio version. The proofreader may, instead of the user, or in addition to the user, have speech-recognition software to transcribe the audio data. Words or phrases of concern may be visually highlighted by the system in the text version. The audio version may also include a sound marker to identify a word or phrase of concern. Thus, the system includes transmitting both a sound file and a text file to the proofreader, synchronizing the two files, and including means for highlighting or otherwise emphasizing words or phrases requiring specific attention of the proofreader. The text file may also include positional information associated with the data to define the location of the text in the document, and facilitate synchronizing the text and the sound data. The text file also includes points information about the data. Changes made to the data by the proofreader are typically sent back to the user together with positional or structural information, and any points changes, to allow the data on the user side to be corrected, and also to teach the speech-recognition software. In cases where data is entered in the form of a macro, the macros may immediately be inserted into the table, form, or other document and may be demarcated visually in the text version when displayed on a visual output device such as a computer monitor. The macro may also be audibly demarcated in the sound file to indicate to a listener the beginning and the end of the macro. Alternatively, the insertion point of a macro may merely be indicated in the text file without actually inserting the data. It may, thus, be defined by a suitable visual indicator when viewed on a monitor, for example, by means of a key word. Similarly, the insertion point of the macro in the sound file may be indicated by an audible tag, such as a key word, which is audible to the listener. Thus the proofreader need not get involved with the macro data, e.g., the wording in the case of text data, and is merely provided with an indication as to the point of insertion of the macro in the rest of the document, or the macro text may be displayed but automatically skipped over.
The macros and other data inserted into the document are associated with tags to ensure that the data may be stored in a database while retaining structural and other information about the data. The proofreader may be presented with some or all of the data, while some or all of the structural or other information (e.g., points allocated to the data) may be withheld from the proofreader. Sensitive data may also be withheld from the proofreader or require a password to restrict access. Preferably, the playback speed is adjustable by the proofreader, with the audio and text versions working in synchrony with each other to ensure that the text follows the playback of the sound file and vice versa. This is achieved, for example, by placing tags or synchronization markers in the text and sound files, or by including speech-recognition software and a search engine to locate a portion of the text file corresponding to a portion of the sound file, or by saving the text and corresponding sound file portions in one table or linked tables of the same database. Thus, the user can fast-forward or rewind through either file, or jump around in a file, without losing synchronization.
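The marker-based synchronization described above can be sketched as a sorted list of paired offsets, so that seeking in either file resolves to a position in the other. This is only an illustrative sketch, not the patented implementation; the marker values and helper names are assumptions:

```python
from bisect import bisect_right

# Each marker links a character offset in the text file to a time offset
# (in seconds) in the sound file. Markers are sorted by both keys.
markers = [
    (0, 0.0),      # (text_offset, audio_seconds)
    (52, 3.1),
    (118, 7.4),
    (201, 12.9),
]

def audio_position(text_offset):
    """Find the audio time for a text position (text-driven seeking)."""
    i = bisect_right([m[0] for m in markers], text_offset) - 1
    return markers[max(i, 0)][1]

def text_position(audio_seconds):
    """Find the text offset for an audio time (audio-driven tracking)."""
    i = bisect_right([m[1] for m in markers], audio_seconds) - 1
    return markers[max(i, 0)][0]
```

With these helpers, scrolling the text to character 130 would cue the sound file at 7.4 seconds, and rewinding the audio to 4.0 seconds would scroll the text back to offset 52, so neither direction of movement loses synchronization.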
Revenue streams may be obtained from users of the data entry software by providing the optional proofreading service, which can conceivably be performed at a remote, low-labor-cost location.
BRIEF DESCRIPTION OF THE DRAWINGS
Figure 1 is a schematic representation of a user interface; Figure 2 is a simple schematic representation of a data capture system; Figure 3 illustrates a speech-recognition system;
Figure 4 is a schematic overview of the internal architecture of a computer system; Figures 5 and 6 illustrate data entry tables used in the invention; Figure 7 is a serialized representation of some of the data of Figures 5 and 6; Figure 8 is a flow diagram illustrating user input of data and scoring of the data; Figure 9 is a flow diagram illustrating one embodiment of a proofreading process;
Figure 10 shows a user interface for a proofreader;
Figure 11 is a schematic block diagram showing the use of Soundex codes in the invention; and Figure 12 is a table showing words and phrases, corresponding Soundex codes, and structural information in the form of categories with which the words and phrases are associated.
DETAILED DESCRIPTION OF THE INVENTION
Figure 1 shows a simple user interface 100 which can be implemented on any system based on a graphical user interface, such as Microsoft Windows, the Apple graphical interface, etc. The top area 102 of the user interface 100 constitutes the user text entry area. This area is typically provided with a predefined document entry format having specific user data entry locations 104. It will be appreciated that a form may have only one data entry location. This may take the form of a table with rows and columns, or a form or template (for example, for a patient encounter with a urology specialist versus a cardiology specialist), or another document having predefined sections. The lower part 106 of the user interface 100 presents the user with data or information retrieved by the system. The user interface 100 may also include buttons or icons 108, or a text entry area within the lower region 106, to establish a connection with an information provider or to submit requests or inquiries, such as questions on additional data, requests for samples, or a conference call with an expert. The lower area 106 may also include a user selection facility 110 in the form of a drop-down menu for selecting the user's preferred method of information presentation. The interface also incorporates labels 112 which serve to prompt the user for input. In one embodiment proposed by the invention, data is entered by means of voice input, in which speech-recognition software converts the sound input into text format. Text information is then displayed in an appropriate data entry location 104 in the form 100. The data that is entered, as well as any structural or other information associated with the data, is saved in a database 200, as illustrated in Figure 2. Thus, the data capture system 202 includes a user interface which receives the user input and facilitates the saving of the information in the database 200.
It will be appreciated that, in the case of a voice-activated system in which data is orally entered into the system, keywords may be established for use by the user to move around in the document and between the upper and lower portions of the user interface, and to edit any of the data input or information retrieved. Thus, information entered by the user into the upper portion 102, as well as information specifically requested in the lower portion 106 by the user, can be used by the system, for example by a search engine, to locate information relevant to the user's needs.
The invention envisages providing a variety of data entry templates from which the user may select to meet his or her needs. Thus, the templates may be specific to the area of expertise of the user, e.g., urology, and may further include more or less complex data entry templates, depending on the number of issues that are to be addressed. For instance, a urologist may have several templates to choose from in his field of expertise, depending on whether a particular patient is a first-time patient requiring extensive consideration of background material such as family history, or not. The user is provided with buttons, a drop-down menu, or other means from which to select an appropriate template.
Figure 3 illustrates a speech-recognition system that includes a personal computer 302 having a keyboard 304 for inputting or editing text, and a mouse 306 for manipulating a cursor. The system further includes a storage medium such as a floppy disc 308 for loading applications onto the computer. In a speech-recognition environment, the applications would include speech-recognition software comprising a speech-recognition engine, a word processor, and an interface to control the flow of text into the word processor and the flow of updated information from the word processor back into the speech-recognition application to serve as part of the learning curve for the speech-recognition software.
The system further includes a microphone 310, a speaker 312, and an interface 314. Audio signals from the user are sent from the microphone 310 to the interface 314, which converts the audio signal into a digital signal by means of an A-D converter. During playback of the recorded audio signal to the speaker 312, the interface 314 converts the digital signal back to an analogue signal by means of a D-A converter. Figure 4 shows a schematic overview of the internal architecture of a computer. The computer includes a processor 402, a read-only memory 404, a random access memory 406, and a bus 408 that links the various components of the system. The random access memory (RAM) is used for storing application program systems and data for the speech-recognition application. It will be appreciated that any application programs can be stored in ROM or another appropriate storage medium. The data for the speech-recognition application comprises a user model 412, which can be updated to improve the accuracy of speech recognition, a language model 414, and a dictionary 416 to which a user can add new words. The user model 412 includes an acoustic model and a contextual model. The acoustic and contextual models are typical components of a speech-recognition system, as known in the art, to assist in converting speech into text. In addition to such known models, the preferred embodiment of the invention proposes the use of Soundex codes, which involves retrieval of words and/or phrases that sound similar to those orally entered by the user, even though they may have different meanings, e.g., meet and meat. The present invention includes not only the standard dictionary of
desired terms 418 but also a dictionary of undesired terms 420. The system can therefore be set up to be user specific: terms found in the user's particular field of art are stored in the dictionary 418, and terms that would typically not be found in the user's area of expertise are stored in the dictionary 420. The dictionaries 418, 420 can include categories so that the alternatives presented can be matched to the context of the discourse. Thus, each of the dictionaries 418, 420 may comprise more than one separate dictionary tailored to different categories of interest, e.g., anatomical regions, symptoms, etc. The various categories may correspond to sections of the document or may be specific to one or more data entry locations. It will be appreciated that the various models may use the same sets of dictionaries 418 and 420.
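A conventional Soundex code, which groups similar-sounding words such as "meet" and "meat" under one key as described above, can be computed as in the following sketch. These are the standard American Soundex rules, not necessarily the exact variant used by the system:

```python
def soundex(word):
    """American Soundex: first letter plus three digits, so that
    similar-sounding words (e.g., 'meet'/'meat') share a code."""
    codes = {}
    for letters, digit in (("bfpv", "1"), ("cgjkqsxz", "2"), ("dt", "3"),
                           ("l", "4"), ("mn", "5"), ("r", "6")):
        for ch in letters:
            codes[ch] = digit
    word = word.lower()
    result = word[0].upper()          # first letter is kept as-is
    prev = codes.get(word[0], "")
    for ch in word[1:]:
        digit = codes.get(ch, "")     # vowels and h/w/y map to nothing
        if digit and digit != prev:
            result += digit
        if ch not in "hw":            # h and w do not separate duplicates
            prev = digit
    return (result + "000")[:4]       # pad or truncate to four characters

print(soundex("meet"), soundex("meat"))  # M300 M300
```

Words or phrases entered by the user can thus be looked up by code, so that "meat" retrieves "meet" (both M300) as a candidate even though the two words have different meanings.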
As the speech-recognition software converts the user's oral input into text, a speech-recognition interface 422 passes the data to the text processor application 424 to form a document 426 in memory. A display 430 also displays the text of the document 426.
The system also includes a non-volatile storage medium in the form of disc storage 432. The disc storage 432 includes a temporary directory used by the speech-recognition engine for storing run-time files containing the speech-recognition output data. The disc storage 432 also includes a user's directory for storing document files emanating from the word processor and associated link data created by the speech-recognition interface.
As illustrated in Figure 2, captured information is stored in the database 200 in the form of objects. These objects thus have information associated with them, such as properties pertaining to the objects and methods associated with the objects. In this manner, data that is captured by the user may have structural information and points information, discussed in greater detail below, associated with the data. Thus, associated information can readily be saved in a database. In some cases it may be desirable to transfer the data in a flat file format, e.g., if the receiving device has limited memory and cannot support database management software. In such a case, the data and associated information is serialized, as illustrated in Figure 2, to transfer the information retrieved by a data capture application 202 to a location remote from the database, such as for data collection. Similarly, data to be placed into the database is deserialized, and the data is stored in the appropriate rows or columns of the database table as defined by a database manager. It will be appreciated that the computers involved may take any one of a number of forms, e.g., laptops, palmtops, Personal Digital Assistants (PDAs), etc.
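The serialize-to-flat-file round trip described above can be sketched as follows. The record layout and field names are hypothetical illustrations, not the patent's actual schema:

```python
import json

# Hypothetical element records, carrying structural information (element
# type, prompt, points) alongside the entered data itself.
elements = [
    {"element_type": "CC",  "prompt": "Chief complaint",
     "data": "flank pain", "points": 1},
    {"element_type": "HPI", "prompt": "Pain severity",
     "data": "five on a scale of ten", "points": 2},
]

# Serialize to a flat file format: one record per line, no database needed.
flat = "\n".join(json.dumps(e) for e in elements)

# Deserialize on the receiving side; the structural and points
# information survives the transfer intact.
restored = [json.loads(line) for line in flat.splitlines()]
assert restored == elements
```

Because each line is self-describing, a receiving device with no database software can still recover both the data and its associated structural information before it is placed into the appropriate rows and columns.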
This is best illustrated in Figures 5, 6 and 7, in which Figure 5 shows a table 500 comprising rows and columns. Similarly, Figure 6 illustrates a table 600 including rows and columns. These tables are, to a large extent, illustrated in serialized form in Figure 7.
Table 500 identifies, in column 502, the patient identification number (PatientId). Column 504 gives the encounter of the patient, i.e., when the patient was seen, and includes date entries. Column 506 identifies various element types or families of information which, in one embodiment, correspond to sections of the data entry form. Thus, each patient will have these elements associated with him or her. These include PROB, relating to the type of problem associated with the patient, e.g., asthma, hypertension, etc., which are identified
in column 508 by identification numbers. Column 506 further includes elements relating to HPI (history of present illness), ROS (review of systems), PFSH (past family and social history), PE (physical exam), CC (chief complaint), MDM (medical decision making), and an element permitting text substitution for a given keyword, identified here as MACRO. Column 510 identifies the order in which information is entered, e.g., CC will be inserted first, then HPI, followed by ROS, etc. Column 512 identifies the complexity of the case and can be related to the score assigned to a particular task. Column 514 includes elements identifying the area of specialty, e.g., urology, or a general area called BASE. Column 516 adds prompt text corresponding to the internal elements of column 506. By using a separate column for the prompt text, the system can readily be implemented in any human language without having to adjust any of the internal elements. Column 518 elaborates on the type of problem associated with the patient.
Table 600 contains entries that are children of the entities in Table 500. Each row of Table 500 corresponds to a separate element type, each element type, e.g., HPI, representing a collection of elements contained in Table 600 that have the same element type. Tables 500 and 600 present information about a number of patients, as identified by the PatientId, and represent information about all the patient encounters. All of the elements associated with a particular patient will have the same PatientId, as identified by column 602 of Figure 6. Among the PatientId records, those representing a given encounter will have the same EncounterId, as illustrated by column 604. As illustrated in column 606, all of the entries associated with the Element Type "History of Present Illness" (HPI) from Table 500 are given as Element Type HPI. Column 608, entitled "DisplayName", lists various types of HPI by location of the illness, pain quality, pain severity, etc., and provides the text for the prompts to be displayed or spoken by the system to elicit specific information about a patient at a particular visit. Thus, the physician is prompted to identify the location of any pain, the pain quality, the pain severity, the pain timing, the duration, and other information about the patient's symptoms. The physician enters the data, which is stored in column 610. The data type, e.g., text, is given in column 612. In the embodiment shown, column 614 is included to identify an internal code associated with the entered data. A score is assigned whenever data is entered. This may simply be one point per data entry, or may be weighted to take account of qualitative factors in addition to quantitative factors. For example, the expertise of the physician, the amount of work involved in establishing and recording the data, and other qualitative and quantitative measures of the data entry can be considered in providing a score for a data entry. This is recorded in column 616.
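The weighted scoring just described might look like the following sketch, in which a base point value per entry is weighted by a complexity factor. The specific weights and field names are illustrative assumptions, not values from the specification:

```python
# Base points per entry, weighted by a complexity factor for the element
# (cf. the complexity values associated with columns 512 and 616).
entries = [
    {"element": "CC",  "points": 1, "complexity": 1},
    {"element": "HPI", "points": 1, "complexity": 2},
    {"element": "PE",  "points": 2, "complexity": 2},
]

def entry_score(entry):
    # Weight the quantitative point value by the qualitative complexity.
    return entry["points"] * entry["complexity"]

total = sum(entry_score(e) for e in entries)
print(total)  # 1*1 + 1*2 + 2*2 = 7
```

The same per-entry scores can then be summed per section or per document, which is what the arithmetic manipulation of scores discussed later relies on.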
Column 618 identifies the person recording the data; in the physician example, this can be either the physician or another medical staff member such as a nurse. Once the data entry for a particular row is complete, column 620 identifies the element by a check mark. The user can do this by verbalizing a predefined keyword, e.g., "enter", hitting a button on a keyboard, or any other input technique. Instead, the user may be given a predefined time period or a system-determined time period to enter the data before the system identifies the data entry for that element as being complete and moves to the next data entry location. Typically, the system moves to the next data entry location by prompting the user for data entry at the next location. Even where the time period for data entry is user determined or determined by the system, based
on past data entry rhythm or speed by the user, it will be appreciated that the timing may vary from one data entry location to the next. It will also be appreciated that predefined data, referred to herein as a macro, can cover more than one data entry location and be associated with a score of more than one point. Thus, a macro may deal with one or more groups of elements, such as HPI or portions thereof, i.e., cover portions of one or more groups, e.g., pain quality and pain severity of HPI, and part or all of ROS. Where more than one data entry location is accommodated by a macro entry, the system will typically move to at least the next open data entry location for purposes of prompting the user for data entry.
In one embodiment, the identifier for the macro would appear multiple times in the ElementCollection column 518 of Figure 5. Each appearance would have a different ElementType in the ElementType column 506, one for each group of elements (CC, HPI, ROS, PFSH, PE, etc.) that will have one or more elements in the table in Figure 6. Invoking this collection would pull out rows in Table 600 that have a ParentCollection identifier in column 622 that corresponds to the identifier in column 518. These identifiers could represent different categories, such as elements from HPI, ROS, PE, or others.
An alternative embodiment is shown in the example illustrated in Fig. 5, in which the last row comprises a MACRO entry defined in column 508 as PREESWL. In lines 3 to 12 of Table 600, PREESWL is identified as comprising ten elements. As shown in column 606, these include HPI, two entries for PFSH (one for past history and one for family history, as shown in column 608), an entry for ROS, four entries for PE, and two entries for MDM. As shown by column 622, the various entries are assigned a complexity of 2 and are given the text shown in column 610. It will be appreciated that the person defining the complexity can decide on the appropriate value depending on qualitative and/or quantitative factors, such as the expertise required of the physician, the difficulty in making the determination, the time involved in making the determination, etc. The complexity value given in column 512 of Table 500 is used as a weighting factor in this embodiment. Clearly, an embodiment could instead have merely one set of complexity values, in either Table 500 (to define the complexity for an element type) or Table 600 (to define complexity on an element-by-element basis). In yet another embodiment, the system could calculate appropriate complexity values as data is entered by the user. This could be based on various factors, including the time taken to enter the data, the expertise of the user, etc. Thus, in one embodiment, the system would provide a user input location for identifying the user. This would then be stored in column 618 for data entry rows associated with that session.
Figure 7 represents the serialized form of some of the information (not including information about the person recording the data) given in Tables 500 and 600. It shows the corresponding information in the third to eleventh rows, in which the third row corresponds to the associated information (tenth row) in Table 500, and the fourth to eleventh rows in Fig. 7 represent the corresponding information (thirteenth to twenty-eighth rows) in Table 600 (except the information about the person recording the data). Thus, by defining the input data as objects having certain properties and methods, not only the data entered by the user, but also information about that data, is retained for purposes of transmission. An example would be "Pain Severity: Five on a scale of ten"
where "Pain Severity" would be the verbal tag that would ensure that the data "Five on a scale of ten" would be correctly placed in the document.
It will be appreciated, however, that instead of serializing the information, the associated information may be transmitted by transferring the entire database or a portion thereof. An advantage of defining objects in a generalized object table is that properties and methods can be added or subtracted without changing the table structure by having to add or subtract columns. The table format retains the necessary associated information, such as structural information, about the entered data.
As discussed with reference to Figure 1, the data input document is divided into sections requiring data entry into predefined data entry locations within the sections. However, it will be appreciated that the entire data input document, which may take the form of a template or form as illustrated in Figure 1, or may be in the form of a table, may constitute only one section with one or more data entry locations. The data entry locations defined as 102 in Figure 1 are highlighted or otherwise prompted for user input as illustrated by Block 802 in
Figure 8.
The user response 804 can take various forms, including no response at all, as illustrated by Block 806, in which case the user simply moves the cursor or other prompting feature to the next data entry location in the document. The user response may, instead, merely indicate that conditions are normal, as illustrated by Block
808. In case of an abnormality in a patient's condition, the physician may specify the abnormality 810. The physician may insert, or specify for insertion, predefined data such as text, which may take the form of a macro as illustrated by Block 812. This is retrieved by the user using keywords to insert the appropriate data. Instead, the user can be presented with a list of choices from which to select the appropriate data.
The response from 804, together with a score associated with the data, is then stored in a local storage mechanism 814, such as the database 200. A decision is made at 816 to determine whether all the data has been entered into the document. If not, the user is prompted for further input, 802. When all the data has been entered, the overall result is displayed, as indicated by Block 818, and the information is sent to a central data storage facility, as illustrated by Block 820.
As the user enters the data, he may become aware of data entry locations that he has missed and may move the cursor or other prompt to the specified location. This allows the user to jump around in the document and enter his data. In a preferred embodiment, the user is given the ability to indicate at a given point that additional text will be dictated later. The additional data may then be entered by the user or a third person at a later time.
In order to allow data to be entered by a user using a portable voice recording device, such as a digital or analog voice recorder, the preferred embodiment allows data entry locations to be orally specified by means of keywords. Any data that is entered that is associated with the keyword, e.g., data that immediately follows a keyword, can subsequently be downloaded into the system without losing structural information and processed
in batch mode. Thus, the keywords serve as tags that link the subsequent data with particular data entry locations in the document.
Typically, however, the user will adopt a certain data entry sequence that may be monitored by the system and cause the system to automatically prompt the user for data input at the next system-determined location. Thus, the system may be set up to automatically move on to the next section after a predefined period of time, or may monitor the user's input rate and rhythm to find an appropriate time delay before moving on to the next data input area. The rhythm can be established by measuring, say, one-and-a-half times the average pause between elements of dictated input and assuming that, if the user has not uttered additional dictation by that time, he or she does not want to provide further information for that data entry location. In a preferred embodiment, the user is provided with means for adjusting the prompting rate. A particular user may habitually enter data into only part of a form. The method includes a system in which data entry is monitored to ascertain a pattern, for example, data entered into certain rows and columns while other rows and columns are left blank. By recording the data entry patterns of the last three sets of data entry, for example, the system may automatically highlight, or otherwise prompt, the user to input data at the next input location in the pattern. Thus, the data entry sequence of column 510 in Table 500 is adjustable by the system. The most likely next place of entry reflects what has happened in the immediate past. Thus, if an abnormality is recorded in the recent part of the sequence, it is most likely that the next item will be abnormal as well.
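The pause heuristic described above (one-and-a-half times the average inter-element pause) can be sketched as follows. The 1.5 multiplier comes from the text; the sample pause values are invented for illustration:

```python
# Pauses (in seconds) observed between recent elements of dictated input.
recent_pauses = [0.8, 1.2, 1.0, 1.0]

def prompt_delay(pauses, multiplier=1.5):
    """Delay before the system assumes the current data entry location is
    done and prompts for the next one: 1.5x the average observed pause."""
    return multiplier * sum(pauses) / len(pauses)

delay = prompt_delay(recent_pauses)
print(delay)  # average pause is 1.0 s, so the prompt delay is 1.5 s
```

In a running system, the pause list would be refreshed as dictation proceeds, so the prompting rate adapts to the individual user's rhythm; the multiplier itself is the user-adjustable prompting rate.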
In a preferred embodiment, points or some other scoring is associated with the data input. A data entry location may involve data input which is associated with one or more points, depending on the quantity or quality of the work associated with obtaining such information. For example, in a physician/patient environment, the extensiveness of the patient examination, the complexity of the symptoms, and the expertise required of the physician would be factors considered in determining the appropriate points to be allocated to the data required for entry into a particular data entry location. The system, according to the invention, also has the ability to arithmetically manipulate the scores, for example, by adding the points to determine a total score for a section or for the entire document. In a preferred embodiment, predetermined threshold levels are established. If it is determined by the system that the score for a certain section, or for the document as a whole, is within a predefined range of the next threshold level for that section or document, it prompts the user for further data input into data entry locations determined by the system, to bring the total score up to the next higher level. For example, if a urologist dealing with the symptoms section of his examination accumulates a score of 18 points, and the system determines that a minor additional examination would take his point level up to the next higher score of 20, it could prompt the urologist for the appropriate additional input. In a preferred embodiment, the user is presented with a visual representation or depictions of the data entry locations that have data in them. This can take the form of one or more tree structures from which the user may select, thereby allowing him to view the information in the appropriate amount of detail. The visual depictions can differ in the amount of entered data they display, or the degree to which individual data entry locations, as opposed to
sections or groups of data entry locations, are depicted. The breakdown of points can thus be readily ascertained and appropriate extra data entered to increase the score.
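The threshold prompt in the urologist example (a score of 18, with the next threshold at 20) can be sketched like this. The threshold list and the two-point "within range" margin are illustrative assumptions:

```python
# Predetermined threshold levels for a section, and the margin within
# which the system considers the next threshold "almost reached".
THRESHOLDS = [10, 20, 30]
MARGIN = 2

def next_threshold_gap(score):
    """Return the points still needed if the next threshold is within
    the margin, or None if no prompt should be issued."""
    for level in THRESHOLDS:
        if score < level:
            gap = level - score
            return gap if gap <= MARGIN else None
    return None

print(next_threshold_gap(18))  # 2 -> prompt for minor additional input
print(next_threshold_gap(15))  # None -> gap of 5 is too large; no prompt
```

A score of 18 yields a gap of 2 and triggers a prompt for the appropriate additional input, whereas a score of 15 is too far from the next level to justify prompting.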
In order to improve the quality of the data entry and avoid unnecessary errors, the system includes a dictionary of undesired terms and/or a dictionary of desired terms directed toward the type of data entry of the particular user. For example, a urologist will typically use a specific set of terms and phrases in defining patient symptoms. The data input is compared to the dictionary terms, and any data input that falls outside the parameters of the dictionary of desired terms (which includes standard words such as "and" and "the") or within the undesired terms is highlighted. For example, the text is bolded, the color is changed, or another highlighting feature is used to draw the attention of the user to the suspicious data entry. The user may also be presented with alternative options that he or she may then enter or select from. Information is also fed back to the speech-recognition software as part of the learning mechanism of the software. Clearly, where data is input by the user using an input mechanism other than oral input, a dictionary of terms may still be included to verify data input. Spell checkers and grammar checkers may also be included to improve the quality of the data input. It will be appreciated that any prompting of the user for selection of an alternative, or highlighting of a suspicious data entry, is not limited to visual prompting but could equally well include an audio prompt.
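The dictionary check described above amounts to flagging any word that is neither a standard word nor a desired domain term, or that appears in the undesired list. A minimal sketch, with word lists invented for illustration:

```python
# Desired terms: standard words plus the user's domain vocabulary.
DESIRED = {"and", "the", "patient", "reports", "dysuria", "hematuria"}
# Undesired terms: words unlikely in this user's area of expertise.
UNDESIRED = {"myocardial"}

def suspicious_words(text):
    """Return the words the system would highlight for attention."""
    return [w for w in text.lower().split()
            if w in UNDESIRED or w not in DESIRED]

print(suspicious_words("patient reports dysuria and myocardial pain"))
# ['myocardial', 'pain']
```

Each flagged word would then be bolded or otherwise highlighted, and could be paired with alternative options for the user to select from, feeding the choice back to the speech-recognition learning mechanism.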
Input by the user may be achieved by any number of known methods, for example, using a keyboard, a touch-sensitive screen, making use of character recognition software, etc. In the preferred embodiment, however, data is entered by vocal commands interpreted by speech-recognition software, which converts the oral data from the user into text and stores it as a text file. A number of speech recognition systems are currently on the market for desktop computers, including Dragon Dictate and IBM VoiceType. Preferably, the oral input is also saved as a sound file to assist in subsequent proofreading, as discussed in greater detail below.
In order to give the user feedback on the data he enters, he is provided with a display that shows the data entry table or form and any entered data. Depending on the user input screen chosen, the output, whether displayed or printed, will vary. One output is a transcription that looks like a document as it would normally appear (e.g., a medical chart). Another output comprises the filled-out template elements, including or not including the associated scores, perhaps with each new element on a separate line.
In one embodiment, the user can be provided with an "I accept" button which allows the content of a given encounter/session to be locked. This output may be preceded by a processing phase in which the output of the speech recognition is refined based on additional contextual information (namely, the whole document). One way to accomplish this would be to record all the decisions as they were made and review them in the light of the overall document: by monitoring which contextual dictionaries or lexicons were used during completion of the form, identifying inconsistencies in the group of lexicons, and, by using appropriate substitution lexicons, revising words that were previously entered into sections based on the inconsistent lexicons. As pointed out in Patent 5,799,273, it is advantageous to provide both a sound file and a text file, to assist in subsequent proofreading, and to link the two so that positional changes of characters are monitored and
linked, and audio components are updated to allow playback of the sound file in synchronicity with the text file. One way of achieving this is described in the '273 Patent, using link means for forming link data between the audio portions and the corresponding character strings.
Thus, in the preferred embodiment, in which data is entered orally using speech-recognition software, both the text file and the original sound file are transmitted to the proofreader to allow the proofreader to view the text and listen to the oral sound file corresponding to the text. It is contemplated, in one embodiment, that the data entry system be linked by a network, such as an intranet, extranet, or the Internet, to the proofreader, who may be located in a different state or country. The sound and text files are synchronized with one another to allow the proofreader to simultaneously listen to the voice data while viewing the text data. The proofreader is also given the ability to jump forward or backward in the sound file and have the text portion track the sound file in synchronized fashion. Similarly, the user may scroll up or down in the text file and have the location in the sound file move backward or forward in synchronicity with the text file. This can be achieved in a number of ways, including the use of tags or synchronization markers in the text and sound files, or by including speech-recognition software to monitor the next snippet of information in the sound file and convert it to text, thereafter locating the corresponding portion of the text file by means of a search engine as known in the art. In the case where tags are used, the tags in the sound file can comprise sound signals corresponding to the various data entry locations. In order to avoid distracting the proofreader during the proofreading process, the sound signals may be at a frequency above human speech and can be filtered out using a low-pass filter. Instead, a frequency range can be chosen for the signals that is outside the range detectable by the human ear. Another way to synchronize the files would be to place the beginning, and perhaps end, times for the speech related to a given category into a property of that object.
For example, if History of Present Illness\Pain Severity began at 11:23:06 and ended at 11:23:11, those two times would appear as properties of that response in the record and would be available as tags in the text being proofread, so as to allow synchronization with the elapsed time of the dictated speech. In yet another embodiment, the sound file and text file can be stored in the same table of a database, or in linked tables of the same database. Thus, the portion of the sound file corresponding to a section of text for one or more data entry locations can be saved in the database with the text data. For example, Table 600 could merely be amplified to include a column for the sound file portions.
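With begin and end times stored as properties of each response, as in the example above, playback-to-text tracking reduces to an interval lookup. A sketch under that assumption (the element names and helper are hypothetical):

```python
# Each response carries the begin/end times (seconds of elapsed
# dictation) of the speech that produced it, as object properties.
responses = [
    {"element": "HPI\\Pain Location", "begin": 0,  "end": 5},
    {"element": "HPI\\Pain Severity", "begin": 6,  "end": 11},
    {"element": "ROS",                "begin": 12, "end": 20},
]

def element_at(elapsed):
    """Find which response the playback position currently falls in,
    so the displayed text can track the sound file."""
    for r in responses:
        if r["begin"] <= elapsed <= r["end"]:
            return r["element"]
    return None

print(element_at(8))  # HPI\Pain Severity
```

The inverse direction works the same way: scrolling the text to a response cues the sound file to that response's stored begin time.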
Information about a document, as opposed to the actual data constituting the document, may be incorporated by way of tags, as is known and described in GML, ISO/IEC 8879. Thus, by adding tags of a format control language such as XML, TeX, or GML, abstract format information can be added to a document, thereby allowing the layout of the document to be changed without losing formatting. Thus, layout information can be changed, such as changing the column arrangements, without compromising the logical structure.
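The separation of logical structure from layout can be illustrated with a small sketch. The tag names and content below are invented for illustration; the point is that re-rendering in a different column arrangement touches only the presentation code, never the tags.

```python
import xml.etree.ElementTree as ET

# Hypothetical logically tagged record: the tags describe what the data
# *is* (sections, findings), not how it is laid out on the page.
doc = ET.fromstring(
    "<record>"
    "<section name='History of Present Illness'>"
    "<finding>Pain severity 7/10</finding>"
    "</section>"
    "<section name='Physical Exam'>"
    "<finding>Heart regular rate and rhythm</finding>"
    "</section>"
    "</record>"
)

def render(root, columns=1):
    """Lay out the same logical structure in any number of columns;
    changing `columns` never alters the underlying tags."""
    lines = [f"{s.get('name')}: {s.findtext('finding')}"
             for s in root.iter('section')]
    # group the logical units into rows of `columns` entries each
    rows = [lines[i:i + columns] for i in range(0, len(lines), columns)]
    return "\n".join(" | ".join(row) for row in rows)

print(render(doc, columns=1))  # one section per line
print(render(doc, columns=2))  # two sections side by side
```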
In the case of oral data input, the user typically has speech-recognition software on his own computer, thereby giving the user visual feedback on his data entry. One embodiment of the invention, however, contemplates the proofreader having additional, more powerful speech-recognition software, and using the sound file to create a higher-quality text file for purposes of proofreading and correction. Another embodiment contemplates using a more powerful hardware and speech-recognition software system to preprocess the dictation prior to its reaching the proofreader.
One embodiment of the proofreading process is illustrated in Figure 9, in which the input data stream 902 optionally is preprocessed in block 904, either on the user side or on the proofreader side, or both, and special words or phrases are marked. These special words or phrases may automatically be corrected by the system, or are corrected by the user or proofreader. Alternatively, the system may present the user or proofreader with alternatives for replacing the suspicious data. Any preprocessing of Block 904 may involve input from spell checkers (insofar as data was spelled out to the speech-recognition software or was entered by other means that could introduce spelling errors) and grammar checkers monitoring the input into the data file. In Block 906, the text file and sound file 908 are played back in synchronized form. Further input verification and improvement takes place in step 910, in which the speech-recognition software, either on the user side or the proofreader side, causes alternatives to words that are potentially erroneous to be displayed. Dictionaries may be set up for the particular user's needs. The dictionaries may include separate one-word, two-word, three-word, etc., dictionaries to propose alternatives for various lengths of phrases. Thus, different lengths of text can be highlighted (e.g., reverse video, font style, font color, field background color). Proposed alternatives of different lengths can be presented in a single list, in lists of defined mixtures of lengths (e.g., two lists, one with phrases of three words or fewer and one with phrases of four words or more), or may appear in separate lists, each containing phrases of the same number of words.
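The multi-length dictionaries described above can be sketched as follows. All dictionary entries here are invented examples; a deployed system would build them from the user's specialty lexicons.

```python
# Separate one-word, two-word, and three-word dictionaries, each mapping
# a suspicious phrase of that length to proposed alternatives.
DICTIONARIES = {
    1: {"hart": ["heart", "harsh"]},
    2: {"cardio vascular": ["cardiovascular"]},
    3: {"short of breathe": ["short of breath", "shortness of breath"]},
}

def propose_alternatives(words):
    """Scan the token stream once per phrase length and collect
    (position, phrase, alternatives) for every dictionary hit."""
    proposals = []
    for n, table in DICTIONARIES.items():
        for i in range(len(words) - n + 1):
            phrase = " ".join(words[i:i + n]).lower()
            if phrase in table:
                proposals.append((i, phrase, table[phrase]))
    return proposals

tokens = "patient is short of breathe".split()
for position, phrase, alts in propose_alternatives(tokens):
    print(position, phrase, alts)
```

Hits of different lengths could then be merged into a single list or kept in per-length lists, matching the display options described above.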
In one embodiment the proofreader can select among alternative words/phrases or perform other functions such as increasing or decreasing audio playback speed by using voice commands, tactile controls (e.g., computer-game joy sticks, including additional buttons, or other platforms), eye movements or blinks, foot pedals, muscle twitches, and other bionic inputs.
In addition to software mechanisms for improving the quality of the data input, the proofreader edits the data by taking note of highlighted sections and by reading the document in context to ensure consistency. In the event that the proofreader is not in a position to make a determination, one embodiment of the invention proposes the use of experts, such as oversight workstations, to make a final decision. In a situation where an oversight workstation is not available or is also not in a position to make a final decision, the region of text at issue is marked and returned to the originating organization, namely the user, for appropriate selection or correction. This is indicated by Block 912. Thus, the proofreader, oversight workstation, or user may cause text to be corrected. As illustrated in Block 914, the corrected text is saved into the database or serialized document, as discussed with reference to Figure 2 above. The corrected structured or unstructured file, or only the portions that were changed, is then returned to the user in Block 916 and updates the text in the patient-encounter record. In the case of a structured document, positional or structural information is sent back with the data to permit the appropriate data locations on the user side, e.g., in the user database, to be corrected, and any changes in scoring to be reflected on the user side.
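Returning only the changed portions keyed by their data-entry locations can be sketched as a small patch operation. The location names and record contents are hypothetical.

```python
# User-side structured record: data-entry location -> current text.
user_record = {
    "HPI.PainSeverity": "Pain is sevre, 7/10",
    "Exam.Heart": "Regular rate and rhythm",
}

# What the proofreader sends back: only the corrected locations.
corrections = {"HPI.PainSeverity": "Pain is severe, 7/10"}

def apply_corrections(record, patch):
    """Patch the record in place and report which locations changed,
    so scoring tied to those locations can be recomputed."""
    changed = []
    for location, text in patch.items():
        if record.get(location) != text:
            record[location] = text
            changed.append(location)
    return changed

print(apply_corrections(user_record, corrections))
# the unchanged Exam.Heart entry is left untouched
```

Because each correction carries its location, the user-side database can update exactly the affected entries and re-score only those sections.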
14.
SUBSTITUTE SHEET (RULE 26)

Thus, the changes in the text may result in different information being put into the data entry locations. This may affect the scoring associated with such data. In one embodiment, it is contemplated that the user, upon receiving the returned data file, is again prompted for additional data input where the score for a certain section or for the entire document is within a predetermined desired point range. If the user is not inclined to make further changes once the document is returned to him by the proofreader, he may at least take note of areas where he could have gained further points, for future data entry sessions.
The feedback from the proofreader, in the form of the selection of particular options of text, is also returned together with the audio snippets for purposes of retraining the speech-recognition software, as is illustrated by Blocks 918 and 920. Final files or the changed portions of the files are returned to the user for replacing the original files or file portions (both continuous text output and individual structured elements in the database), as illustrated by Block 922.
An embodiment of the user interface for transcription proofreading is shown in Figure 10. In this example, only one type of alternative is presented, that for the single word, but other display boxes displaying phrases of different lengths can easily be displayed. The text being proofread is displayed in text box 1002. Its height and width can be varied using controls 1004 and 1006 for height and 1008 and 1010 for width, to adjust the display to individual proofreader preference. Each word or phrase of the text in 1002 can be highlighted, say the word "damning," to draw the attention of the proofreader to the current point of action. In a preferred embodiment, the document has predefined regions having lexicons associated with the regions that are specific to the topic covered by each region. The applicable alternatives are determined (at least in terms of priority of presentation) by the categories that the text being analyzed satisfies, if the text being analyzed has been so structured. The applicable alternative(s) at that point of the text is displayed in a list box 1012, with the identification number(s) 1014 of the alternative(s) listed to its left. At each point where an alternative is to be considered, a dialogue box 1016 is presented in which the identification number of the desired selected alternative is input. A non-selection can be indicated by putting in a special number, say 0, or another character. Alternatively, the interface can be set up so the selections are made using a point-and-click mechanism. In addition, if that particular word (or phrase) is to be replaced by text not shown as an alternative, or alternatives have not been presented, the appropriate button 1018 is pressed and a dialogue box brought up in which the replacement text can be input.
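The selection logic behind dialogue box 1016 can be sketched in a few lines. The function name and example alternatives are invented; the rules (an identification number selects, 0 indicates non-selection, anything else is replacement text) follow the description above.

```python
def resolve_selection(alternatives, entry):
    """Interpret the proofreader's input for one highlighted word:
    a listed identification number selects that alternative, 0 keeps
    the original word, and any other text is taken as free-text
    replacement (the button-1018 path)."""
    if entry == "0":
        return None                      # non-selection: keep original
    if entry.isdigit() and 1 <= int(entry) <= len(alternatives):
        return alternatives[int(entry) - 1]
    return entry                         # replacement text

alts = ["damming", "darning", "dampening"]
print(resolve_selection(alts, "2"))           # selects "darning"
print(resolve_selection(alts, "0"))           # None: keep "damning"
print(resolve_selection(alts, "condemning"))  # free-text replacement
```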
In a preferred embodiment, as an aid to the proofreader, certain parameters are displayed, such as the current word count 1020, the elapsed time 1022, and the current average words per minute 1024 for processed input already proofread. The proofreader may adjust the scan rate using buttons 1026.
Thus, the proofreader can select among alternative words/phrases or perform other functions such as increasing or decreasing audio playback speed. These selections may be made by using voice commands, tactile controls (e.g., computer-game joy sticks, including additional buttons, or other platforms), eye movements or blinks, foot pedals, muscle twitches, and other bionic inputs.
As illustrated by Figure 11, Soundex codes may be used to send phonetically similar codes as alternatives for selection by the user and/or the proofreader. Soundex words and phrases are stored as illustrated in Block 1102. These words emanate from predefined words categorized into certain categories. For example, the Soundex files may be categorized into various sections such as patient history, cardiovascular section, heart, thorax, and an unspecified section, as illustrated by Block 1104. The categories may be organized hierarchically so that selections can be made based on the desired scope. Data is input into the various categories as illustrated by Block 1106. Resultant categories with their data are then stored in a storage medium 1108. Alternatively, categorized voice-recognition segments may be used to generate and display text alternatives. Figure 12 illustrates a database implementing the Soundex codes. Typical Soundex codes take the form of a letter followed by several numbers, e.g., five numbers, where the letter corresponds to the first letter of a word and the numbers depict the consonants in the word following the first letter. Consonants that sound the same, such as "c" or "k," are given the same number, and double consonants, such as "ll," are treated as a single consonant. In order to take account of possible errors in the first letter, extended Soundex codes may be used. In Table 1200, depicted in Figure 12, column 1202 shows extended Soundex codes made up of 19 numbers that represent the words or phrases in column 1204. Instead of retaining the first letter of the word they depict, these extended Soundex codes codify the first letter of the word in the same manner as the rest of the letters. As shown in 1202, codes represent the words or phrases, and any unused portion of the 19 digits is filled with zeroes.
As mentioned with respect to Figure 11, the Soundex files of the preferred embodiment are categorized into various sections, which are depicted in Figure 12 by columns 1206, 1208, 1210, 1212, and 1214. Column 1216 shows an alternative implementation of Soundex codes corresponding to the words and phrases in column 1204.
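The extended Soundex variant described above can be sketched as follows. This is an illustrative implementation of the description, not the exact coding of Figure 12: every letter, including the first, is codified, vowels are dropped, phonetically similar consonants share a digit, repeated digits collapse, and the result is padded with zeroes to 19 digits.

```python
# Consonant classes follow the usual Soundex grouping, so "c" and "k"
# share a code and doubled consonants collapse to a single digit.
GROUPS = {
    **dict.fromkeys("bfpv", "1"), **dict.fromkeys("cgjkqsxz", "2"),
    **dict.fromkeys("dt", "3"), "l": "4",
    **dict.fromkeys("mn", "5"), "r": "6",
}

def extended_soundex(phrase: str, width: int = 19) -> str:
    """Codify every letter (including the first), drop vowels/h/w/y,
    collapse adjacent identical digits, and zero-pad to `width`."""
    digits = []
    previous = None
    for ch in phrase.lower():
        code = GROUPS.get(ch)           # vowels, h, w, y yield None
        if code and code != previous:
            digits.append(code)
        previous = code
    return "".join(digits)[:width].ljust(width, "0")

print(extended_soundex("cat"))
print(extended_soundex("kat"))  # "c"/"k" error in the first letter:
                                # both words map to the same code
```

Because the first letter is codified like the rest, a misrecognized initial consonant ("cat" heard as "kat") no longer changes the code, which is exactly the motivation given for the extension.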
The invention has been described with reference to an example involving a physician and patient. It will be appreciated that the invention is equally applicable to any data capture and proofreading environment, not just medical. Furthermore, the reference to speech-recognition software is not to be seen as the only way in which data can be captured. Thus, the system and method as defined by the claims of the invention are not to be read as limited to the specific embodiments described in the specification.

Claims

What is claimed is:
1. A method of proofreading the text data of a document in which the data was entered orally by a user and transcribed into text form by speech-recognition software on a user computer system, comprising: comparing the oral input data or the transcribed text data to at least one dictionary of terms to identify potentially incorrectly transcribed data; identifying the potentially incorrectly transcribed data; linking the oral input data and the transcribed text data; reviewing the text data in conjunction with the oral input data; and making corrections to the text data.
2. The method of Claim 1, wherein the dictionary of terms includes at least one of, a dictionary of desired terms comprising terms likely to be found in the document and a dictionary of undesired terms comprising terms unlikely to be found in the document.
3. The method of Claim 1, wherein the terms include at least one of, individual words, and phrases.
4. The method of Claim 1, wherein the identifying includes at least one of, audibly highlighting the oral data, and visually highlighting the text data.
5. The method of Claim 4, wherein the visual highlighting includes at least one of, bolding the text, changing the font of the text, changing the color of the text or its background, and presenting alternative text options for replacing the incorrect data.
6. The method of Claim 1, wherein the text data is stored in a text file, and the oral data is stored in a sound file, and the linking synchronizes the text file and the sound file to allow a proofreader to scroll up and down in the text file while ensuring a corresponding position change in the sound file, and allows the proofreader to move around in the sound file while ensuring that the location in the text file is adjusted accordingly.
7. The method of Claim 1, wherein the data includes predefined data inserted into the document and wherein the predefined data is identified by at least one of, visual indicators in the display of the text file, and audible indicators in the playback of the sound file.
8. The method of Claim 7, wherein the visual indicators include at least one of, visual markers at the beginning and end of the inserted predefined data, different font for the inserted predefined data, and different color for the inserted predefined data or its background.
9. The method of Claim 7, wherein the audible indicators include at least one of, sounds or words identifying the beginning and end of the inserted predefined data.
10. The method of Claim 1, wherein the data includes predefined data inserted into the document and wherein the proofreader uses a computer system to edit the text data, and wherein predefined data is skipped over during editing.
11. The method of Claim 1, wherein the text data is stored in a text file, and the oral data is stored in a sound file, and wherein predefined data specified by the user for insertion into the document are marked in at least one of the text file and the sound file, for later insertion.
12. The method of Claim 1, wherein the proofreading is done by a proofreader at a location remote from the user and the proofreader does the proofreading using a computer linked by at least one network to the user computer system.
13. The method of Claim 1, wherein identifying potentially incorrectly transcribed data includes presenting alternatives.
14. The method of Claim 13, wherein the comparing is performed by the computer system using acoustic analysis.
15. The method of Claim 13, wherein the alternatives are generated based on speech-recognition segments.
16. The method of Claim 14, wherein the alternatives are generated based on Soundex codes.
17. The method of Claim 13, wherein the document includes a plurality of regions dealing with different categories of information.
18. The method of Claim 17, wherein the alternatives being presented are specific to the categorization of the text.
19. The method of Claim 1, wherein a proofreader has a second, more powerful, speech-recognition software.
20. The method of Claim 17, wherein structural information defined by the plurality of regions is associated with the text data.
21. The method of Claim 17, wherein the data includes predefined data inserted into at least one of the predefined regions.
22. A method of improving the accuracy of the text data of a structured document in which the data was entered orally and transcribed into text form by voice-recognition software, comprising: associating the text data with positional information; linking the oral input data and the transcribed text data; reviewing the text data in conjunction with the oral input data; and making corrections to the text data.
23. The method of Claim 22, wherein the text data is stored in a text file, and the oral data is stored in a sound file, and the linking synchronizes the text file and the sound file to allow a proofreader to scroll up and down in the text file while ensuring a corresponding position change in the sound file, and allows the proofreader to move around in the sound file while ensuring that the location in the text file is adjusted accordingly.
24. The method of Claim 22, wherein the text data includes predefined data inserted into the document and wherein the predefined data is identified by at least one of, visual indicators in the display of the text file, and audible indicators in the playback of the sound file.
25. The method of Claim 24, wherein the visual indicators include at least one of, visual markers at the beginning and end of the inserted predefined data, different font for the inserted predefined data, and different color for the inserted predefined data or its background.
26. A method of Claim 24, wherein the audible indicators include at least one of, sounds or words identifying the beginning and end of the inserted predefined data.
27. A method of Claim 22, wherein the text data is stored in a text file, and the oral data is stored in a sound file, and wherein predefined data specified by the user for insertion into the document is marked in at least one of the text file and the sound file, for later insertion.
28. The method of Claim 22, wherein the reviewing is done by a proofreader at a location remote from the user and the proofreader does the reviewing using a computer linked by at least one network to the user computer.
29. The method of Claim 28, wherein the data includes predefined data inserted into the document and wherein the proofreader uses a computer system to edit the text data, and wherein predefined data is skipped over during review.
30. The method of Claim 22, further comprising identifying potentially incorrectly transcribed data.
31. The method of Claim 30, wherein the identifying includes using acoustic analysis.
32. The method of Claim 30, wherein the identifying includes presenting alternatives.
33. The method of Claim 32, wherein the alternatives are generated based on speech-recognition segments.
34. The method of Claim 32, wherein the alternatives are generated based on Soundex codes.
35. The method of Claim 32, wherein the document includes a plurality of regions dealing with different categories of information.
36. The method of Claim 35, wherein the alternatives being presented are based on the categorization of the text to be analyzed on the basis of one or more categories.
37. The method of Claim 23, wherein the proofreader has a second, more powerful, speech-recognition software.
38. A system for creating a text document from a sound file containing oral data, comprising: speech-recognition software to transcribe the oral data into text data; means for checking the accuracy of the transcribed data, which includes means for generating at least one of, words and phrases that are acoustically similar to words or phrases in the oral data, and means for presenting alternative words or phrases for user selection.
39. A system for proofreading and editing text data where the text data was generated from oral input data, using speech-recognition software, comprising: means for checking the accuracy of the text data, which includes means for generating at least one of, words and phrases that are acoustically similar to words or phrases in the oral data, and means for presenting alternative words or phrases for user selection.
40. The system of Claim 39, wherein the text data is stored in a text file, and the oral data is stored in a sound file, and wherein the text file and sound file are linked to permit a proofreader to view the text data and hear the corresponding oral data.
41. The system of Claim 39, wherein the oral data was input into a document having a predefined structure with a plurality of data entry regions dealing with different categories of information, and wherein the means for generating at least one of, words and phrases tailors the words or phrases to each region's category.
PCT/US2000/029649 1999-10-28 2000-10-27 Proofreading system and method WO2001031634A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
AU12386/01A AU1238601A (en) 1999-10-28 2000-10-27 Proofreading system and method

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US43130599A 1999-10-28 1999-10-28
US09/431,305 1999-10-28

Publications (1)

Publication Number Publication Date
WO2001031634A1 true WO2001031634A1 (en) 2001-05-03

Family

ID=23711355

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US2000/029649 WO2001031634A1 (en) 1999-10-28 2000-10-27 Proofreading system and method

Country Status (2)

Country Link
AU (1) AU1238601A (en)
WO (1) WO2001031634A1 (en)


Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US4914704A (en) * 1984-10-30 1990-04-03 International Business Machines Corporation Text editor for speech input
US5799273A (en) * 1996-09-24 1998-08-25 Allvoice Computing Plc Automated proofreading using interface linking recognized words to their audio data while text is being changed
US5909667A (en) * 1997-03-05 1999-06-01 International Business Machines Corporation Method and apparatus for fast voice selection of error words in dictated text
US6081772A (en) * 1998-03-26 2000-06-27 International Business Machines Corporation Proofreading aid based on closed-class vocabulary


Cited By (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP1282112A2 (en) * 2001-08-04 2003-02-05 Philips Corporate Intellectual Property GmbH Method of supporting proofreading of a recognized text in a speech to text system with playback speed adapted to confidence of recognition
EP1282112A3 (en) * 2001-08-04 2004-01-28 Philips Intellectual Property & Standards GmbH Method of supporting proofreading of a recognized text in a speech to text system with playback speed adapted to confidence of recognition
WO2003042975A1 (en) * 2001-11-16 2003-05-22 Koninklijke Philips Electronics N.V. Device to edit a text in predefined windows
WO2004086359A2 (en) * 2003-03-26 2004-10-07 Philips Intellectual Property & Standards Gmbh System for speech recognition and correction, correction device and method for creating a lexicon of alternatives
WO2004086359A3 (en) * 2003-03-26 2005-01-13 Philips Intellectual Property System for speech recognition and correction, correction device and method for creating a lexicon of alternatives
JP2006521578A (en) * 2003-03-26 2006-09-21 コーニンクレッカ フィリップス エレクトロニクス エヌ ヴィ Speech recognition system
US8447602B2 (en) 2003-03-26 2013-05-21 Nuance Communications Austria Gmbh System for speech recognition and correction, correction device and method for creating a lexicon of alternatives
EP1475696A2 (en) * 2003-05-09 2004-11-10 DictaNet Software AG Method and computer apparatus for automatically reproducing digital audio data
EP1475696A3 (en) * 2003-05-09 2006-09-27 DictaNet Software AG Method and computer apparatus for automatically reproducing digital audio data

Also Published As

Publication number Publication date
AU1238601A (en) 2001-05-08

Similar Documents

Publication Publication Date Title
US6434547B1 (en) Data capture and verification system
CN109697283B (en) Method and system for generating a literal record of a patient-health care provider session
CN109698030B (en) Automatic generation of an interface, notes or summaries for a patient-provider dialogue
US20230281382A1 (en) Insertion of standard text in transcription
EP1611569B1 (en) System for speech recognition and correction, correction device and method for creating a lexicon of alternatives
EP2030196B1 (en) Verification of extracted data
US7149690B2 (en) Method and apparatus for interactive language instruction
US7191131B1 (en) Electronic document processing apparatus
US8046226B2 (en) System and methods for reporting
US9043206B2 (en) System and methods for matching an utterance to a template hierarchy
US20100094618A1 (en) Transcription data extraction
JP2001188777A (en) Method and computer for relating voice with text, method and computer for generating and reading document, method and computer for reproducing voice of text document and method for editing and evaluating text in document
WO2007022058A9 (en) Processing of synchronized pattern recognition data for creation of shared speaker-dependent profile
WO2001031634A1 (en) Proofreading system and method
JP7341111B2 (en) Conversation support device, conversation support system, conversation support method and program
JP7385320B1 (en) Programs, information processing devices, methods and systems
Winiecke Precoding and the accuracy of automated analysis of child language samples
CN116959453A (en) Voice input electronic medical record information system and operation method
Tobin Automatic Speech Recognition Implementations in Healthcare

Legal Events

Date Code Title Description
AK Designated states

Kind code of ref document: A1

Designated state(s): AL AM AT AU AZ BA BB BG BR BY CA CH CN CU CZ DE DK EE ES FI GB GD GE GH GM HR HU ID IL IN IS JP KE KG KP KR KZ LC LK LR LS LT LU LV MD MG MK MN MW MX NO NZ PL PT RO RU SD SE SG SI SK SL TJ TM TR TT UA UG UZ VN YU ZW

AL Designated countries for regional patents

Kind code of ref document: A1

Designated state(s): GH GM KE LS MW MZ SD SL SZ TZ UG ZW AM AZ BY KG KZ MD RU TJ TM AT BE CH CY DE DK ES FI FR GB GR IE IT LU MC NL PT SE BF BJ CF CG CI CM GA GN GW ML MR NE SN TD TG

121 Ep: the epo has been informed by wipo that ep was designated in this application
REG Reference to national code

Ref country code: DE

Ref legal event code: 8642

32PN Ep: public notification in the ep bulletin as address of the addressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 69(1) EPC

122 Ep: pct application non-entry in european phase
NENP Non-entry into the national phase

Ref country code: JP