WO2007048053A1 - Method and apparatus for improving the transcription accuracy of speech recognition software - Google Patents

Info

Publication number
WO2007048053A1
WO2007048053A1 PCT/US2006/041357 US2006041357W WO2007048053A1 WO 2007048053 A1 WO2007048053 A1 WO 2007048053A1 US 2006041357 W US2006041357 W US 2006041357W WO 2007048053 A1 WO2007048053 A1 WO 2007048053A1
Authority
WO
WIPO (PCT)
Prior art keywords
vocabulary
match
elements
database
speech recognition
Prior art date
Application number
PCT/US2006/041357
Other languages
English (en)
Inventor
Robert E. Coifman
Original Assignee
Coifman Robert E
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Priority claimed from US11/510,435 (US7809565B2)
Application filed by Coifman Robert E
Priority to EP06826505A (EP1946292A1)
Publication of WO2007048053A1

Classifications

    • G - PHYSICS
    • G10 - MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L - SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 - Speech recognition
    • G10L15/28 - Constructional details of speech recognition systems
    • G10L15/32 - Multiple recognisers used in sequence or in parallel; Score combination systems therefor, e.g. voting systems
    • G - PHYSICS
    • G10 - MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L - SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 - Speech recognition
    • G10L15/08 - Speech classification or search
    • G10L15/18 - Speech classification or search using natural language modelling
    • G10L15/183 - Speech classification or search using natural language modelling using context dependencies, e.g. language models
    • G - PHYSICS
    • G10 - MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L - SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 - Speech recognition
    • G10L15/22 - Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G10L2015/226 - Procedures used during a speech recognition process, e.g. man-machine dialogue using non-speech characteristics
    • G10L2015/228 - Procedures used during a speech recognition process, e.g. man-machine dialogue using non-speech characteristics of application context

Definitions

  • Speech recognition systems particularly computer-based speech recognition systems
  • Numerous inventions and voice transcription technologies have been developed to address various problems within speech recognition systems.
  • advanced mathematics and processing algorithms have been developed to address the needs of translating vocal input into computer text through speech parsing, phoneme identification and database matching of the input speech so as to accurately transcribe the speech into text.
  • General speech recognition databases are also well known.
  • U.S. Patent No. 6,631,348 (Wymore) discloses a speech recognition system in which vocal training information is provided to create different vocal reference patterns under different ambient noise levels. The Wymore invention creates a database of captured speech from this training input.
  • a user of the Wymore system may then dictate speech under various ambient noise conditions and the speech recognition system properly filters the noise from the user's input speech based on the different stored models to determine the appropriate, spoken words, thereby improving the accuracy of the speech transcription.
  • U.S. Patent No. 6,662,160 (Chien et al.) also discloses a system involving adaptive speech recognition methods that include noise compensation. Like Wymore, the system of Chien et al. neutralizes noise associated with input speech through the use of preprocessed training input. Chien et al. employs complex statistical mathematical models (e.g.
  • Ballard et al. discloses a computer having a windows-based, graphical user interface that displays the list of potentially intended words from which the user selects the appropriate word with a graphical input device, such as a computer mouse.
  • U.S. Patent No. 6,490,557 (Jeppesen) discloses a system and method for recognizing and transcribing continuous speech in real time.
  • the disclosed speech recognition system includes multiple, geographically distributed, computer systems connected by high speed links.
  • a portion of the disclosed computer system is responsible for preprocessing continuous speech input, such as filtering any background noise provided during the speech input, and subsequently converting the resultant speech signals into digital format.
  • the digital signals are then transcribed into word lists upon which automatic speech recognition components operate.
  • Jeppesen's speech recognition system is also trainable so as to accommodate more than one type of voice input, including vocal input containing different accents and dialects.
  • this speech recognition system is capable of recognizing large vocabulary, continuous speech input in a consistent and reliable manner, particularly, speech that involves variable input rates and different dialects and accents.
  • Jeppesen further discloses systems having on-site data storage (at the site of the speech input) and off-site data storage which stores the databases of transcribed words.
  • a primary advantage of Jeppesen is that a database of large scale vocabularies containing speech dictations is distributed across different geographical areas such that users employing dialects and accents within a particular country or portion of the world would be able to use localized databases to accurately transcribe their speech input.
  • Thelan et al. begins with an ultra-large vocabulary and narrows the text selection vocabularies depending on the speech input so as to select further refined vocabularies that provide greater transcription accuracy.
  • Model selectors are operative within Thelan et al. to enable the recognition of more specific models if the specific models obtain good recognition results. These specific models may then be used as replacement for the more generic vocabulary model.
  • Like Jeppesen, Thelan et al. discloses a computer-based speech recognition system having potentially distributed vocabulary databases.
  • a method for improving the accuracy of a computerized, speech recognition system includes loading a specified vocabulary into computer storage, the specified vocabulary associated with a specific context; accepting a user's voice input into the speech recognition system; evaluating the user's voice input with data values from the specified vocabulary according to an evaluation criterion; selecting a particular data value as an input into a computerized form field if the evaluation criterion is met; and if the user's voice input does not meet the evaluation criterion, selecting a data value from the base vocabulary as an input into the computerized form field.
  • the method further includes evaluating the user's voice input with data values from the base vocabulary according to a base evaluation criterion if the user's voice input does not meet the evaluation criterion.
  • the evaluation criterion is a use weighting associated with the data values.
  • the step of evaluating further includes the step of applying a matching heuristic against a known threshold.
  • the step of applying a matching heuristic further includes a step of comparing the user's voice input to a threshold probability of matching an acoustic model derived from the specified vocabulary.
  • the context is associated with any one or more of the following: a topical subject, a specific user, and a field.
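The first-pass and fallback selection described in the preceding aspects can be sketched as follows. This is a minimal illustration, not the patented implementation: the function names, the 0.8 threshold, and the use of text similarity from Python's `difflib` as a stand-in for an acoustic match probability are all assumptions made for the example.

```python
from difflib import SequenceMatcher


def best_match(spoken, vocabulary):
    """Return (entry, score) for the closest vocabulary entry, or (None, 0.0).

    Text similarity stands in for an acoustic match probability (assumption).
    """
    best, best_score = None, 0.0
    for entry in vocabulary:
        score = SequenceMatcher(None, spoken.lower(), entry.lower()).ratio()
        if score > best_score:
            best, best_score = entry, score
    return best, best_score


def select_field_value(spoken, specified_vocab, base_vocab, threshold=0.8):
    """Try the context-specific vocabulary first; fall back to the base vocabulary.

    The threshold plays the role of the evaluation criterion (hypothetical value).
    """
    entry, score = best_match(spoken, specified_vocab)
    if entry is not None and score >= threshold:
        return entry, "specified"
    entry, _ = best_match(spoken, base_vocab)
    return entry, "base"


if __name__ == "__main__":
    specified = ["appendicitis", "diverticulitis"]
    base = ["hello", "help", "diverse"]
    print(select_field_value("diverticulitus", specified, base))
    print(select_field_value("hello", specified, base))
```

In this sketch the specified vocabulary is consulted first because it is small and tied to the current context; the larger base vocabulary is searched only when no specified entry meets the criterion.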
  • a method for improving the accuracy of a computerized, speech recognition system includes the steps of loading a first specified vocabulary into computer storage, the first specified vocabulary associated with a first computerized form field; accepting a user's voice input into the speech recognition system; evaluating the user's voice input with data values from the first specified vocabulary according to an evaluation criterion; selecting a particular data value as input into the first computerized form field if the user's voice input meets the evaluation criterion; loading a second specified vocabulary into computer storage, the second specified vocabulary associated with a second computerized form field; accepting a user's voice input into the speech recognition system; evaluating the user's voice input against data values from the specified vocabulary according to an evaluation criterion; and selecting a particular data value as input into a second computerized form field if the user's voice input meets the evaluation criterion.
  • the evaluation criteria for the steps of evaluating the first and the second specified vocabularies are the same. In another aspect, the evaluation criteria for the steps of evaluating the first and the second specified vocabularies are different. In still another aspect, the first and second computerized form fields are associated with different fields of a computerized medical form.
  • the present invention provides a method for improving the accuracy of a computerized, speech recognition system that includes loading a first specified vocabulary into computer storage, the first specified vocabulary associated with a first user of the speech recognition system; accepting the first user's voice input into the speech recognition system; evaluating the first user's voice input with data values from the first specified vocabulary according to an evaluation criterion; selecting a particular data value as an input into a computerized form field if the first user's voice input meets the evaluation criterion; loading a second specified vocabulary into computer storage, the second specified vocabulary associated with a second user of the speech recognition system; accepting a second user's voice input into the speech recognition system; evaluating the second user's voice input with data values from the specified vocabulary according to an evaluation criterion; and selecting a particular data value as an input into the computerized form field if the second user's voice input meets the evaluation criterion.
  • the first and second users of the speech recognition system are different doctors and the computerized form fields are associated with a field within a computerized medical form.
  • a method for improving the accuracy of a computerized, speech recognition system that includes loading a first specified vocabulary into computer storage, the first specified vocabulary associated with a first context used within the speech recognition system; accepting a user's voice input into the speech recognition system; evaluating the user's voice input with data values from the first specified vocabulary according to an evaluation criterion; selecting a particular data value as an input into a computerized form field if the user's voice input meets the evaluation criterion; loading a second specified vocabulary into computer storage, the second specified vocabulary associated with a second context used within the speech recognition system; accepting the user's voice input into the speech recognition system; evaluating the user's voice input with data values from the specified vocabulary according to an evaluation criterion; and selecting a particular data value as an input into the computerized form field if the user's voice input
  • a computerized speech recognition system including a computerized form including at least one computerized form field; a first vocabulary database containing data entries for the computerized form field, the first vocabulary associated with a specific criterion; a second vocabulary database containing data entries for the data field; and an input for accepting a user's vocal input, the vocal input being compared to the first vocabulary as a first pass in selecting an input for the computerized form field, and the vocal input being compared to the second vocabulary as a second pass in selecting an input for the computerized form field.
  • the criterion is one or more of the following: a topical context, a specific user of the speech recognition system, a form field.
  • the first vocabulary database is a subset of the second vocabulary database.
  • a database of data values for use in a computerized speech recognition system including a first vocabulary database containing data entries for a computerized form including at least one computerized form field, the first vocabulary associated with a specific criterion; and a second vocabulary database containing data entries for the data field.
  • the criterion is one or more of the following: a topical context, a specific user of the speech recognition system, a field.
  • the method includes a process of vocabulary element matching including the steps of loading a first vocabulary; evaluating individual vocabulary elements within the first vocabulary to determine a first vocabulary match set, each vocabulary element within the first vocabulary match set having a match probability score; weighting the match probability scores of the vocabulary elements within the first vocabulary match set with a first vocabulary weighting factor; loading a second vocabulary; evaluating individual vocabulary elements within the second vocabulary to determine a second vocabulary match set, each vocabulary element within the second vocabulary match set having a match probability score; combining the individual vocabulary elements within the first and second vocabulary match sets so as to create a combined set of vocabulary elements; weighting the match probability scores of the combined set of vocabulary elements with a second vocabulary weighting factor; and selecting as a match to an input to the computerized speech recognition system a vocabulary element from the combined set of vocabulary elements based on the weighted match probability scores of the combined set of vocabulary elements.
  • the enhanced method may also include the steps of reducing a size of the combined set of vocabulary elements to create a reduced combined set of vocabulary elements or the steps of loading a third vocabulary; evaluating individual vocabulary elements within the third vocabulary to determine a third vocabulary match set, each vocabulary element within the third vocabulary match set having a match probability score; and combining the individual vocabulary elements with the combined set of vocabulary elements so as to create a new combined set of vocabulary elements.
  • the first and second weighting functions are linear scaling factors and the step of weighting includes the step of multiplying the match probability score by the linear scaling factors, or the first and second weighting functions are non-linear scaling factors and the step of weighting includes the step of applying the non-linear scaling factor to the match probability score.
  • the first and second vocabularies may be selected based on the previously input text of a user of the speech recognition system and/or the previously input text used in a particular form field being populated by the speech recognition system, or are selected according to a speech context being used by a user of the speech recognition system, or any combination of these or other criteria.
  • the method includes loading a first vocabulary; evaluating individual vocabulary elements within the first vocabulary to determine a first vocabulary match set, each vocabulary element within the first vocabulary match set having a match probability score; loading a second vocabulary; evaluating individual vocabulary elements within the second vocabulary to determine a second vocabulary match set, each vocabulary element within the second vocabulary match set having a match probability score; combining the individual vocabulary elements within the first and second vocabulary match sets so as to create a combined set of vocabulary elements; weighting the match probability scores of the combined set of vocabulary elements with a non-linear vocabulary weighting function; evaluating individual vocabulary elements within the combined set of vocabulary elements to determine a combined vocabulary match set based on the non-linearly weighted match probability scores of the vocabulary elements within the combined set of vocabulary elements; and selecting as a match to an input to the computerized speech recognition system a vocabulary element from the combined set of vocabulary elements based on the weighted match probability scores of the combined set of vocabulary elements.
  • the enhanced method may also include the steps of applying the non-linear weighting function to the match probability scores of the vocabulary elements within the first and second vocabulary match sets; calculating a first altered match probability score for the vocabulary elements within the first vocabulary match set; deriving a second altered match probability score for the vocabulary elements within the second vocabulary match set; and deriving modified first and second match probability scores for the vocabulary elements within the combined set of vocabulary elements.
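The weighted combination of match sets described above might be sketched as follows for the linear-scaling case. The function names, the default factors, and the rule that a shared element keeps its higher score are illustrative assumptions; the text does not specify how duplicates are resolved or what factor values are used.

```python
def weight_scores(match_set, factor):
    """Scale every match probability score by a linear factor."""
    return {element: score * factor for element, score in match_set.items()}


def combine_match_sets(first, second):
    """Merge two match sets; a shared element keeps its higher score (assumption)."""
    combined = dict(first)
    for element, score in second.items():
        combined[element] = max(score, combined.get(element, 0.0))
    return combined


def select_match(first_set, second_set, first_factor=1.5, second_factor=1.0, keep=None):
    """Weight the first set, combine, weight the combined set, then pick the best.

    `keep` optionally reduces the combined set to its top-scoring elements.
    Both match sets are assumed to be non-empty overall.
    """
    weighted_first = weight_scores(first_set, first_factor)
    combined = combine_match_sets(weighted_first, second_set)
    combined = weight_scores(combined, second_factor)
    if keep is not None:
        top = sorted(combined, key=combined.get, reverse=True)[:keep]
        combined = {element: combined[element] for element in top}
    return max(combined, key=combined.get), combined


if __name__ == "__main__":
    first = {"appendicitis": 0.6}
    second = {"appendectomy": 0.7, "appendicitis": 0.5}
    print(select_match(first, second)[0])
    print(select_match(first, second, first_factor=1.0)[0])
```

With a first-vocabulary factor above 1, an element from the more specific vocabulary can outrank a base-vocabulary element that scored higher on the raw match alone; a non-linear weighting function could be substituted for the multiplication in `weight_scores`.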
  • FIG. 1 is a general network diagram of the computerized speech recognition system according to one embodiment of the present invention.
  • FIG. 2 is a system architecture diagram of a speech recognition system according to one embodiment of the present invention.
  • FIG. 3 shows an arrangement of a graphical user interface display and associated databases according to one embodiment of the present invention.
  • FIG. 4 is a graphical depiction of different text string database organizations according to one embodiment of the present invention.
  • FIG. 5 is a graphical depiction of one specific, text string database according to one embodiment of the present invention.
  • FIG. 6 is a graphical depiction of another specific, text string database according to one embodiment of the present invention.
  • FIG. 7 is a process flow diagram for the speech recognition system according to one embodiment of the present invention.
  • FIG. 8 is another process flow diagram for the speech recognition system according to another embodiment of the present invention.
  • FIG. 9 is another process flow diagram for the speech recognition system according to another embodiment of the present invention.
  • FIG. 10 is another process flow diagram for the speech recognition system according to another embodiment of the present invention.
  • FIG. 11 is an exemplary vocabulary element data record according to another embodiment of the present invention.
  • FIG. 12 is an exemplary software system and data organization for the record database and associated software according to another embodiment of the present invention.
  • FIG. 13 is a data structure for the input and modified vocabulary element weightings according to another aspect of the present invention.
  • FIG. 14 is another process flow diagram for the speech recognition system according to another embodiment of the present invention.
  • FIG. 1 shows a general office environment including a distributed computer network for implementing the present invention according to one embodiment thereof.
  • Medical office 100 includes computer system 105 that is running speech recognition software, microphone input 110 and associated databases and memory storage 115.
  • the computerized system within office 1 may be used for multiple purposes within that office, one of which may be the transcription of dictation related to the use of certain medical forms within that office.
  • Office 1 and its computer system(s) may be connected via a link 130 to the internet in general, 140.
  • This link may include any known or future-devised connection technology including, but not limited to, broadband connections, narrow band connections and/or wireless connections.
  • Other medical offices for example offices 2 through N, 151-153, may also be connected to one another and/or to the internet via data links 140 and thus to office 1.
  • Each of the other medical offices may contain similar computer equipment, including computer equipment running speech recognition software, microphones, and databases.
  • Also connected to internet 140 via data link 162 is data storage facility 170 containing one or more speech recognition databases for use with the present invention.
  • Fig. 2 provides a diagram of a high-level system architecture for the speech recognition system 200 according to one embodiment of the present invention. It should be recognized that any one of the individual pieces and/or subsets of the system architecture may be distributed and contained within any one or more of the various offices or data storage facilities provided in Fig. 1. Thus, there is no preconceived restriction on where any one of the individual components within Fig. 2 resides, and those of skill in the art will recognize various advantages by including the particular components provided in Fig. 2 in particular geographic and data-centric locations shown in Fig. 1.
  • input speech 205 is provided to the speech recognition system via a voice collection device, for example, a microphone 210.
  • Microphone 210 in turn is connected to the computer equipment associated with the microphone, shown as 105 in Fig. 1.
  • Computer system 105 also includes a speech recognition software system 212.
  • Numerous commercial speech recognition software systems are readily available for such purpose including, but not limited to, ViaVoice offered by IBM and Dragon NaturallySpeaking offered by ScanSoft.
  • the speech recognition software includes, generally, a speech recognition module 217 which is responsible for parsing the input speech 205 as digitized by the microphone 210 according to various, well-known speech recognition algorithms and heuristics.
  • Language model 219 is also typically included with speech recognition software 212.
  • the language model 219 is responsible for parsing the input speech according to various algorithms and producing fundamental language components. These language components are typically created in relation to a particular language and/or application of interest, which the speech recognition system then evaluates against a textual vocabulary database 220 to determine a match.
  • incoming analog speech is digitized and the amplitudes of different frequency bands are stored as dimensions of a vector. This is performed for each of between 6,000 and 16,000 frames per second and the resulting temporal sequence of vectors is converted, by any of various means, to a series of temporally overlapping "tokens" as defined in U. S. Pat.
  • General text database 220 is typically included as part of speech recognition software 212 and includes language text that is output by the speech recognition software once a match with the input speech is made.
  • General or base vocabulary database 220 may contain the textual vocabulary for an entire language, e.g. English. More typically, however, the base vocabulary database contains a sizable subset of a particular language or desired application, e.g. hundreds of thousands of words.
  • the text output from base vocabulary database 220 is then provided as input to any one of a number of other computer-based applications 230 into which the user desires the text.
  • Examples of typical computer applications that are particularly suited for use with speech recognition software include, but are not limited to, word processors, spreadsheets, command systems and/or transcription systems that can take advantage of a user's vocal input.
  • vocal input may be used to provide input to a text field within a particular form, field or web page displayed by an internet browser.
  • The initial applications of the present invention are directed to voice-to-text applications in which vocal input is provided and textual output is desired.
  • Other applications are envisioned in which any user or machine provides an input to a recognition system, and that recognition system provides some type of output from a library of possible outputs.
  • Examples of such applications include, but are not limited to, a search and match of graphical outputs based on a user's voice input or an action-based output (e.g. a computer logon) based on a vocal input.
  • an action-based output may be to provide access to one of several computer systems, the list of computer systems being stored in a database of all accessible computer systems based on a user's bio-input (e.g. fingerprint) or a machine's mechanical input (e.g. a login message from a computer).
  • the speech recognition/voice transcription system of the present invention further includes a specified database of text string values that provide a first-pass output in response to a particular speech input against which the system attempts to determine a match.
  • These text strings may be stored in any one of a number of formats and may be organized in any one of a number of manners depending on the practical application of the system.
  • the text strings within specified database 250 are provided from the vocal inputs of previous users of the speech recognition system.
  • the first-pass text strings may be organized by users (e.g.
  • Specified database 250 may also be organized according to numerous other criteria that may be advantageous to users of the speech recognition system of the present invention.
  • the sub-databases of first-pass text strings within first-pass, specified database 250 may be organized by fields within a computerized or web-based electronic form.
  • text may need to be input into a medical form 310 that includes a patient's name, shown in computerized form field 315, the patient's address, shown in computerized form field 318, the patient's phone number, shown in computerized form field 320, and the patient's age, shown in computerized form field 320.
  • Sub-databases 371, 372 and 373 shown in Fig. 3 are specific examples of the general field sub-databases 271, 272 and 273 of Fig. 2. These sub-databases provide first-pass text strings for matching speech input provided by the doctor when populating form fields 315, 318 and 328 (Fig. 3) respectively.
  • a context associated with some aspect of the present speech input may be used to organize and condition the data into appropriate first-pass sub-databases.
  • the sub-database 381 associated with the findings field 330 within the medical form of Fig. 3 may be conditioned upon both the history and the age of the patient under the presumption that previous findings related to a particular combination of history and age group, either within an individual medical office or in general, are more likely to be repeated in future speech inputs with respect to patients having the same combination of age range and history.
  • the findings fields populated within a form in the office practice of a primary care physician, with a history of abdominal pain and characteristic physical findings may be quite similar for the following two conditions: "appendicitis" as a probable "Interpretation" field for patients age 5-12; and "diverticulitis" as a probable "Interpretation" for patients age 75+.
  • Characteristic findings (abdominal pain with what is called "rebound tenderness") will be stored in sub-database 381 and provided to "findings" field 330, while "appendicitis" and "diverticulitis" will be stored in sub-database 382 and provided to "Interpretation" field 350.
  • Specified database 250 may be created and organized in any number of ways and from any one of a number of sources of information so as to provide an accurate first-pass database for appropriate and efficient use within a particular context. If, for example, specified database 250 contains text strings organized by users of the system (a user context) under the statistical presumption that each specific doctor is more likely to repeat his or her own relatively recent utterances than earlier utterances, in situations when all other system parameters are the same, and more likely to repeat terms used by other system users or other physicians in the same specialty under otherwise identical circumstances, than to use terms neither they nor others have used in that situation, text from their own past dictations or those of others (whether manually or electronically transcribed) may be used to populate and arrange the text string values within the database.
  • Where a high probability first-pass database is used to provide text strings to be input into particular fields within a computerized form, these data values may be derived and input from previously filled-out forms. These data may then be organized into sub-databases according to form fields, for example as shown in Fig. 3 by sub-databases 371-381.
  • the specified database 250 may contain one, many or all such data for use within a particular desired context and output application.
  • the actual data values within the database may be dynamically updated and rearranged into different sub-databases during the actual use of the speech recognition system so as to accommodate any particularly desirable speech recognition situation. In the most useful instances, the data values that populate the specified database 250 will be obtained from historical data and text strings that accompany a particular use and application of the speech recognition system.
  • Supplemental data may also accompany the data values and text strings stored within specified database 250.
  • weightings and prioritization information may be included as part of the textual data records that are to be matched to the input speech. These weightings may help determine which data values are selected, when several possible data values are matched as possible outputs in response to a particular speech input. Further, this weighting and prioritization information may be dynamically updated during the course of the operation of the speech recognition system to reflect prior speech input.
  • the speech recognition/voice transcription system of the present invention further includes a context identification module 240.
  • the context identification module is coupled to one or more input and recognition components (Fig. 2, 205-230) of the overall speech recognition system 200 and is used to select or create a proper sub-database within the entire specified database 250. If, for example, the desired sub-databases to be used are based on a user context, then the context identification module may take input from a user identification device (not shown) or may determine the user from speech characteristics determined by the speech recognition software so as to select an appropriate user sub-database (e.g. 261) from the entire specified database 250.
  • the data values within the specified database 250 may be loosely organized and the context identification module may actually condition the data values so as to dynamically create an appropriate user sub-database from the information stored within the specified database.
  • the context identification module may monitor and interpret a particular form field that is active within an application 230 into which text input is to be provided. After making such a determination, the context identification module may select, or as mentioned above, actually condition the data values so as to dynamically create, an appropriate user sub-database from the information stored within the specified database.
  • the speech recognition/voice transcription system of the present invention may further include a prioritization module 245.
  • the prioritization module may be coupled to any one or more input and recognition components (Fig. 2, 205-230) within the overall speech recognition system 200 including the specified database 250.
  • the prioritization module assists in collecting actual use information from the speech recognition system and using that data to dynamically prioritize the data values within any or all of the sub-databases contained within specified database 250.
  • specified database 250 contains text strings as selectable data values for input into medical forms in a word processing application 230.
  • the text strings may be organized according to a number of different criteria based on the users of the forms and/or the fields within the electronic forms.
  • a computer-based electronic medical form 310 shows several fields within a medical report.
  • computerized electronic form 310 may include a name field 315, an address field 318, a phone number field 320, as well as more general fields such as a findings field 330 and an interpretations field 350.
  • One possible organization of the text string data values within specified database 250 is to associate each text string with each field within a particular electronic form. As shown in Fig.
  • text string sub-database 371 may be associated with name field 315
  • text string sub-database 372 may be associated with address field 318
  • text string sub-database 381 may be associated with findings field 330.
  • two separate organizations of the text strings exist within specified, text string sub-databases 371 through 382.
  • for the name field 315, for example, sub-database 371 may contain only text strings that indicate patients' names.
  • text string sub-database 372 associated with address field 318 of electronic computer form 310 may contain only text strings associated with street addresses.
  • the data organizations referenced by 261-283 in Fig. 2 and 371-382 in Fig. 3 are logical organizations only.
  • the data records within specified database 250 may be organized, arranged and interrelated in any one of a number of ways, two of which are shown in Fig. 4.
  • the organization of the records within specified database 450 may be loose, i.e. all records may be within one file 455 where each record (and output text string) contains a plethora of relational information. (Option A.)
  • the relational information within the singular file would then, presumably, be able to be used to create the logical divisions shown in Figs. 2 and 3.
  • sub-database might be a field context sub-database 471, for example, where the relational data pertaining to the form field within file 455 is used to organize the sub-database.
  • organization of the records within specified database 250 may be tight, i.e. records (and output text strings) may be highly organized according to context/field/user such that a one-to-one relationship exists between a particular file of records (sub-database) and a form field or user, as shown in option B of Fig. 4.
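The loose organization of option A can be illustrated with a minimal Python sketch in which a logical sub-database is derived on demand from a single file of records. The `Record` fields, the sample strings and the function name `build_sub_database` are assumptions made for illustration and do not come from the specification.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    text: str           # output text string
    field: str          # form field the string was dictated into
    user: str           # user who dictated it
    use_count: int = 0  # simple historical weighting


def build_sub_database(records, *, field=None, user=None):
    """Return the records matching the requested context, most used first."""
    selected = [
        r for r in records
        if (field is None or r.field == field)
        and (user is None or r.user == user)
    ]
    return sorted(selected, key=lambda r: r.use_count, reverse=True)


# Hypothetical single file of loosely organized records (option A).
LOOSE_FILE = [
    Record("12 Oak Street", "address", "dr_smith", 4),
    Record("98 Elm Avenue", "address", "dr_brown", 7),
    Record("pneumonia", "interpretations", "dr_smith", 9),
]

# Logical field-context sub-database for the address field.
address_sub_db = build_sub_database(LOOSE_FILE, field="address")
print([r.text for r in address_sub_db])
```

Under the tight organization of option B, the same result would simply be read from a dedicated file per field or per user rather than filtered at run time.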
  • the first type may be classified as a singular context sub-database in that one specific criterion provides the motivation for grouping and organizing the records to create the sub-database.
  • One specific embodiment of this type of specified sub-database, 371 of Fig. 3, is shown in more detail in Fig. 5, where text string records containing street addresses are stored within sub-database 571 in tabular format.
  • individual records 510, 511 and 512 contain text strings of previously dictated (specified) street addresses which are provided for the purpose of matching a user's speech input when the address field 318 (Fig. 3) is the active dictation field.
  • data such as weighting information 552 and user's data 554, may also be included within text string sub-database 371.
  • the data records within the sub-database 571 contain text strings and accompanying relational data intended for use only within a specific field within a computerized form or web page.
  • Other specified sub-databases similar to 571 may contain text strings and accompanying relational data that is intended for use with only one of the users of the speech recognition system.
  • in a second sub-database type, multiple-context organizations of the data within specified database 250 are also created.
  • medical form 310 of Fig. 3 may contain input fields that are related to other input fields within the overall electronic form. This interrelationship typically occurs when the voice dictation provided as an input to a field within an electronic form is of a more general nature.
  • the organization of the text strings within a sub-database may not be based on a single, external, context, such as a specific user of the system or a particular field within an electronic form, but rather may be based on the interrelation of the actual text strings in a more complex manner.
  • context specific sub-databases 381 (pertaining to the medical findings field) and 382 (pertaining to the medical interpretations field) may include contextually intertwined text strings that the speech recognition system of the present invention must identify and properly select so as to achieve the efficiencies of the present invention.
  • These more complex, contextually intertwined text string sub-databases are shown as logical sub-databases 281-283 in Fig. 2.
  • sub-database 381 provides text strings that may be input into findings field 330 and sub-database 382 provides text strings that may be input into interpretations field 350.
  • name field 315 for example, sub-database 381 is designed to match text strings to a more general and varied voice input provided to the speech recognition system.
  • Fig. 6 shows one specific embodiment of the specified, text string sub-database 382 of Fig. 3.
  • Sub-database 382 provides text string records related to medical interpretations which are stored within sub-database 682 in tabular format.
  • individual records 615, 616 and 617 contain text strings from previously dictated (specified) interpretations which are provided for the purpose of matching a user's speech input when the interpretations field 350 (Fig. 3) is the active dictation field.
  • Other relational data such as weighting information 652 and interrelational context information (e.g. age 654, user 656, findings 658) may also be included within text string sub-database 682.
  • interpretations text strings, such as pneumonia and dysphagia, are provided as potential text strings to be evaluated against a user's dictation to provide a text input to the interpretations field. Also shown in Fig.
  • the interpretations sub-database 682 includes both textual inputs as records 616 and 617 respectively.
  • Exemplary interrelational data are also included as data within the text records of the sub-database. Such data include a patient's history 654, a user of the system 656, the specific findings regarding the patient 658, as well as a general, historical weighting based on the number of times the two terms have been used 652.
  • table 682 is loaded and consulted to achieve the best possible textual input for dictated speech. If, for example, one of the phonetically similar words dysphagia/dysphasia is dictated into the system of the present invention, then the context identification module would evaluate that voice input in view of any one or combination of contextual data. In one case, if the patient's past medical history included digestive complaints then the more probable textual match, dysphagia, may be selected. Similarly, if the patient's past medical history included neurological complaints, the term dysphasia may be selected. Similarly, the context identification module may rely upon other relational data associated with the two text strings to determine the highest probability input. If Dr.
  • Brown is a pediatrician and Dr. Smith is a geriatric physician
  • appropriate weight may also be given by the selection system to these previous inputs in determining the proper text input for the interpretations field.
  • the input to the findings field 330 may be considered, in which "difficulty swallowing" would result in a more likely match with dysphagia and "speech impairment" would result in a more likely indication of dysphasia.
  • other simple weighting factors such as the number of times each term has been used previously may also be used by the system of the present invention to select a more probable input text string.
  • system of the present invention may use one, many, or all of the aforementioned contextual relationships to determine and select the proper text input, possibly after assigning additional weighting function to the interrelational data itself, i.e. weighting a user's context higher than the age context.
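The dysphagia/dysphasia selection described above can be sketched in Python as a weighted sum over matching contextual data. The numeric weights, the use counts and the dictionary layout are hypothetical values chosen for illustration; the specification leaves the weighting function open.

```python
# Hypothetical contextual data for two phonetically similar text strings.
CANDIDATES = {
    "dysphagia": {"history": "digestive", "finding": "difficulty swallowing", "use_count": 12},
    "dysphasia": {"history": "neurological", "finding": "speech impairment", "use_count": 9},
}

# Assumed weights: the findings context counts for more than the history context.
CONTEXT_WEIGHTS = {"history": 3.0, "finding": 5.0}
USE_COUNT_WEIGHT = 0.1


def score(candidate, context):
    """Sum the weights of every contextual item that matches the candidate."""
    data = CANDIDATES[candidate]
    total = USE_COUNT_WEIGHT * data["use_count"]
    for key, weight in CONTEXT_WEIGHTS.items():
        if context.get(key) == data[key]:
            total += weight
    return total


def select(context):
    """Return the candidate text string with the highest contextual score."""
    return max(CANDIDATES, key=lambda c: score(c, context))


print(select({"history": "neurological"}))
print(select({"finding": "difficulty swallowing"}))
```

With no contextual data at all, the sketch falls back to the historical use count alone, which mirrors the simple weighting factor mentioned above.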
  • a user of the speech recognition system of the present invention inputs speech 205 to microphone 210 for processing by speech recognition system 212.
  • speech recognition system package 212 typically provides a single, general or base vocabulary database 220 that acts as a first and only database. Because of the size of the database and the general nature of the language and the text strings contained within it, voice-to-text transcription accuracies may vary when the speech recognition system is used only with such large, non-specific vocabularies. In medical contexts, for example, inaccuracies in transcription of dictation may result in undesirable or even disastrous consequences. Thus, the inaccuracies generally tolerated by system users must be improved.
  • the specified database 250 is used by the speech recognition system of the present invention as a first-pass database in selecting an appropriate textual match to the input speech 205.
  • the context identification module 240 is responsible for selecting and loading (or creating) a particular sub- database from specified database 250 during a user's dictation so as to provide a high probability of a "hit" within that sub-database.
  • the selection process employed by context identification module is based on a context of the input speech or a context within the dictation environment.
  • Possible contexts include, but are not limited to, a particular user of the speech recognition system, a particular field within an electronic form being processed by the speech recognition system, or the interrelation of previously input text with a sub-database of text that is likely to be dictated based thereon.
  • Specified database 250 may be created in any of a number of manners. In one particularly preferred embodiment, past forms may be scanned and digitally input into a computer system such that all the text strings used within those computer forms are digitized, parsed and then stored within the database. The text strings may then be subdivided into specific databases that are applicable to specific speech recognition circumstances. For example, with respect to the example of addresses sub-database shown in Fig.
  • a series of previously recorded paper or electronic medical forms may be parsed, separated and stored such that all the street addresses used on those forms are stored in a separate portion 271 of database 250.
  • findings within field 330 and interpretations within field 350 of the electronic form in Fig. 3 may be subdivided from general text string database 250 to create a specific contextual database of diagnoses for use with a particular medical form.
  • specified database 250 may be organized in any one of a number of different ways to suit the particular needs of a particular speech recognition application, such as textual input into an electronic form. Such organization may take place statically, i.e. before the user employs the voice transcription system, or dynamically, i.e. during the use of the voice transcription system.
  • a general process flow is provided for the operation of speech recognition system 200.
  • the process starts with step 705 in which the speech recognition system is loaded and has begun to operate.
  • Specified vocabulary databases may be defined and loaded here for a particular, more global use during the remainder of this process.
  • a user of the system is identified at step 707.
  • the user may be a particular doctor who wishes to provide speech input to a medical form as part of his practice within a practice group or a medical office. As described above, this user ID may later be used to select appropriate sub-databases and associated text strings from specified database 250.
  • User identification may be done through speech recognition, keyboard entry, fingerprinting or by any means presently known or hereafter developed.
  • voice input from the user is provided to the speech recognition system in step 710. This vocal input is digitized for use within computer system 105 and then input into the speech recognition system employed on that computer system, as shown in step 720.
  • the context identification module selects or creates an appropriate sub-database consisting of a subset of the text strings within database 250 as the system's operative first-pass database at step 730.
  • the selection of an appropriate sub-database may occur according to any one or more of a number of different criteria.
  • the criterion on which the sub-database is selected is based upon the user of the voice transcription system as provided in step 707. Specifically, any particular user may have a historical use of certain words and phrases which may serve as a higher probability first-pass source of text string data for future use by that particular user. Thus, the appropriate selection of that database will result in higher transcription accuracy and use within the speech recognition system.
  • the sub-database is selected from the specified database 250 at step 730 according to the field within the electronic form into which text is being input. For example, referring to Fig. 3, when a user wishes to populate address field 318 with a particular address, the user would indicate to the system at step 730 (e.g. through a computer graphical user interface or a vocal command input) that the address field is to be populated.
  • the speech recognition software of the present invention selects or creates an appropriate sub-database from specified database 250 that contains at least the addresses for use within that form field.
  • the actual data selected and pulled by the context identification module would typically include related contextual information that would provide insight into the historical use of particular addresses so as to provide a higher probability in transcription accuracy.
  • the speech input provided by the user to the speech recognition system at step 720 is evaluated by that system with respect to the text strings within the sub-database selected in step 730. This evaluation may be performed according to the same algorithms and processes used within the speech recognition system 212 which are used to select matching text from its own base vocabulary in database 220.
  • Various methods and mechanisms by which the input speech is parsed and converted to a language output and/or text string output are well-known in the art, and these text matching mechanisms and evaluation criteria are independent of the other aspects of the present invention.
  • evaluation criteria may be used on the overall database 250 or the sub-database selected in step 730.
  • Such evaluation methods are well-known, although particular evaluation criteria that are applicable to speech recognition principles may also be employed when populating a field within an electronic form.
  • the specific text strings of a particular sub-database such as that shown in Fig. 5 may include a weighting function as shown in field 552 of sub-database 571.
  • the weighting field may include the number of times a particular address has been input into a form within a specific historical period. Even with this over-simplified weighting scheme, ambiguities as between two very similar addresses may be easily resolved in determining a proper textual match corresponding to a speech input.
  • weighting schemes using both objective indicia (e.g. data use count) and subjective indicia (e.g. weights related to the data itself and its interrelation with other data) are well known in the art and may also be included within specific database 571 for use in the context identification module. Further, other evaluation criteria may be used to select an input text string from the sub-database. For example, a most-recently-used algorithm may be used to select data that may be more pertinent with respect to a particular transcription. Other weighting and evaluation criteria are well-known and those of skill in the art will appreciate different ways to organize and prioritize the data so as to achieve optimal transcription accuracy.
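As a small illustration of combining a use count with a most-recently-used criterion, the Python sketch below ranks address records by use count and breaks ties by the date of last use. The record layout, the addresses and the dates are invented for the example.

```python
from datetime import date

# Hypothetical address records with a use count and a last-used date.
records = [
    {"text": "14 Maple Street", "use_count": 5, "last_used": date(2006, 3, 1)},
    {"text": "40 Maple Street", "use_count": 5, "last_used": date(2006, 9, 15)},
    {"text": "14 Marple Street", "use_count": 1, "last_used": date(2006, 10, 1)},
]


def prioritize(records):
    """Order records by use count, then by most recent use, highest first."""
    return sorted(
        records,
        key=lambda r: (r["use_count"], r["last_used"]),
        reverse=True,
    )


for record in prioritize(records):
    print(record["text"], record["use_count"], record["last_used"])
```

Other orderings are equally possible, for example ranking by recency first; the choice here is only one assumed scheme.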
  • a prioritization module 245 may be included as part of the speech recognition system 200 of the present invention to implement and manage the above-mentioned weighting and prioritization functions.
  • the speech recognition system within the present invention would default to base vocabulary database 220 at step 770, at which point, the speech recognition software would transcribe the user's voice input in its usual fashion to select a text string output (step 750) according to its own best recognition principles and output the same to the electronic form (step 760).
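The first-pass lookup with a fallback to the base vocabulary can be sketched as follows. This is a simplified Python illustration, not the matching method of any speech engine: it assumes the input is already a rough text hypothesis and uses `difflib.SequenceMatcher` from the standard library as a stand-in for acoustic matching. The threshold value and the sample vocabularies are assumptions.

```python
from difflib import SequenceMatcher


def best_match(utterance, vocabulary):
    """Return (text, similarity) for the closest vocabulary entry, or (None, 0.0)."""
    best_text, best_score = None, 0.0
    for text in vocabulary:
        similarity = SequenceMatcher(None, utterance.lower(), text.lower()).ratio()
        if similarity > best_score:
            best_text, best_score = text, similarity
    return best_text, best_score


def transcribe(utterance, first_pass, base_vocabulary, threshold=0.85):
    """Try the context-specific sub-database first, then the base vocabulary."""
    text, similarity = best_match(utterance, first_pass)
    if text is not None and similarity >= threshold:
        return text, "first-pass"
    # No sufficiently close match: default to the base vocabulary.
    text, _ = best_match(utterance, base_vocabulary)
    return text, "base"


address_sub_db = ["12 Oak Street", "98 Elm Avenue"]
base_vocab = ["good morning", "good evening"]

print(transcribe("12 oak stret", address_sub_db, base_vocab))
print(transcribe("good morning", address_sub_db, base_vocab))
```

The second element of the returned tuple records which database supplied the text, which is the kind of information a prioritization module could later use to measure first-pass performance.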
  • Fig. 7 may be repetitively performed in a number of different ways. For example, as one particular electronic form is filled out, sequential fields within that form need to be designated and then populated with an appropriate text string. As such, following the insertion of a particular text string within a particular form field, the process of Fig. 7 may return to step 720 where the user inputs additional speech input after selecting the new field into which the vocal input is to be transcribed. During this second iteration, a second, appropriate sub-database of text strings from specified database 250 would be selected as an appropriate first-pass database for the second field. The process of evaluating and matching the user's vocal input with text strings within the second sub-database, i.e., steps 740 through 770, would operate as mentioned above.
  • a second user may employ the speech recognition system of the present invention in response to which different sub-databases of text strings would be loaded that pertain to the specific use of that second user at step 730.
  • a second user would be identified at step 707, after which the speech input provided by that second user would be digitized and processed by the speech recognition system at step 720.
  • the selection and/or creation step 730 may or may not be performed (again) and may be omitted if the only sub-database selection step is conditioned upon a user.
  • the remainder of the process provided in Fig. 7 may then be performed to select an appropriate text string as input into the fields of the electronic form for that second user.
  • Example #1 A new radiologist joins a group of radiologists who have been using voice recognition technology to dictate reports for about two years. Their practice has a four year old database of digitally recorded imaging studies, linked to a database of the past two years of computer-transcribed reports as well as several years of prior reports manually transcribed to computer by transcriptionists listening to voice recordings. The new radiologist has "trained" the voice engine to recognize his voice as a new user by engaging in a set of radiology voice training exercises that are customized to include phrases commonly used by other members of his group.
  • the new radiologist's first assignment using the system of the present invention is to dictate a report on a sinus CT scan
  • the radiologist would identify this report as being for a sinus CT scan and click on the "findings" field at which time the program will load a specified vocabulary for first pass pre-screening composed of text strings that other members of the group have previously used in their dictations as input to the "findings" field for sinus CT scans.
  • the prioritization algorithm administered by the prioritization module for his specific user sub-database files may assign relatively higher prioritization scores to his own dictated text strings vis-a-vis the dictated text of his colleagues. Over time it will adapt to his personal style, further improving transcription accuracy.
  • the new radiologist is assigned to read studies of the digestive system, and his first two cases are barium swallow studies of the upper gastrointestinal tract. The first case is for the evaluation of a two-month old infant suffering from vomiting, and the second case is a follow-up study for an 87 year-old man with esophageal strictures.
  • the transcription accuracy of the new radiologist's reports may be maximized by applying more complex prioritization and selection algorithms to the selection of previously-used phrases to be loaded for first pass pre-screening.
  • the weighting of previously used text strings and the selection of those data items as first-pass text string values for these reports could result in the assignment of multipliers to those data items. These weights could be updated not only each time the first-pass text strings were previously used but also based on the type of study, the age of the patient and the diagnoses or symptoms listed as reasons for the physician's request in ordering the study.
  • weighting factors for text string prioritization and selection could, for example, be based on prior frequency of use in reports of all barium swallow studies in children aged less than 6 months or less than one year.
  • prioritization could, for example, be based on the frequency of use of those text strings in reporting barium swallow studies in patients in any one or more of the following classes: patients more than age 60/70/80; use of those text strings in reporting barium swallow studies in males in these age ranges; prior use of those text strings in reporting barium swallow studies in patients with a prior diagnosis of esophageal stricture; prior use of those text strings in reporting barium swallow studies of patients with a prior diagnosis of esophageal stricture by age and/or sex; and/or the presence or absence of other symptoms (such as swallowing pain or vomiting).
  • weighting factors related to the presence or absence of a symptom including associated diagnoses (such as status post radiation therapy for a specific type of lung cancer) may be listed in the ordering physician's request for the procedure or may already be present in the database of prior diagnoses for that patient.
  • Example #2 A physician dictates into either a computerized medical record database or a structured consultation report form as he examines a patient in an office setting.
  • the medical report will usually begin with a listing of the problem(s) for which the patient is being seen.
  • serve as effective weighting factors so as to allow the prioritization of previously-used text strings and the loading of the most probable first-pass text strings for each report.
  • Previous diagnoses if noted in an initial consultation or if already present in the database from previous diagnosis of the same patient, may also be useful as text string weighting factors for sub-database prioritization and selection.
  • a computerized medical record has functionally separate data fields.
  • other types of medical reports have structured sections.
  • Speech recognition transcription accuracy for each such application can be enhanced through the prioritization and selection of first pass, text string databases for each such field on the basis of numerous factors including, but not limited to: the age and sex of the patient; problems listed as reason for that patient's visit or to be determined during that patient's visit; previously recorded diagnoses for that patient; previous use of text strings to be prioritized by that physician in reports for that patient; previous use of those text strings with that combination of other selection factors by that physician for other patients; and/or previous use with that combination of other factors by other members of that specialty.
  • As in Example #1, as each office that uses the present invention accumulates data, it becomes possible to retrospectively analyze prioritization algorithm performance and compare the first-pass hit efficiency of different weighting assignments for different factors in the prioritization algorithm. This allows the initial data record selection scheme to be optimized and permits a quantitative analysis of the relative efficiency of various prioritization models and weightings for the various offices.
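A retrospective comparison of first-pass hit efficiency could be computed along the following lines. This Python sketch assumes each dictation is logged as a boolean indicating whether the accepted text came from the first-pass vocabulary; the scheme names and log values are invented for illustration.

```python
def first_pass_hit_rate(log):
    """Fraction of dictations whose accepted text came from the first-pass vocabulary."""
    log = list(log)
    return sum(log) / len(log) if log else 0.0


def rank_schemes(logs_by_scheme):
    """Return (hit_rate, scheme_name) pairs, best scheme first."""
    return sorted(
        ((first_pass_hit_rate(log), name) for name, log in logs_by_scheme.items()),
        reverse=True,
    )


# Hypothetical logs for two weighting assignments (True = first-pass hit).
logs = {
    "user weighted above age": [True, True, False, True],
    "age weighted above user": [True, False, False, True],
}

for rate, name in rank_schemes(logs):
    print(f"{name}: {rate:.2f}")
```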
  • SAPI and, in its day, SRAPI provide for computer-based responses to three types of speech input: application defined commands, user-defined commands (both referred to hereinafter as "commands") and general dictation of vocabulary.
  • a signal representing an incoming item of speech is first screened by the program to see if it represents a command, such as "New paragraph," and, if so, the program executes it as such.
  • this command may cause the insertion of a paragraph break, a new-line feed and an indent so as to permit the continued dictation in a new paragraph.
  • Incoming speech items that are not recognized as commands are transcribed as general vocabulary text, in which the speech recognition software looks for the best possible match for the dictated text within combinations of single word text strings loaded into the general vocabulary database of the application.
  • Current versions of the SAPI protocol and current voice engines only accommodate the loading of one vocabulary at a time. However, they accept rapid loading and unloading of smaller sets of user-defined commands. These smaller sets may be as large as the relatively small, first-pass vocabularies needed to optimize speech recognition accuracy for dictation into a computer field.
  • the present invention encompasses methods to identify, prioritize and select the high probability text strings which would optimize transcription accuracy if used as a first pass pre-screening vocabulary. These text strings may then be translated into user-defined commands which are loaded and screened for matches as a first pass "virtual vocabulary." In this manner, the existing speech recognition systems have been tricked into implementing a two-pass vocabulary screening model as described above under present SAPI protocols and with presently available voice engines. Incorporation of the methods and apparatus of the present invention would be made more user-friendly by incorporating the entirety of this invention into future versions of SAPI and into applications compliant with such future versions of SAPI.
  • a general process flow for the operation of the speech recognition system 200 is provided as it would be implemented within a specific SAPI speech recognition engine.
  • the steps are substantially similar to those provided in Fig. 7 with the following modifications.
  • the process of Fig. 8 sequentially evaluates the speech input first, against the database of system commands 840, and then, if necessary, against the database of user-defined commands 841, and then, if necessary, against the database of a first vocabulary 842, and then, if necessary, against the database of a second vocabulary 843, and finally, if necessary, against a final database 844.
  • step 854-855 the "command" is executed (steps 854-855) or a learning function is exercised (steps 856-858), and the executed command or selected text from a database results in the generation and insertion of the selected text string into a computer form field (step 860).
  • first pass vocabulary 842 may be provided which includes previous dictations by the same user when all the other variables were identical.
  • the second pass vocabulary 843 may be provided which includes dictations by other members of the radiology group when all other variables were the same as those of the present report.
  • the third pass vocabulary 844 may be provided which includes other dictations by the present radiologist into the same field for the same type of study but for patients with all combinations of age, sex, past medical history and reason for study.
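The sequential evaluation of Fig. 8 amounts to an ordered cascade in which the first tier that yields a match wins. The Python sketch below illustrates this with exact, case-insensitive dictionary lookups; a real engine would match acoustically, and every entry in `TIERS` is a hypothetical example rather than content from the specification.

```python
def cascade(utterance, tiers):
    """Check each tier in order and return (tier_name, output) for the first match."""
    key = utterance.strip().lower()
    for name, entries in tiers:
        if key in entries:
            return name, entries[key]
    return None, None


# Ordered tiers mirroring databases 840-844; all entries are invented.
TIERS = [
    ("system commands 840", {"new paragraph": "\n\n\t"}),
    ("user-defined commands 841", {"insert signature": "Dictated by the reporting radiologist."}),
    ("first vocabulary 842", {"mucosal thickening": "mucosal thickening"}),
    ("second vocabulary 843", {"air-fluid level": "air-fluid level"}),
    ("final database 844", {"sinus": "sinus"}),
]

print(cascade("New paragraph", TIERS))
print(cascade("air-fluid level", TIERS))
print(cascade("unlisted phrase", TIERS))
```

Because user-defined command sets can be loaded and unloaded rapidly, the vocabulary tiers in such a cascade could be swapped each time the active form field changes.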
  • known transcription methods include vector learning in which the speech engine alters the way it maps incoming utterances into the vector space. If, for example, a native Bostonian speaker of English used the speech recognition system, a "translation" of sorts is needed for the speech engine to process the user's incoming speech so that the appropriate vocabulary is matched with the speaker's utterances. This is typically handled by speech engines through a training process in which the user reads for several minutes from text for which the vector mapping sequence is already known to the system, and the program develops an array of user-specific vector corrections to optimize the match between the vector sequence of the user's pattern of utterances and the vector sequence of the text he's reading.
  • the speech engine may adjust for a speaker's accent and/or other speaker-specific or speaker-associated variations from “typical” patterns of pronunciation.
  • although vector learning will affect the overall recognition accuracy of a speech recognition application, it is perfectly compatible with the teachings of the present invention, which may be used in conjunction with it to select the user's intended word from a group of vocabulary elements returned by the speech engine.
  • known transcription methods include scalar learning which involves the weighting of different vocabulary items based on prior use.
  • Scalar learning makes one vocabulary element in the vector space brighter or dimmer than an average based on the frequency of past use, say for example, by a particular user of the system.
  • the probability of matching a user's input in vector space becomes a function of both the speech engine's provided probability of matching a user's input based on the actual input received and the frequency of use of other, less frequently used vocabulary elements.
  • developers of a speech engine may begin with large vocabularies that may have default weightings of certain vocabulary elements based on frequency of use in general speech which are then further modified by the frequency of use by each (or a particular) individual user.
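One plausible way to combine an engine's match probability with frequency of past use is a smoothed multiplicative weighting, sketched below in Python. The formula, the smoothing constant and the sample numbers are assumptions for illustration only; the specification does not prescribe a particular scalar learning formula.

```python
def rescore(engine_probs, use_counts, smoothing=1.0):
    """Weight each engine probability by smoothed past use, then renormalize."""
    total_use = sum(use_counts.get(word, 0) + smoothing for word in engine_probs)
    scores = {
        word: prob * (use_counts.get(word, 0) + smoothing) / total_use
        for word, prob in engine_probs.items()
    }
    norm = sum(scores.values())
    return {word: value / norm for word, value in scores.items()}


# Hypothetical values: the engine slightly prefers "dysphasia" acoustically,
# but this user has dictated "dysphagia" far more often.
engine_probs = {"dysphagia": 0.48, "dysphasia": 0.52}
use_counts = {"dysphagia": 30, "dysphasia": 5}

rescored = rescore(engine_probs, use_counts)
print(max(rescored, key=rescored.get))
```

The smoothing term keeps a never-used vocabulary element from being ruled out entirely, so a clear acoustic match can still override a sparse usage history.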
  • the goal of speech recognition software is to minimize the frequency of transcription errors, errors will occur, particularly as the system "learns" the dictation habits of new users, new form fields and vocabulary contexts any of which include new vocabulary elements.
  • dictation errors in which the user utters the wrong word
  • transcription errors, in which the speech recognition system misidentifies what was said, are not likely to be recognized and corrected until some time after the dictation is completed, often on the order of hours and up to days later.
  • scalar learning is based on the frequency with which various text elements have been used in the past (i.e., accepted as accurate transcriptions), accurate scalar learning requires that feedback be included.
  • the specific vocabularies of the present invention consist of searchable text entered in a date and time-stamped manner into forms and fields identifiable by user and context.
  • Off-line error correction is performed by amending text directly in the fields into which it was dictated in the database so that new scans and word use counts of those fields will incorporate these corrections to generate situation-specific vocabularies for future use. Further, text amendments may be made directly to the database when it is open or unlocked for correction.
  • the scalar learning will be based on text that incorporates correction of both dictation and transcription errors, as opposed to attempting to categorize and record the nature of each correction as either a dictation correction or a transcription correction.
  • the input data from the speech engine is locked when it is signed. In this case, it may not be practical to include corrections made to records after they have been locked in the subsequent vocabulary and the vocabulary element weighting scheme may depend on how data is stored.
  • the original record remains in the database and amendments to the speech input are made after that data is locked and stored elsewhere in the system, it may be advantageous to base scalar learning (i.e., weighting) on a scan of the database, although in this case it will not include correction of either dictation or transcription errors that were found and corrected after the record was signed and locked.
  • the user will generally be known to the system. If database entries are appropriately indexed or tagged using standard methods, when a registered user loads a specific form and enters a specific field of this form, the system can quickly compile the programmed hierarchy of vocabularies to be matched with incoming speech in that field of that form, and, for applications which also sort data by previously entered indicators of context.
  • the incoming speech from a user is compared with the hierarchical sequence of vocabularies moving from the most specific to the most general.
  • as for the sequential vocabularies themselves, they may be ordered according to any one of a number of criteria to match a particular need.
  • the vocabularies may be created and provided in an order of frequency of use, say by user, each vocabulary containing frequently used vocabulary elements of a particular user.
  • vocabularies which are used to provide an input to a particular form field may be created and ordered in terms of the relative frequency of input of various terms historically input to that form field.
  • the vocabularies may be created and ordered in terms of the relative frequency of use of terms according to a particular context of speech use (e.g. medicine or law). Combinations and permutations of these factors may also be used to create and order vocabularies from appropriate vocabulary elements so as to achieve appropriate speech engine matching results in any particular circumstance.
  • the first vocabulary contains terms from the same user, into the same field for the same context. Subsequently screened less-specific vocabularies may contain vocabularies from the same user for the same context in any field.
  • the vocabularies of the present invention may be organized according to the following (doctor specific example):
  • the enhanced invention saves the best matches based on a match probability score returned by the speech engine from each sequentially searched vocabulary.
  • This set of best matches from the first (and generally most situation-specific) vocabulary of the hierarchy is then combined with a set of best matches from the next sequential vocabulary to be tested in the hierarchy, with the assignment of increased match likelihood weight to those from the first vocabulary to reflect the greater situation specificity of that vocabulary.
  • This process may be repeated so that all vocabularies in the hierarchy are searched and each saved element is assigned an increased selection weight each time it survives the transition from the set of best matches at the end of one cycle of the process to the set of best matches at the end of the next cycle.
  • each subset of combined set of vocabulary elements may be reduced in size according to the weighted matched probability scores (say to keep it to "n" entries) as the sequence of vocabularies are processed so that the combined set of vocabulary elements does not grow beyond a manageable size during the process.
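The cascade described above (best matches from each vocabulary pooled, re-weighted each cycle, and pruned to "n" entries) can be sketched as follows. This is a minimal illustration assuming a simple multiplicative priority factor per cycle; the function name `cascade_match`, the use of `max` to merge an element found in two vocabularies, and the sample scores are assumptions rather than details given in the text.

```python
import heapq

def cascade_match(vocab_scores, priority_factors, n=3):
    """Pool best matches across a hierarchy of vocabularies.

    vocab_scores:     list of dicts (most specific vocabulary first), each
                      mapping vocabulary element -> raw match probability score
    priority_factors: one multiplier (> 1) applied at the end of each cycle
    n:                maximum number of candidates kept between cycles
    """
    pool = {}
    for scores, vpf in zip(vocab_scores, priority_factors):
        # Keep the n best matches from the vocabulary searched in this cycle.
        best = heapq.nlargest(n, scores.items(), key=lambda kv: kv[1])
        for element, score in best:
            # Assumption: if an element is already pooled, keep its higher score.
            pool[element] = max(pool.get(element, 0.0), score)
        # Every element that survives into this cycle's pool is boosted again,
        # so elements from earlier (more specific) vocabularies accumulate weight.
        pool = {element: score * vpf for element, score in pool.items()}
        # Prune so the combined set never grows beyond n entries.
        pool = dict(heapq.nlargest(n, pool.items(), key=lambda kv: kv[1]))
    return pool

result = cascade_match(
    [{"allergy": 0.5, "allergen": 0.4}, {"energy": 0.55, "allergy": 0.3}],
    priority_factors=[1.2, 1.1],
    n=2,
)
final_match = max(result, key=result.get)
```

Here "allergy" wins even though "energy" has the higher raw score in the second vocabulary, because it was boosted in both cycles.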
  • incoming analog speech is received (901) digitized and framed (902), and acoustic information for vector mapping is extracted (904).
  • the acoustic information is sequentially matched with system commands (907) and user-defined commands (910) to determine an input match.
  • the user has the option to determine whether an utterance will be exclusively treated as a command (for example, in some systems by pressing the keyboard <ctrl> key during dictation), in which case the speech processing system performs only matches (907) and (910), or as speech (for example, in the same systems by pressing the keyboard <shift> key during dictation), in which case the speech processing system bypasses matches (907) and (910) and begins with speech vocabulary match step (919). If an incoming utterance is not matched with a system or user-defined command (steps 907 and 910), or if the user bypasses these steps by marking the utterance as text, the enhanced invention includes the following sequence of events.
  • the sequential vocabularies to be used may be selected and dynamically updated (913) according to any one or more of the criteria mentioned above, e.g. user, form field, or context.
  • the first vocabulary is selected and loaded (916) with each entry weighted according to an algorithm based on factors which may include but are not limited to user, prior use in that combination of form, field & context, and time elapsed since each prior use.
  • the speech engine then matches the contents of the first vocabulary with the incoming utterance according to the language rules of the particular search engine and match probability scoring procedure used by that speech engine (919).
  • N potential matches are then identified using highest match probability scores and saved, along with their scores, in a designated array of data registers within the speech processing engine (922). Each saved match probability score is then weighted by a "vocabulary priority factor" "vpf-1.”
  • a simple multiplication function may be used by the speech engine, particularly if the match probability scores are numbered on a linear scale. In any case, a weighting function is used to increase the match probability score for the "n” initially selected vocabulary elements from the first vocabulary (925). These "n" best match candidates with weighted match probability scores are then stored (928).
  • the second vocabulary is then loaded, possibly having each vocabulary entry weighted according to an algorithm based on the above- mentioned factors (931).
  • the speech engine matches the contents of the second vocabulary with the incoming utterance according to the language rules of the particular search engine and match probability scoring procedure used by that speech engine (934). "N" potential matches are then identified using highest match probability scores and saved, along with their scores, in a designated array of data registers within the speech processing engine (937). These "n" potential matches are then combined with the "n" weighted matches from the first evaluation (940) so as to create a combined set of vocabulary elements.
  • Each of the saved match probability scores of the vocabulary elements within the combined set of vocabulary elements is then weighted by a second "vocabulary priority factor" "vpf-2" (946).
  • a simple multiplication function may be used by the speech engine, particularly if the match probability scores are numbered on a linear scale.
  • a weighting function is used to increase the match probability score for each element of the new combined vocabulary that will be pooled with the best selections from another, still less situation-specific vocabulary in the next cycle (946).
  • the set of vocabulary elements for matching and storage may be maintained at a particular or constant size (e.g. "n") so as not to grow the set of combined vocabulary elements to a computationally unwieldy number of elements.
  • the method of the enhanced speech recognition system continues, iteratively, until the v-th vocabulary is loaded, possibly having each vocabulary entry weighted according to an algorithm based on the above-mentioned factors (971).
  • the speech engine matches the contents of the v-th vocabulary with the incoming utterance according to the language rules of the particular search engine and match probability scoring procedure used by that speech engine (972).
  • "N" potential matches are then identified using highest match probability scores and saved, along with their scores, in a designated array of data registers within the speech processing engine (973).
  • These "n” potential matches are then combined (970 dashed line to 974) with the "n” weighted matches from the previous evaluation (974) so as to create a combined set of vocabulary elements.
  • Each of the saved match probability scores of the vocabulary elements within the combined set of vocabulary elements is then weighted by a v-th "vocabulary priority factor" "vpf-v” (976).
  • a simple multiplication function may be used by the speech engine, particularly if the match probability scores are numbered on a linear scale.
  • a weighting function is used to increase the match probability score for the "n" initially selected vocabulary elements from the v-th vocabulary (976).
  • These best match candidates with weighted match probability scores are then stored (977).
  • a final, large (e.g. all encompassing global) vocabulary is loaded, possibly having each vocabulary entry weighted according to an algorithm based on the above- mentioned factors (980).
  • the speech engine then matches the contents of the large vocabulary with the incoming utterance according to the language rules of the particular search engine and match probability scoring procedure used by that speech engine (983). "N" potential matches are then identified using highest match probability scores and saved, along with their scores, in a designated array of data registers within the speech processing engine (986). These "n” potential matches are then combined (977) with the "n” weighted matches from the previous evaluation (989) so as to create a combined set of vocabulary elements. If desirable, each of the saved match probability scores of the vocabulary elements within the combined set of vocabulary elements is then weighted by a final large "vocabulary priority factor" "vpf-l” (not shown). These best match candidates with weighted match probability scores are then stored (991).
  • the speech engine selects, as a best match for the user input to the speech recognition system, a vocabulary element from the combined set of vocabulary elements based on said repeatedly weighted match probability scores so as to result in a final match for the user input (994).
  • the final match is then stored for further retrieval by the speech engine.
  • the other next closest sequential matches may also be stored for presentation and selection by the user as potential matches to the user's input.
  • a flagging system may be included so that the user may designate the particular input and initially selected match as an error for future dictation correction.
  • the respective match probability scores for the three selected vocabulary elements are 1, 3 and 5, and for the second pass vocabulary the respective match probability scores for the three selected vocabulary elements are shown as 2, 3 and 5.
  • by calculation, the sum of all the weighted probabilities is 263; the fractional representation of the numerical probability of each of the six vocabulary elements, the sum of which is one, is provided in column 4.
  • the first vocabulary prioritization factor is 1.1; the preliminary values for the weighted first three vocabulary elements are shown as the first three values in column 6 of Fig. 11.
  • the problem at hand is to calculate the same respective weighted values for the three vocabulary elements of the second vocabulary such that the total sum of the probability of the search engine choosing one of the values is still equal to one.
  • looking at the total of the weighted first three vocabulary elements from the first vocabulary this is subtracted from the original total of 263 to give a distributed sum of 125.5 to be allocated in appropriate proportion over the three vocabulary elements of vocabulary 2.
  • the three approximate target values of 21.82, 35.47 and 68.20 are shown to result in column 6 for those elements.
  • the quadratic equation is used to "undo" the new target values so as to calculate and derive the modified first and the modified second match probability scores for the vocabulary elements of the first and second vocabularies.
  • the positive roots of the quadratic equation given the values of a, b and c provided in columns 7, 8 and 9 respectively, are shown in column 12 as the final, exponentially weighted match probability scores for the six vocabulary elements in the table of Fig. 11. Further, the non-linear, non-constant effective multiplicative value for each of the individual vocabulary elements is shown in column 14.
  • the non-linear weighting has required that the vocabulary elements of the two respective vocabulary match sets be combined prior to the derivation of the effective multiplicative factor that would provide the same weighted value as shown, by way of example, in steps 1038 and 1040 of Fig. 10 for the second vocabulary match.
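The arithmetic of this example can be reproduced with a short sketch. It rests on one loudly flagged assumption: the table of Fig. 11 is not available here, and the mapping from a match probability score s to its probability value, s² + 10s, is inferred only because it reproduces the quoted totals (263, 125.5, and the targets of roughly 21.82, 35.47 and 68.20). The helper names `value` and `undo`, and the reading of the "effective multiplicative value" as the ratio of modified to original score, are likewise hypothetical.

```python
import math

def value(score):
    # ASSUMED mapping, inferred from the quoted totals: s^2 + 10*s.
    return score ** 2 + 10 * score

first_scores, second_scores = [1, 3, 5], [2, 3, 5]
vpf1 = 1.1  # first vocabulary prioritization factor, from the text

first_vals = [value(s) for s in first_scores]     # 11, 39, 75
second_vals = [value(s) for s in second_scores]   # 24, 39, 75
total = sum(first_vals) + sum(second_vals)        # 263

# Boost the first vocabulary, then spread what is left over the second
# vocabulary in proportion, so the grand total (and hence the sum of
# probabilities) is unchanged.
weighted_first = [vpf1 * v for v in first_vals]
remaining = total - sum(weighted_first)           # about 125.5
targets_second = [v * remaining / sum(second_vals) for v in second_vals]

def undo(target, a=1.0, b=10.0):
    # Positive root of a*s^2 + b*s - target = 0, which recovers the
    # modified match probability score that yields the target value.
    return (-b + math.sqrt(b * b + 4 * a * target)) / (2 * a)

modified_scores = [undo(t) for t in weighted_first + targets_second]
# Assumed reading of the "effective multiplicative value" per element.
effective_factors = [
    m / s for m, s in zip(modified_scores, first_scores + second_scores)
]
```

Under these assumptions the effective factors are above one for the first vocabulary and below one for the second, and they differ from element to element, which is consistent with the non-constant weighting described above.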
  • a virtual vocabulary that is specific to a particular context or records database may be created on the outside of the search engine.
  • This virtual vocabulary may be tightly coupled to the particular context of the records database such that the virtual vocabulary acts as a surrogate for providing minimally intrusive modifications to the search results returned by the speech engine. Such modifications would be provided in an effort to increase the efficiency of the speech engine's returned search results as specifically tailored to accommodate a particular context or a particular records database.
  • a records database is used as the target for the speech engine's search results and the interface to the records database is a form field input screen in which a number of fields are provided for text input.
  • the user of the records database may provide such input by either typing in the desired textual data or by dictating such data into a speech recognition system that attempts to transcribe the dictated input into the most appropriate text for input into the form field.
  • a medical records database and associated input system is an exemplary use of such a database record system.
  • the vocabulary elements of the records database may contain specific medical terminology that may not be a part of the speech engine's default vocabulary.
  • adjunct criteria may include criteria that the speech recognition system does not use as part of its default selection criteria.
  • criteria may include identification of the electronic form field in which the vocabulary element has previously been used, the subject or patient with reference to which the dictation is being performed, and/or the doctor providing the dictation input.
  • the virtual data record includes a vocabulary element 1114 which is compatible with and specific to a particular speech engine.
  • the vocabulary elements used by speech engines typically consist of a speech phoneme or string of phonemes based on a particular file format understandable by and used by the speech engine.
  • One well known example of a format in which a collection of such vocabulary elements is used to create a grammar file is the Backus-Naur Form (BNF), which describes the structure of the language spoken by the user.
  • Such grammar file representations are well known in the art and are not discussed in detail herein.
  • other grammar file formats and structures may be used and the present invention may be applied to and used with any such vocabulary element.
  • the vocabulary element of Figure 11 is discussed below as if it consists of a text word upon which an exact linguistic textual match is being performed, although it should be recognized that this is most likely not the actual electronic representation and matching exercise that typical commercial speech engines employ to perform such matching.
  • the preferred vocabulary element embodiment is a computer text element that is derived from actual input from a records database.
  • a plurality of historical use tags, 1116, 1118, 1120, 1122, and 1138 may be provided according to different criteria associated with vocabulary element 1114.
  • Use tags may consist of computerized bits and bytes that identify a previous form field in which the vocabulary element has been used 1116, a previous user of the speech engine 1118, a previous context in which the vocabulary element has been used 1120, or a previous patient with which the vocabulary element has been used.
  • each tag preferably includes information representing a weighting for each of the numerous elements of the specific criteria associated with the particular tag.
  • use tag 1116 is shown expanded in its database representation to include data pertaining to each electronic form in which the vocabulary element has been used, e.g. Field ID 1, 1150, Field ID 2, 1160, to field ID N, 1180.
  • Associated with Field ID 1 is the number of times the associated vocabulary element has been used in that field 1152 and optionally other relevant weighting criteria used by the adjunct vocabulary database and/or speech engine 1154.
  • the representation of the tag data within the adjunct vocabulary database may be accomplished by any of numerous different representations, and further, that the tag data itself may grow to be prohibitively voluminous in view of the size of the data needed to represent the vocabulary element itself.
  • different data representations and organizations of the data within the adjunct vocabulary database may be used to implement the teachings of the present invention so as to optimize data storage sizes, data searching efficiency and any other database optimization criteria.
  • Speech Application Language Tags as provided by the Speech Application Language Tags (SALT) Forum, which has published a SALT 1.0 specification the entire contents of which are incorporated herein by reference, is another exemplary method of implementing the use tagging and which may be used according to the method of the present invention.
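One possible in-memory representation of the virtual data record and its use tags is sketched below. The class names `UseTag` and `VirtualVocabularyRecord`, the dictionary-based storage, and the attribute names are hypothetical; the text leaves the actual representation open, and the reference numerals in the comments only indicate which described item each attribute is meant to mirror.

```python
from dataclasses import dataclass, field

@dataclass
class UseTag:
    """Use counts (and optional extra weighting data) keyed by an ID,
    for example a form-field ID."""
    counts: dict = field(default_factory=dict)         # cf. use count 1152
    other_weights: dict = field(default_factory=dict)  # cf. other criteria 1154

    def record_use(self, key):
        # Increment the number of times the element was used under this key.
        self.counts[key] = self.counts.get(key, 0) + 1

@dataclass
class VirtualVocabularyRecord:
    element: str                                          # vocabulary element 1114
    field_tag: UseTag = field(default_factory=UseTag)     # form-field tag 1116
    user_tag: UseTag = field(default_factory=UseTag)      # user tag 1118
    context_tag: UseTag = field(default_factory=UseTag)   # context tag 1120
    patient_tag: UseTag = field(default_factory=UseTag)   # patient tag

record = VirtualVocabularyRecord("urticaria")
record.field_tag.record_use("field_id_1")
record.field_tag.record_use("field_id_1")
record.user_tag.record_use("dr_a")
```

Keeping each tag as a mapping from ID to count is only one option; as noted above, a production system might choose a more compact layout if the tag data grows large relative to the vocabulary element itself.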
  • Virtual vocabulary software and data system 1210 includes a working virtual vocabulary database 1260, backup databases for the virtual vocabulary database 1262, and at least one operational building vocabulary database 1264.
  • Physical data access to and between the databases is provided by software bus 1220.
  • Virtual database management software 1226 is included within system 1210 and comprises numerous software modules for managing, accessing, sorting and searching the virtual vocabulary elements within the databases.
  • database access module 1240 is provided to control all data access functions.
  • Coherency module 1242 is provided as part of the database access module 1240 for coordination and maintenance of the data coherency of the various databases.
  • a prioritization module 1250 is provided for use in prioritizing the virtual vocabulary elements within the databases.
  • Scanning module 1234 is provided for repeatedly scanning and altering the data within the databases.
  • Speech engine interface module 1270 is provided to process the incoming speech matching requests provided through interface module 1272 from the speech engine and returning the adjunct vocabulary elements in response to the same.
  • the speech engine provides the virtual database system an initial dictation results array 1320 consisting of the list of potentially matching results (1322), VE1, VE2 ...VEN and their corresponding initial weightings (1324) W1i, W2i ...WNi.
  • the virtual vocabulary software returns a modified dictation results array 1340 consisting of the same array of vocabulary elements (1342) VE1, VE2, ...VEN with modified weightings as provided by the virtual vocabulary system according to one preferred embodiment, (1344) W1m, W2m ... WNm.
  • the initial dictation results array and modified dictation results array are discussed as if they are different and distinct arrays. Those of skill in the art of database management will realize that the same data structure at a particular memory location may be used to store the vocabulary elements and initial weightings and the system, according to the present invention, simply replaces the initial weightings with modified weightings to create the modified dictation results array.
  • the software associated with the virtual vocabulary databases allows for very rapid operation of the re-weighting processes performed on the input dictation results array.
  • re-weighting requests are issued from the operational software used with the record database system in the form of a function call to the virtual vocabulary database software.
  • Interface module 1272 distributes the request to prioritization module 1250 and scanning module 1234 which, in turn, process the request as described in detail with respect to Figure 14 below.
  • Database access software 1240 with the assistance of coherency module 1242 manages access to the various virtual vocabulary databases 1260, 1262 and 1264 in satisfaction of the request processing.
  • the input dictation results array is provided to the interface module 1272 through software interface 1270.
  • Prioritization module 1250 performs the necessary weighting calculations, including any required normalizations, and returns the modified weightings to the interface module for population/replacement of initial weightings in the initial dictation results array to create the modified dictation results array.
  • the modified dictation results array is then "returned" to the speech engine through the interface module. All prioritization decisions and processing for this process are preferably performed by prioritization module 1250 for ease of searching and efficiency improvement purposes.
  • the scanning module is provided to continuously scan the numerous databases and input and update new virtual vocabulary elements, including updating use and weighting tags.
  • Incoming analog speech is digitized and framed (1401) and acoustic information for vector mapping is extracted (1404).
  • some incoming speech may be provided to the speech engine solely for use and matching as user- selected or programmed language which is to be matched only with commands or user defined speech.
  • the speech engine attempts to match the acoustic information against system commands (1410), and then presuming no match, against user-defined commands (1413).
  • the search engine continues to pass on the matching of systems and user-defined commands, the speech engine then attempts to match the acoustic input with default vocabulary terms, for example as a speech-to-text input to another piece of software with which the speech engine is working.
  • the overall processing of matching extracted acoustic information typically involves a two-step filtration process: a first step, coarse matching process that uses relatively minimal speech engine computational resources and which eliminates obvious non-matching vocabulary elements, and a second step, refined matching process that makes more extensive use of the speech engine's computational resources to achieve a good prioritization of the potential close matches returned by the first, coarse matching process.
  • the coarse match is optimally positioned early in the overall process, for example to follow closely after the extraction of acoustic information (1407).
  • Positioning of the coarse filter immediately after the extraction of the acoustic information allows the coarse filter to use minimally resource-intensive pass-fail criteria, which may not only be used to make the above-mentioned match for system and user-defined commands but may also be used for paring down the initial, entire speech engine vocabulary (1418) so as to streamline the subsequent fine match process.
  • the output of the coarse match process, regardless of the location and number of instantiations, is a sub-array of potential matching vocabulary elements from the entire (default) speech engine vocabulary, where each vocabulary element has a default or initial weighting associated with it (1420).
  • the coarse filter may operate apart from the core speech recognition software of the speech recognition system such that it is accessed through a function call.
  • This coarse filter could then also, conceivably, be called by virtual speech recognition software to provide coarse screening of virtual vocabularies.
  • the coarse filter may also conceivably be called by the speech recognition software so as to pare down any other vocabularies used by the system, such as the hierarchically organized vocabularies that include prioritized and weighted sets of vocabulary elements, as shown in steps 919/1019, 934/1034, and 972/1072 and described in the accompanying text.
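The two-step filtration described above can be illustrated with a toy sketch. Because real engines compare acoustic vectors, the example below substitutes plain text similarity as a stand-in; the function names, the cheap pass-fail test (same first letter and similar length), and the use of `difflib.SequenceMatcher` for the fine pass are all assumptions made purely for illustration.

```python
from difflib import SequenceMatcher

def coarse_filter(heard, vocabulary, max_len_diff=2):
    """Cheap pass-fail screen that discards obvious non-matches."""
    return [
        word for word in vocabulary
        if word[:1] == heard[:1] and abs(len(word) - len(heard)) <= max_len_diff
    ]

def fine_match(heard, candidates):
    """More expensive scoring applied only to the coarse-filter survivors."""
    scored = [
        (word, SequenceMatcher(None, heard, word).ratio())
        for word in candidates
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)

vocabulary = ["rhinitis", "rhinoplasty", "arthritis", "rhinitic"]
candidates = coarse_filter("rhinitus", vocabulary)
ranked = fine_match("rhinitus", candidates)
```

Because the coarse filter is an ordinary function, the same call could be pointed at any vocabulary, including the hierarchical or virtual vocabularies, before the fine match is run.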
  • the initial dictation results array is provided with default or initial weights WXi (Fig. 13) or w-D (Fig. 14). From this initial array, in response to each speech input to the system, a modified dictation results array having a modified dictation result weighting WXm (Fig. 13) or w-C (Fig. 14) is to be created.
  • the virtual vocabulary database management software module 1226 compiles and identifies a set of hierarchical virtual sub- vocabularies according to a prioritization algorithm for a particular user installation. These virtual sub-vocabularies are preferably created from the raw text data associated with the records database and are formatted to include a virtual database of each text/vocabulary element ever used in the records database.
  • each vocabulary element is tagged, as shown in Fig. 11, with a frequency of use for each criterion of relevance in the database according to the particular prioritization criteria for the particular installation.
  • the installation may be a three physician medical practice in which electronic medical record dictation is being performed.
  • the relevant criteria for the instant database match may include identifying information for: 1) the dictating physician, 2) the patient to which the dictation refers, 3) a medical context for the patient to which the dictation pertains (e.g. allergy diagnosis); and 4) the electronic form field into which the speech-to-text output is to be provided.
  • the highest probability of a match against the input speech would be made within the virtual database management software if the first virtual sub-vocabulary for matching a dictated medical record entry included all the criteria above, that is, all terms that the physician has dictated into that records database for that form field for a particular patient context and having the particular medical problem being addressed (e.g. allergy).
  • Additional, more general, lower priority and necessarily larger virtual sub-vocabularies may be created at step 1422 by omitting consideration of one or more of the criteria in the formation of the virtual sub-vocabulary.
  • cross-prioritized virtual sub-vocabularies may be created by combining different permutations of the full set of criteria and generating appropriate virtual sub-vocabularies.
  • a next highest (2nd) priority sub-vocabulary may include all vocabulary elements for that user, in the specified context (e.g. allergy) and dictated into that records form field.
  • a third priority virtual sub- vocabulary might be created to include consideration of every vocabulary element dictated by that user for that form field.
  • a fourth priority virtual sub-vocabulary might be created to include vocabulary elements used by all users of the system as dictated into that form field for that context for that patient.
  • a fifth priority sub-vocabulary might be created to include consideration of only all the vocabulary elements dictated by the user.
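The five-level hierarchy above amounts to filtering the records database by progressively fewer criteria. The following is a minimal sketch, assuming each database entry is a dictionary with `text`, `user`, `patient`, `context` and `field` keys; those key names, the function `build_sub_vocabularies`, and the sample entries are hypothetical.

```python
from collections import Counter

# Ordered criteria sets mirroring the five priorities described above.
HIERARCHY = [
    ("user", "patient", "context", "field"),  # 1st: all criteria
    ("user", "context", "field"),             # 2nd: user, context, field
    ("user", "field"),                        # 3rd: user and field
    ("patient", "context", "field"),          # 4th: all users, same patient/context/field
    ("user",),                                # 5th: everything dictated by the user
]

def build_sub_vocabularies(entries, current, criteria_sets=HIERARCHY):
    """Return one word-count Counter per criteria set, most specific first."""
    vocabularies = []
    for keys in criteria_sets:
        counts = Counter(
            word
            for entry in entries
            if all(entry[k] == current[k] for k in keys)
            for word in entry["text"].split()
        )
        vocabularies.append(counts)
    return vocabularies

entries = [
    {"text": "seasonal rhinitis", "user": "dr_a", "patient": "p1",
     "context": "allergy", "field": "dx"},
    {"text": "rhinitis", "user": "dr_a", "patient": "p2",
     "context": "allergy", "field": "dx"},
    {"text": "sprain", "user": "dr_b", "patient": "p1",
     "context": "allergy", "field": "dx"},
]
current = {"user": "dr_a", "patient": "p1", "context": "allergy", "field": "dx"}
subs = build_sub_vocabularies(entries, current)
```

Each lower-priority Counter is built from a looser filter and is therefore at least as large as any higher-priority one whose criteria it relaxes.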
  • the virtual database management software uses the calculated virtual sub-vocabularies to derive modified weighting criteria for the input vocabulary elements.
  • the input vocabulary elements are matched against those within the first virtual sub-vocabulary and appropriate weighting adjustments are determined by the prioritization module. For example, any vocabulary elements input to the virtual database management system according to the general process described above with respect to Fig.
  • the modified weighting is applied to the initial weighting for each matching vocabulary element in the array (1427), or in the case of no match, the initial weighting is unaltered (1429).
  • the prioritization module then applies the modified weightings to create an interim sub-vocabulary with partially modified weights (1431) and passes the interim sub-vocabulary to the next processing stage for comparison to the next (2nd) virtual sub-vocabulary.
  • step 1434 the same sequence of steps takes place as provided in steps 1425 to 1431 above: vocabulary element comparison to the next virtual sub-vocabulary; determination of appropriate weighting modifications based on vocabulary element matches in that vocabulary, and application of those modifications to the weightings in the vocabulary element array.
  • the second virtual sub-vocabulary weightings may be as shown in table 3 below:
  • the prioritization module 1250 makes a final replacement of the default/initial weightings in the dictation results array (1442) and the virtual database management software returns the dictation results array to the speech engine for final selection of a matching vocabulary element (1444) based on the modified weightings provided by the system.
  • the processing steps of block 1450 are executed for each dictated input: virtual sub-vocabularies are created (1422), the initial dictation results array as provided by the speech engine is evaluated against each of the sequential virtual sub-vocabularies, modified weightings for the vocabulary elements are applied and input to the array, and the array is returned to the speech engine for selection of a single appropriate matching vocabulary element based on the modified weightings.
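The per-utterance loop of block 1450 can be sketched as a pass over the ordered sub-vocabularies that adjusts only the weights of matching elements. This is an illustration under stated assumptions: the function name `apply_sub_vocabularies`, the multiplicative `boosts`, and the final normalization are hypothetical choices, since the text leaves the exact weighting adjustment to the prioritization module.

```python
def apply_sub_vocabularies(initial_results, sub_vocabularies, boosts):
    """Re-weight the speech engine's candidate list.

    initial_results:  dict of vocabulary element -> initial weighting
    sub_vocabularies: ordered list of sets, most specific first
    boosts:           hypothetical multiplier applied on a match in each
                      corresponding sub-vocabulary
    """
    weights = dict(initial_results)  # leave the caller's array untouched
    for vocabulary, boost in zip(sub_vocabularies, boosts):
        for element in weights:
            if element in vocabulary:
                weights[element] *= boost  # match: apply modified weighting
            # no match: the weighting is left unaltered
    total = sum(weights.values())
    # Normalize so the modified weightings again sum to one.
    return {element: w / total for element, w in weights.items()}

modified = apply_sub_vocabularies(
    {"hives": 0.3, "hive": 0.4, "five": 0.3},
    sub_vocabularies=[{"hives"}, {"hives", "hive"}],
    boosts=[2.0, 1.5],
)
selected = max(modified, key=modified.get)
```

The engine's initial favorite, "hive", is overtaken by "hives" because the latter appears in both the most specific and the more general sub-vocabulary.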
  • the compilation of the virtual sub-vocabularies may be altered over time and in response to a matching accuracy metric so as to achieve improved modified weightings and subsequent matches for the particular application/installation using the system.
  • utility programs may be included as part of the scanning module which are written to update the use tags each time an utterance is dictated into a specific field of a specific form, and another utility can periodically or (for sufficiently busy systems) constantly scan the database in the background and update the various elements of these tags.
  • this background scanning utility may accommodate weighting schemes in which there is a time-dependent decay in weightings.
  • the utilities described in this paragraph could update tags for weighting schemes in which prioritization weights for prior use decline over time.
  • each prior use within the preceding 18 months has a weight of one unit
  • each prior use between 18 and 30 months previously will have a weight of 0.5 unit
  • each prior use between 30 and 42 months earlier will have a weight of 0.25 unit.
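The time-dependent decay described in the last three items can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the patent's implementation: the function names (`months_between`, `use_weight`, `prior_use_score`) are hypothetical, the age bands are treated as half-open intervals, and uses older than 42 months are assumed to contribute nothing, since the excerpt does not say what happens beyond that point.

```python
from datetime import date


def months_between(earlier: date, later: date) -> int:
    """Whole calendar months elapsed from `earlier` to `later`."""
    months = (later.year - earlier.year) * 12 + (later.month - earlier.month)
    if later.day < earlier.day:
        months -= 1  # the final month is not yet complete
    return months


def use_weight(age_months: int) -> float:
    """Weight of one prior use, following the three bands in the text."""
    if age_months < 18:
        return 1.0
    if age_months < 30:
        return 0.5
    if age_months < 42:
        return 0.25
    return 0.0  # assumption: the excerpt gives no weight beyond 42 months


def prior_use_score(use_dates, today: date) -> float:
    """Sum of decayed weights over every recorded prior use of an element."""
    return sum(use_weight(months_between(d, today)) for d in use_dates)


if __name__ == "__main__":
    # Hypothetical usage history for a single vocabulary element.
    history = [date(2006, 1, 15), date(2005, 1, 1), date(2004, 1, 1)]
    print(prior_use_score(history, today=date(2006, 10, 23)))
```

A background scanning utility of the kind mentioned above could recompute such a score whenever it revisits an element's use tag, so that older uses gradually lose influence without any record being deleted.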

Landscapes

  • Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Machine Translation (AREA)

Abstract

The invention relates to a virtual vocabulary database for use with a particular user database as part of a speech recognition system. Vocabulary elements in the virtual database are tagged to include numerical data corresponding to the past use of the vocabulary elements in the user database. For each speech input, the potential vocabulary element matches from the speech recognition system are provided to the virtual database software, which creates virtual sub-vocabularies from criteria reflecting predefined criteria templates. The software then determines weighting adjustments for the vocabulary elements according to the sub-vocabulary weightings and applies the adjustment to the default weighting provided by the speech recognition system. The modified weightings are returned, together with the associated vocabulary elements, to the speech engine for selection of an appropriate match to the input speech.
PCT/US2006/041357 2005-10-21 2006-10-23 Method and apparatus for improving the transcription accuracy of speech recognition software WO2007048053A1 (fr)
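The re-weighting flow summarized in the abstract can be illustrated with a minimal Python sketch. Several things here are assumptions rather than details taken from the patent: the function name `apply_sub_vocabularies` is hypothetical, each virtual sub-vocabulary is modeled as a mapping from vocabulary element to a multiplicative adjustment factor (the text does not state how adjustments are combined with the default weighting), and the example words and numbers are invented.

```python
def apply_sub_vocabularies(candidates, sub_vocabularies):
    """Adjust the speech engine's default weightings, one sub-vocabulary at a time.

    candidates: dict mapping candidate vocabulary element -> default weighting.
    sub_vocabularies: ordered list of dicts mapping element -> adjustment factor.
    Elements with no match in a sub-vocabulary keep their current weighting.
    """
    weights = dict(candidates)  # leave the engine's original array untouched
    for sub_vocab in sub_vocabularies:
        # Assumption: a match multiplies the weighting; no match leaves it as is.
        weights = {
            element: weight * sub_vocab.get(element, 1.0)
            for element, weight in weights.items()
        }
    return weights


if __name__ == "__main__":
    # Hypothetical candidates returned by a speech engine for one utterance.
    engine_results = {"asthma": 0.40, "asma": 0.45, "plasma": 0.15}
    # Hypothetical sub-vocabularies, for example "this field" and "this form".
    sub_vocabs = [{"asthma": 2.0}, {"asthma": 1.5, "plasma": 2.0}]
    modified = apply_sub_vocabularies(engine_results, sub_vocabs)
    # The speech engine makes the final selection; max() stands in for it here.
    print(max(modified, key=modified.get))
```

Keeping the sub-vocabularies in an ordered list mirrors the sequential comparison described in the text, where each stage receives the partially modified weightings produced by the previous one.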

Priority Applications (1)

Application Number Priority Date Filing Date Title
EP06826505A EP1946292A1 (fr) 2005-10-21 2006-10-23 Method and apparatus for improving the transcription accuracy of speech recognition software

Applications Claiming Priority (4)

Application Number Priority Date Filing Date Title
US72899005P 2005-10-21 2005-10-21
US60/728,990 2005-10-21
US11/510,435 2006-08-25
US11/510,435 US7809565B2 (en) 2003-03-01 2006-08-25 Method and apparatus for improving the transcription accuracy of speech recognition software

Publications (1)

Publication Number Publication Date
WO2007048053A1 (fr) 2007-04-26

Family

ID=37726731

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US2006/041357 WO2007048053A1 (fr) 2005-10-21 2006-10-23 Method and apparatus for improving the transcription accuracy of speech recognition software

Country Status (2)

Country Link
EP (1) EP1946292A1 (fr)
WO (1) WO2007048053A1 (fr)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2013192535A1 (fr) * 2012-06-22 2013-12-27 Johnson Controls Technology Company Systèmes et méthodes de reconnaissance vocale multipasse pour véhicule
CN110019776A (zh) * 2017-09-05 2019-07-16 腾讯科技(北京)有限公司 文章分类方法及装置、存储介质
CN112216284A (zh) * 2020-10-09 2021-01-12 携程计算机技术(上海)有限公司 训练数据更新方法及系统、语音识别方法及系统、设备
CN112509566A (zh) * 2020-12-22 2021-03-16 北京百度网讯科技有限公司 一种语音识别方法、装置、设备、存储介质及程序产品

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO1998034217A1 (fr) * 1997-01-30 1998-08-06 Dragon Systems, Inc. Speech recognition using multiple recognizers
GB2345783A (en) * 1999-01-12 2000-07-19 Speech Recognition Company Speech recognition system
US20030088410A1 (en) * 2001-11-06 2003-05-08 Geidl Erik M Natural input recognition system and method using a contextual mapping engine and adaptive user bias
US20040078756A1 (en) * 2002-10-15 2004-04-22 Napper Jonathon Leigh Method of improving recognition accuracy in form-based data entry systems
WO2004079720A1 (fr) * 2003-03-01 2004-09-16 Robert E Coifman Method and apparatus for improving the transcription accuracy of speech recognition

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2013192535A1 (fr) * 2012-06-22 2013-12-27 Johnson Controls Technology Company Multi-pass vehicle voice recognition systems and methods
US9779723B2 (en) 2012-06-22 2017-10-03 Visteon Global Technologies, Inc. Multi-pass vehicle voice recognition systems and methods
CN110019776A (zh) * 2017-09-05 2019-07-16 腾讯科技(北京)有限公司 Article classification method and apparatus, and storage medium
CN110019776B (zh) * 2017-09-05 2023-04-28 腾讯科技(北京)有限公司 Article classification method and apparatus, and storage medium
CN112216284A (zh) * 2020-10-09 2021-01-12 携程计算机技术(上海)有限公司 Training data update method and system, speech recognition method and system, and device
CN112216284B (zh) * 2020-10-09 2024-02-06 携程计算机技术(上海)有限公司 Training data update method and system, speech recognition method and system, and device
CN112509566A (zh) * 2020-12-22 2021-03-16 北京百度网讯科技有限公司 Speech recognition method, apparatus, device, storage medium and program product
CN112509566B (zh) * 2020-12-22 2024-03-19 阿波罗智联(北京)科技有限公司 Speech recognition method, apparatus, device, storage medium and program product

Also Published As

Publication number Publication date
EP1946292A1 (fr) 2008-07-23

Similar Documents

Publication Publication Date Title
US7805299B2 (en) Method and apparatus for improving the transcription accuracy of speech recognition software
US10733976B2 (en) Method and apparatus for improving the transcription accuracy of speech recognition software
US7809565B2 (en) Method and apparatus for improving the transcription accuracy of speech recognition software
EP1599867B1 (fr) Improving the transcription accuracy of speech recognition
EP1787288B1 (fr) Automated extraction of semantic content and generation of a structured document from speech
US7580835B2 (en) Question-answering method, system, and program for answering question input by speech
US20200125795A1 (en) Insertion of standard text in transcription
US9779211B2 (en) Computer-assisted abstraction for reporting of quality measures
EP1687807B1 (fr) Topic-specific models for text formatting and speech recognition
US20130304453A9 (en) Automated Extraction of Semantic Content and Generation of a Structured Document from Speech
US8301448B2 (en) System and method for applying dynamic contextual grammars and language models to improve automatic speech recognition accuracy
US7853446B2 (en) Generation of codified electronic medical records by processing clinician commentary
US20040102971A1 (en) Method and system for context-sensitive recognition of human input
WO2010117424A2 (fr) Computer-assisted data abstraction and document coding
US6963834B2 (en) Method of speech recognition using empirically determined word candidates
WO2007048053A1 (fr) Method and apparatus for improving the transcription accuracy of speech recognition software
CN113761899A (zh) Medical text generation method, apparatus, device and storage medium

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application
NENP Non-entry into the national phase

Ref country code: DE

WWE Wipo information: entry into national phase

Ref document number: 2006826505

Country of ref document: EP

WWE Wipo information: entry into national phase

Ref document number: 2034/KOLNP/2008

Country of ref document: IN