WO2004072846A2 - Automatic processing of templates with speech recognition


Info

Publication number
WO2004072846A2
Authority
WO
WIPO (PCT)
Prior art keywords
speech recognition
text
template
document
predefined
Prior art date
Application number
PCT/IB2004/050081
Other languages
French (fr)
Other versions
WO2004072846A8 (en)
WO2004072846A3 (en)
Inventor
Dieter Hoi
Original Assignee
Koninklijke Philips Electronics N.V.
Priority date
Filing date
Publication date
Application filed by Koninklijke Philips Electronics N.V. filed Critical Koninklijke Philips Electronics N.V.
Publication of WO2004072846A2 publication Critical patent/WO2004072846A2/en
Publication of WO2004072846A3 publication Critical patent/WO2004072846A3/en
Publication of WO2004072846A8 publication Critical patent/WO2004072846A8/en

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 - Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/16 - Sound input; Sound output
    • G06F3/167 - Audio in a user interface, e.g. using voice commands for navigating, audio feedback
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 - Handling natural language data
    • G06F40/10 - Text processing
    • G06F40/166 - Editing, e.g. inserting or deleting
    • G06F40/186 - Templates
    • G - PHYSICS
    • G10 - MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L - SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 - Speech recognition
    • G10L15/26 - Speech to text systems


Abstract

A speech recognition device (1) for processing a predefined form (2), having speech recognition means (7) which can be fed a spoken text (GT) and transcribe it into a recognized text (ET), has analysis means (12) for analyzing and converting the predefined form (2) into a form template (FV) having a data format which can be processed by the speech recognition means (7), wherein the form template (FV) and the recognized text (ET) can be combined by the speech recognition means (7) to form a document (DK).

Description

Automatic processing of templates with speech recognition
The invention relates to a speech recognition device for processing a predefined form, wherein the speech recognition device has speech recognition means which can be fed a spoken text and transcribe it into a recognized text.
The invention further relates to a speech recognition method for processing a predefined form, comprising the reception of spoken text and the transcribing of the spoken text into a recognized text.
The document WO 98/43181 discloses a system for completing documents or for filling out forms in a text processing program, or for entering data in a database. The term "document" should in this case be understood in the broad sense and comprises any file, created using a text processing program or a database, in which the user intends to perform at least one data entry. The known system for completing documents comprises data input means for inputting data in written or spoken form and comparison means for comparing the input data with stored reference data, wherein the correspondence of input data with given reference data is used to identify the data category to which the input data belong. By means of this link between the input data and their data category, the system can complete a document or form created using a text processing program by entering the input data in the input field associated with the assigned data category, or fill out a data record by entering the assigned input data in the individual database fields associated with the respective data categories.
In an arrangement of this known system for completing documents or for filling out forms or entering data in a database, WO 98/43181 proposes a method of producing a data input system for a specific user program, in which an existing document format or data input form used in the user program is analyzed in order to identify and characterize input fields. The result of the analysis is used to generate additional program code which is added to the operating system on which the user program runs, or to the user program itself. The effect of the additional program code is that each data input is compared with the stored reference data in order to identify the data category of the data input, and the input data are then entered in that input field of the document format or data input form whose data category corresponds to that of the input data. To generate the additional program code, use may advantageously be made of ActiveX technology, which is compatible with a large number of more advanced programming languages such as Visual Basic, C++, etc. A description is given of an ActiveX control which automatically carries out the above method steps.
The known system for completing documents and for filling out forms is used in particular to make it easier for unpracticed computer users to input data in input fields of documents created using a text processing program or in input masks of databases. By way of example of unpracticed computer users of this type, mention may be made of doctors and attorneys who are inexperienced at using a computer and who encounter problems even when navigating between input fields within an input mask using a keyboard or mouse.
Although the system known from the document WO 98/43181 for completing documents or for filling out forms does provide unpracticed users with support in correctly filling out documents and forms, this known system and method restrict the more practiced user when processing documents and forms more than they help him, since he is forced to move within the predefined or recognized input fields and has no access to the other parts of the document or form. As a result, the known system and method can only be used on forms and input masks with a rigid structure, for example for filling out a patient information sheet which comprises only a few input fields such as, for example, patient name, address, date of birth, name of the physician administering treatment, diagnosis, etc. However, the system does not offer the user the option of navigating outside the predefined input fields in the document and of processing it, for example by adding and deleting text or changing the formatting.
Furthermore, essential functions of existing speech recognition programs cannot be used with the known system. This includes what is referred to as playback with synchronous "highlighting", i.e. the user's dictation can be played back repeatedly after it has been recorded and recognized by the speech recognition software, wherein the user can hear the spoken text and, in synchronism therewith, follow the corresponding text parts recognized by the speech recognition software by means of colored highlighting on the screen, and make corrections where necessary. This function assumes that the speech recognition software knows the entire content of the document or form in a format that it can process, which is not the case with the system and method for completing documents known from WO 98/43181.
It is an object of the invention to provide a speech recognition device of the type specified in the first paragraph and a speech recognition method of the type specified in the second paragraph, in which the abovementioned disadvantages are avoided.
To achieve the abovementioned object, in such a speech recognition device use is made of analysis means for analyzing and converting the predefined form into a form template having a data format which can be processed by the speech recognition means, wherein the form template and the recognized text can be combined by the speech recognition means to form a document. To achieve the abovementioned object, such a speech recognition method comprises the following further method steps: analyzing the predefined form and converting it into a form template having a data format which corresponds to that of the recognized text, and combining the form template and the recognized text to form a document.
By means of the features according to the invention, a user wishing to process a form receives said form in the form of a form template which can be processed by him, on which the functions of advanced speech recognition software, such as navigation using voice commands, synchronous playback with highlighting, substitution, etc., can also be used. The term "process" should in this case be understood in the broad sense and comprises, on the one hand, the filling out or amending of defined text fields but, on the other hand, also the amending, supplementing and/or formatting of predefined elements of the form. The proposed solution therefore offers the user all the possibilities for creating individual documents by means of speech recognition, which up to now could only be used if the entire document was created from a dictation, since only then was the necessary information available to the speech recognition software. The proposed method goes far beyond the conventional methods of filling out documents and inserting data into input masks, as disclosed in the document WO 98/43181, in that it is now possible to configure each document individually.
In accordance with the measures of Claims 2 and 7, the advantage is obtained that a large number of forms can be accessed in a simple manner, wherein the computer files may have different formats so that the highest possible degree of flexibility in using the invention is ensured. Just as the computer files may be in different formats which can be automatically recognized and therefore treated correctly, the forms stored in the computer files may also be in different formats which are automatically recognized and processed correctly. In this connection, the word "automatically" is to be understood to mean that no action on the part of the user of the speech recognition device is required.
In accordance with the measures of Claims 3 and 8, the advantage is obtained that, as templates, use can also be made of those documents and working templates which have been created using software generally used in office applications, such as documents created using text processing programs, tables and diagrams created using table calculation programs, or reports drawn up using database programs. These files are automatically recognized and converted into a format which can be processed by the speech recognition device. In accordance with the measures of Claims 4 and 9, the advantage is obtained that use can also be made of those forms which the user has available only in paper form, without said user having to input these forms manually or by dictation.
In accordance with the measures of Claims 5 and 10, the advantage is obtained that forms which comprise text fields as the essential component to be processed by the user can be completed quickly and by means of simple navigation.
The invention will be further described with reference to examples of embodiments shown in the drawings to which, however, the invention is not restricted. Fig. 1 shows a speech recognition device for processing a predefined form.
Fig. 2 shows a form which is to be completed by a user. Fig. 3 shows the form of Fig. 2 after it has been completed by a user. Fig. 4 shows the form of Fig. 3, wherein a user has performed additional formatting.
Fig. 1 shows a speech recognition device 1 for processing a predefined form 2. The speech recognition device 1 may be formed by a computer which implements speech recognition software. The speech recognition device 1 comprises speech recognition means 7, storage means 8, parameter storage means 9, command storage means 10 and an adaptation stage 11. An audio signal A representing spoken text GT can be output via a microphone 5 to an A/D converter 6 which converts the audio signal A into digital audio data AD that can be fed to the speech recognition means 7. The speech recognition means 7 convert the digital audio data AD into recognized text ET that is stored in the storage means 8. For this purpose, parameter information PI that is stored in the parameter storage means 9 is taken into account, said parameter information PI comprising vocabulary information, speech model information and acoustic information.
The vocabulary information comprises all words that can be recognized by the speech recognition means 7, together with phoneme sequences. The speech model information comprises statistical information regarding the sequences of words that are customary in the speech of the spoken text GT. The acoustic information comprises information about the characteristics of the accent of a user of the speech recognition device 1 and about acoustic properties of the microphone 5 and of the A/D converter 6. Speech model information and acoustic information can be configured in a user-specific manner.
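Purely by way of illustration, the parameter information PI described above can be pictured as a simple data structure grouping the three kinds of information; the following Python sketch is not part of the original disclosure, and the class and field names are assumptions chosen for readability.

    from dataclasses import dataclass, field
    from typing import Dict, List, Tuple

    @dataclass
    class ParameterInformation:
        # Vocabulary information: every recognizable word with its phoneme sequence(s).
        vocabulary: Dict[str, List[str]] = field(default_factory=dict)
        # Speech model information: statistics on customary word sequences
        # (sketched here as bigram log probabilities).
        speech_model: Dict[Tuple[str, str], float] = field(default_factory=dict)
        # Acoustic information: user accent profile and properties of the
        # microphone 5 and the A/D converter 6.
        acoustic_profile: Dict[str, float] = field(default_factory=dict)

Since the speech model information and the acoustic information can be configured in a user-specific manner, one such structure could be kept per user.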
The document US 5,031,113, the disclosure of which is incorporated by way of reference into the disclosure of the present document, discloses the implementation of a speech recognition method taking into account such parameter information PI, and for this reason no more details of this will be given in the present text. As a result of the speech recognition method, the speech recognition means 7 can store text data comprising the recognized text ET in the storage means 8. Furthermore, the spoken text GT can be stored in the storage means 8 in digitized form.
Sequences of words which are recognized as a command by the speech recognition means 7 are stored in the command storage means 10. Such commands comprise, for example, the sequence of words "next word bold" in order to make the next word in the recognized text ET bold. It should be mentioned that commands can be matched in a user-specific manner, so that not all users need to use exactly the same sequence of words. Furthermore, commands can be stored in a document-specific manner, so that a sequence of command words has a fixed reference to a specific document.
A predefined form 2 which is to be processed by the user of the speech recognition device 1 can be either in paper form or in the form of a computer file 3. The term "process" comprises in this connection the filling out, formatting, supplementing or deletion of elements of a form. If the form 2 is in the form of a computer file 3, then this computer file 3 can be on any desired storage medium, such as the hard disk of the computer on which the speech recognition software is implemented, on a floppy disk or on a CD-ROM. However, the computer file 3 can also be made available via a computer network, such as the Internet for example.
The computer file 3 is read into analysis means 12 which are provided with computer file recognition means 13 that recognize the format of the computer file 3. On the one hand, the computer file 3 can be in a proprietary or standardized format for speech recognition software which can be processed directly by the speech recognition means without reformatting. However, the computer file 3 can also be in one of many formats produced by software generally used in office applications, which may be text documents possibly mixed with image elements, tables, etc. Such formats are recognized by the computer file recognition means 13 and are converted by means of the computer file conversion means 14 into a data format which can be processed directly by the speech recognition means 7. In general, the data format produced by the computer file conversion means 14 will be the same as the data format in which the recognized text ET is stored in the storage means 8. If the speech recognition means are configured to process a number of data formats in which the recognized text ET may be stored, then the recognized text ET and the data format produced by the computer file conversion means 14 may also differ from one another. The data format produced by the analysis means 12 from the form 2 is stored as a form template FV in storage means 16.
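The following sketch illustrates, under assumptions of the editor rather than of the disclosure, how the computer file recognition means 13 and the computer file conversion means 14 could cooperate; the file extensions, the hypothetical native format and the entry syntax (taken from the example listings further below) are illustrative only.

    import pathlib

    NATIVE_FORMATS = {".srx"}                           # hypothetical format processed without reformatting
    OFFICE_FORMATS = {".txt", ".rtf", ".doc", ".xls"}   # examples of office-application formats

    def recognize_format(path):
        """Computer file recognition means 13: determine the format of computer file 3."""
        suffix = pathlib.Path(path).suffix.lower()
        if suffix in NATIVE_FORMATS:
            return "native"
        if suffix in OFFICE_FORMATS:
            return "office"
        raise ValueError("unsupported file format: " + suffix)

    def convert_to_form_template(path):
        """Computer file conversion means 14: produce form template FV entries in the
        same data format as the recognized text ET (highly simplified)."""
        recognize_format(path)                          # reject files that cannot be handled
        text = pathlib.Path(path).read_text(errors="ignore")
        entries = []
        for line in text.splitlines():
            for word in line.split():
                entries.append('<TEMPLATE TEXT, "' + word + '">')
            entries.append("<TEMPLATE TEXT, newline>")
        return entries                                  # stored as form template FV in storage means 16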
If the form 2 to be processed is in paper form, it can be converted using a scanner 4, communicating with the analysis means 12, into a computer image data format BF which is subsequently converted by character recognition means 15, contained in the analysis means 12, into the form template FV which is stored in the storage means 16.
The speech recognition means 7 combine the form template FV with the recognized text ET to form a document DK which is stored in storage means 17. This document DK can be processed by the user of the speech recognition device 1 like any other document which has been created directly using the speech recognition device 1. In particular, all the functions of advanced speech recognition software can be used on the document DK.
For example, the document DK can be read into reproduction and correction means 18, to which a keyboard 19, a monitor 20 and a loudspeaker 21 are connected. The reproduction and correction means 18 are designed for the visual display of the form 2 on the monitor 20 and also, when they are in an activated synchronous reproduction mode of operation, for the acoustic reproduction of the spoken text GT together with the synchronous visual marking of the associated recognized text ET in the document DK and of the analyzed elements of the form 2. In this reproduction mode of operation, the document DK can be corrected by input via the keyboard and, simultaneously, by means of voice commands via the speech recognition device 1.
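The combination performed by the speech recognition means 7 can be thought of as merging two lists of entries, the form template FV and the dictated entries of the recognized text ET, at the position the user has navigated to. The following sketch uses the entry syntax of the example listings below and is an illustration only, not the claimed implementation.

    def combine(form_template, dictated, insert_after):
        """Combine form template FV and recognized text ET into document DK.
        insert_after is the index of the FV entry behind which the dictation is placed,
        e.g. the heading selected by a voice command."""
        document = list(form_template)
        document[insert_after + 1:insert_after + 1] = dictated
        return document                                 # stored as document DK in storage means 17

    # Example: place the dictated patient name behind the "Patient name:" heading.
    fv = ['<TEMPLATE TEXT, underline, "Patient">',
          '<TEMPLATE TEXT, underline, "name">',
          '<TEMPLATE TEXT, underline, ":">',
          '<TEMPLATE TEXT, newline>']
    et = ['<DICTATION, 0-2500, "Henry">', '<DICTATION, 2500-3800, "Schmidt">']
    dk = combine(fv, et, insert_after=2)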
Hereinbelow, a simple example is given of how a user of the speech recognition device 1 can process, according to the invention, the form 2 illustrated in Fig. 2. The form 2 illustrated in Fig. 2 is the template for a radiology report which is to be completed by a radiologist with information about the patient name, clinical information and a summary. It should be mentioned that, for the purposes of the present invention, it is not necessary for the form to comprise separate text fields, since the user can navigate at will in the form template FV produced by the speech recognition means 7 from the form and can therefore perform corresponding inputs at any point in the form template. It is therefore completely sufficient, for example, if the form comprises only individual headings. However, it is of course also possible to process forms which comprise text fields, such as the text field 22 in the upper right-hand corner of the form 2, said text field being surrounded by an outline and comprising a date.
By means of the measures according to the invention, it is possible for the user both to process the text field 22 (for example by inputting the date when the report was compiled) and to process all other elements of the form, that is to say the headings, or to add, delete and format any desired elements in the document created from the form.
When the form 2 is to be filled out for the first time, the user can use the analysis means 12 to convert it into a form template FV. If the form 2 is in paper form, the conversion is carried out using the scanner 4 and the character recognition means 15. If the form 2 is in the form of a computer file 3, the conversion is carried out using the computer file recognition means 13 and, where appropriate, the computer file conversion means 14. The form template FV produced by the analysis means 12 is presented to the user on the monitor 20, along with the original form 2 illustrated in Fig. 2, and has for example the following data format:
<ROOT>
<TEMPLATE TEXT, bold, "Radiology">
<TEMPLATE TEXT, bold, "Report">
<TEMPLATE TEXTFIELD, line, column, date [dd.mm.yyyy]>
<TEMPLATE TEXT, newline>
<TEMPLATE TEXT, underline, "Patient">
<TEMPLATE TEXT, underline, "name">
<TEMPLATE TEXT, underline, ":">
<TEMPLATE TEXT, newline>
<TEMPLATE TEXT, "Clinical">
<TEMPLATE TEXT, "information">
<TEMPLATE TEXT, ":">
<TEMPLATE TEXT, newline>
<TEMPLATE TEXT, "Summary">
<TEMPLATE TEXT, ":">
This form template FV can now be filled out and processed by the user by means of dictation. For this purpose, the user dictates into the speech recognition device 1 for example the following spoken text GT:
"patient name" (in command mode)
"Henry Schmidt"
"Summary" (in command mode) "bold on" (in command mode)
"Healing fracture mid left femoral diaphysis period"
"bold off' (in command mode)
"Clinical information" (in command mode)
"The fracture fragments are near anatomic alignment. A small amount of periosteal reaction has developed period"
This spoken text GT is converted by the speech recognition means 7 into recognized text ET which is stored in the storage means 8. The recognized text ET and the form template FV are subsequently combined to form a single document DK which is stored in the storage means 17 where the reproduction and correction means 18 can access it. According to the above dictation, the document DK has the following content:
<ROOT>
<TEMPLATE TEXT, bold, "Radiology">
<TEMPLATE TEXT, bold, "Report">
<TEMPLATE TEXTFIELD, line, column, date [dd.mm.yyyy]>
<TEMPLATE TEXT, newline>
<TEMPLATE TEXT, underline, "Patient">
<TEMPLATE TEXT, underline, "name">
<TEMPLATE TEXT, underline, ":">
<DICTATION, 0-2500, "Henry">
<DICTATION, 2500-3800, "Schmidt">
<TEMPLATE TEXT, newline>
<TEMPLATE TEXT, "Clinical">
<TEMPLATE TEXT, "information">
<TEMPLATE TEXT, ":">
<DICTATION, 12200-12700, "The">
<DICTATION, 22100-23300, "developed">
<DICTATION, 23300-23800, ".">
<TEMPLATE TEXT, newline>
<TEMPLATE TEXT, "Summary">
<TEMPLATE TEXT, ":">
<DICTATION, 3800-4500, bold, "Healing">
<DICTATION, 10500-11800, bold, "diaphysis">
<DICTATION, 11800-12200, bold, ".">
It should be mentioned that the keyword TEMPLATE TEXT serves as an indication to the speech recognition means 7 and the reproduction and correction means 18 that the associated text originates from a predefined form and that therefore no audio information is available for it. After the comma there may be any formatting information, which of course can also be input as a command by means of dictation. The keyword TEMPLATE TEXTFIELD indicates a text field which originates from a predefined form, so that again no audio information is available. The parameters line and column indicate the position of the text field in the form. The parameter date [dd.mm.yyyy] provides more detailed information about the text field: it is accordingly a date field which represents a date with two digits for the day, two digits for the month and four digits for the year, for example 21.02.2003. The keyword DICTATION indicates dictated text. The value after the comma indicates the audio position of the respective word (beginning and end in milliseconds relative to the start of dictation).
This results in the document illustrated in Fig. 3.
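Because every DICTATION entry carries the start and end of its audio in milliseconds, the reproduction and correction means 18 can determine, at any playback position, which recognized word is to be highlighted on the monitor 20. The sketch below assumes the example entry syntax shown above and is merely illustrative; TEMPLATE TEXT entries carry no audio information and are therefore never highlighted.

    import re

    DICTATION_RE = re.compile(r'<DICTATION, (\d+)-(\d+)(?:, \w+)*, "(.*)">')

    def word_at(document, playback_ms):
        """Return the dictated word whose audio span covers the current playback
        position, so it can be marked during synchronous reproduction."""
        for entry in document:
            match = DICTATION_RE.match(entry)
            if match:
                start, end, word = int(match.group(1)), int(match.group(2)), match.group(3)
                if start <= playback_ms < end:
                    return word
        return None                                     # a template entry or a pause is being played

    # At a playback position of 2600 ms the word "Schmidt" (2500-3800) would be highlighted.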
Should the radiologist be dissatisfied with the content or the formatting of the document, he can make changes to it at will. For example, he can dictate the following commands into the speech recognition device 1:
"Patient name" (in command mode)
"italic" (in command mode)
"Clinical information" (in command mode)
"italic underline" (in command mode) "Summary" (in command mode)
"italic underline" (in command mode)
Once the commands have been processed by the speech recognition means 7, the result is the representation shown in Fig. 4 for the document DK. It should be understood that this is only a simple example of the possibilities that the invention provides, but that in fact all the processing possibilities of advanced speech recognition software are available to the user.
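A possible way of resolving such dictated formatting commands against the document DK is sketched below; the command names and the way formatting attributes are attached to entries are assumptions made for illustration and do not describe the actual software.

    def apply_format(document, heading, formatting):
        """Attach formatting attributes (e.g. 'italic underline') to the TEMPLATE TEXT
        entry carrying the named heading, mimicking the command sequences above."""
        result = []
        attrs = ", ".join(formatting.split())           # 'italic underline' -> 'italic, underline'
        for entry in document:
            if entry.startswith("<TEMPLATE TEXT") and '"' + heading + '"' in entry:
                entry = entry.replace("<TEMPLATE TEXT,", "<TEMPLATE TEXT, " + attrs + ",", 1)
            result.append(entry)
        return result

    dk = ['<TEMPLATE TEXT, "Summary">', '<TEMPLATE TEXT, ":">']
    dk = apply_format(dk, "Summary", "italic underline")
    # dk[0] is now '<TEMPLATE TEXT, italic, underline, "Summary">'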

Claims

CLAIMS:
1. A speech recognition device (1) for processing a predefined form (2), wherein the speech recognition device (1) has speech recognition means (7) which can be fed a spoken text (GT) and transcribe it into a recognized text (ET), characterized by analysis means (12) for analyzing and converting the predefined form (2) into a form template (FV) having a data format which can be processed by the speech recognition means (7), wherein the form template (FV) and the recognized text (ET) can be combined by the speech recognition means (7) to form a document (DK).
2. A speech recognition device as claimed in Claim 1, characterized in that the predefined form (2) can be fed to the analysis means (12) as a computer file (3) and in that the analysis means (12) comprise a computer file recognition means (13).
3. A speech recognition device as claimed in Claim 2, characterized in that the computer file recognition means (13) comprise a computer file conversion means (14).
4. A speech recognition device as claimed in Claim 1, characterized in that the analysis means (12) comprise a scanner (4) and character recognition means (15).
5. A speech recognition device as claimed in Claim 1, characterized in that the combining of the form template (FV) and of the recognized text (ET) by the speech recognition means (7) comprises the filling out of at least one text field (22) in the form template with recognized text.
6. A speech recognition method for processing a predefined form (2), comprising the reception of spoken text (GT) and the transcribing of the spoken text into a recognized text (ET), characterized by the steps of: analyzing the predefined form (2) and converting it into a form template (FV) having a data format which corresponds to that of the recognized text (ET), and combining the form template (FV) and the recognized text (ET) to form a document (DK).
7. A speech recognition method as claimed in Claim 6, characterized in that the analyzing of the predefined form (2) comprises the reading in of the form (2) as a computer file (3) and recognition of the file type of the computer file.
8. A speech recognition method as claimed in Claim 7, characterized in that the analyzing of the predefined form comprises the conversion of the read-in computer file (3) into a different data type.
9. A speech recognition method as claimed in Claim 6, characterized in that the analyzing of the predefined form comprises the scanning of the form and recognition of form text from the data (BF) obtained during the scanning.
10. A speech recognition method as claimed in Claim 6, characterized in that the combining of the form template (FV) and of the recognized text (ET) comprises the filling out of at least one text field (22) in the form template with recognized text.
PCT/IB2004/050081 2003-02-13 2004-02-05 Automatic processing of templates with speech recognition WO2004072846A2 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
EP03100332 2003-02-13
EP03100332.0 2003-02-13

Publications (3)

Publication Number Publication Date
WO2004072846A2 true WO2004072846A2 (en) 2004-08-26
WO2004072846A3 WO2004072846A3 (en) 2004-10-07
WO2004072846A8 WO2004072846A8 (en) 2004-12-09

Family

ID=32865046

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/IB2004/050081 WO2004072846A2 (en) 2003-02-13 2004-02-05 Automatic processing of templates with speech recognition

Country Status (1)

Country Link
WO (1) WO2004072846A2 (en)

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO1996013009A1 (en) * 1994-10-25 1996-05-02 Ho Janet Chung Kong System and method for generating database input forms
WO2002082318A2 (en) * 2001-02-22 2002-10-17 Volantia Holdings Limited System and method for extracting information
US20020143533A1 (en) * 2001-03-29 2002-10-03 Mark Lucas Method and apparatus for voice dictation and document production

Cited By (33)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10318871B2 (en) 2005-09-08 2019-06-11 Apple Inc. Method and apparatus for building an intelligent automated assistant
US9865248B2 (en) 2008-04-05 2018-01-09 Apple Inc. Intelligent text-to-speech conversion
US10795541B2 (en) 2009-06-05 2020-10-06 Apple Inc. Intelligent organization of tasks items
US11080012B2 (en) 2009-06-05 2021-08-03 Apple Inc. Interface for a virtual digital assistant
US10706841B2 (en) 2010-01-18 2020-07-07 Apple Inc. Task flow identification based on user intent
US11423886B2 (en) 2010-01-18 2022-08-23 Apple Inc. Task flow identification based on user intent
US10049675B2 (en) 2010-02-25 2018-08-14 Apple Inc. User profiling for voice input processing
US10079014B2 (en) 2012-06-08 2018-09-18 Apple Inc. Name recognition system
US9971774B2 (en) 2012-09-19 2018-05-15 Apple Inc. Voice-based media searching
US9966060B2 (en) 2013-06-07 2018-05-08 Apple Inc. System and method for user-specified pronunciation of words for speech synthesis and recognition
US10904611B2 (en) 2014-06-30 2021-01-26 Apple Inc. Intelligent automated assistant for TV user interactions
US9986419B2 (en) 2014-09-30 2018-05-29 Apple Inc. Social reminders
US10567477B2 (en) 2015-03-08 2020-02-18 Apple Inc. Virtual assistant continuity
US10356243B2 (en) 2015-06-05 2019-07-16 Apple Inc. Virtual assistant aided communication with 3rd party service in a communication session
US9996517B2 (en) 2015-11-05 2018-06-12 Lenovo (Singapore) Pte. Ltd. Audio input of field entries
GB2545320A (en) * 2015-11-05 2017-06-14 Lenovo Singapore Pte Ltd Audio input of field entries
GB2545320B (en) * 2015-11-05 2020-08-05 Lenovo Singapore Pte Ltd Audio input of field entries
WO2017083205A1 (en) * 2015-11-11 2017-05-18 Microsoft Technology Licensing, Llc Provide interactive content generation for document
DK201670539A1 (en) * 2016-03-14 2017-10-02 Apple Inc Dictation that allows editing
US10067938B2 (en) 2016-06-10 2018-09-04 Apple Inc. Multilingual word prediction
US10043516B2 (en) 2016-09-23 2018-08-07 Apple Inc. Intelligent automated assistant
US10553215B2 (en) 2016-09-23 2020-02-04 Apple Inc. Intelligent automated assistant
US10593346B2 (en) 2016-12-22 2020-03-17 Apple Inc. Rank-reduced token representation for automatic speech recognition
US10755703B2 (en) 2017-05-11 2020-08-25 Apple Inc. Offline personal assistant
US10791176B2 (en) 2017-05-12 2020-09-29 Apple Inc. Synchronization and task delegation of a digital assistant
US10410637B2 (en) 2017-05-12 2019-09-10 Apple Inc. User-specific acoustic models
US11405466B2 (en) 2017-05-12 2022-08-02 Apple Inc. Synchronization and task delegation of a digital assistant
US10810274B2 (en) 2017-05-15 2020-10-20 Apple Inc. Optimizing dialogue policy decisions for digital assistants using implicit feedback
US10482874B2 (en) 2017-05-15 2019-11-19 Apple Inc. Hierarchical belief states for digital assistants
US11217255B2 (en) 2017-05-16 2022-01-04 Apple Inc. Far-field extension for digital assistant services
CN110210014A (en) * 2019-05-31 2019-09-06 贵州精准医疗电子有限公司 Intelligent form system
CN110210014B (en) * 2019-05-31 2023-05-30 贵州精准医疗电子有限公司 Intelligent form system
CN111243596A (en) * 2020-01-08 2020-06-05 中保车服科技服务股份有限公司 Insurance information acquisition method, device and equipment based on voice recognition and storage medium

Also Published As

Publication number Publication date
WO2004072846A8 (en) 2004-12-09
WO2004072846A3 (en) 2004-10-07

Similar Documents

Publication Publication Date Title
WO2004072846A2 (en) Automatic processing of templates with speech recognition
US11586808B2 (en) Insertion of standard text in transcription
US7516070B2 (en) Method for simultaneously creating audio-aligned final and verbatim text with the assistance of a speech recognition program as may be useful in form completion using a verbal entry method
US8046226B2 (en) System and methods for reporting
US7979281B2 (en) Methods and systems for creating a second generation session file
JP2768727B2 (en) Report creation apparatus and method
DE60033106T2 (en) Correction of operating mode errors, control or dictation, in the speech recognition
US8504369B1 (en) Multi-cursor transcription editing
US20060190249A1 (en) Method for comparing a transcribed text file with a previously created file
US6915258B2 (en) Method and apparatus for displaying and manipulating account information using the human voice
US20090037171A1 (en) Real-time voice transcription system
US20020095290A1 (en) Speech recognition program mapping tool to align an audio file to verbatim text
EA004352B1 (en) Automated transcription system and method using two speech converting instances and computer-assisted correction
Lai et al. MedSpeak: Report creation with continuous speech recognition
JPH10507857A (en) System and method for generating a database input form
JP2014013399A (en) Method and system for processing dictated information
US7120581B2 (en) System and method for identifying an identical audio segment using text comparison
US20150293902A1 (en) Method for automated text processing and computer device for implementing said method
US20030097253A1 (en) Device to edit a text in predefined windows
US20070067168A1 (en) Method and device for transcribing an audio signal
JPH11272673A (en) Method and processor for document processing and record medium where computer program for document processing is recorded
JP2004287192A (en) Device and program for editing synthesized speech
WO2001093058A1 (en) System and method for comparing text generated in association with a speech recognition program
Schiavon et al. Radiological Reporting in the United States
JPH08314930A (en) Proofreading method for japanese sentence

Legal Events

Date Code Title Description
AK Designated states

Kind code of ref document: A2

Designated state(s): AE AG AL AM AT AU AZ BA BB BG BR BW BY BZ CA CH CN CO CR CU CZ DE DK DM DZ EC EE EG ES FI GB GD GE GH GM HR HU ID IL IN IS JP KE KG KP KR KZ LC LK LR LS LT LU LV MA MD MG MK MN MW MX MZ NA NI NO NZ OM PG PH PL PT RO RU SC SD SE SG SK SL SY TJ TM TN TR TT TZ UA UG US UZ VC VN YU ZA ZM ZW

AL Designated countries for regional patents

Kind code of ref document: A2

Designated state(s): BW GH GM KE LS MW MZ SD SL SZ TZ UG ZM ZW AM AZ BY KG KZ MD RU TJ TM AT BE BG CH CY CZ DE DK EE ES FI FR GB GR HU IE IT LU MC NL PT RO SE SI SK TR BF BJ CF CG CI CM GA GN GQ GW ML MR NE SN TD TG

121 Ep: the epo has been informed by wipo that ep was designated in this application
WR Later publication of a revised version of an international search report
122 Ep: pct application non-entry in european phase