WO2004072846A2 - Automatic processing of templates with speech recognition - Google Patents
- Publication number
- WO2004072846A2 (PCT application PCT/IB2004/050081)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- speech recognition
- text
- template
- document
- predefined
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F3/00—Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
- G06F3/16—Sound input; Sound output
- G06F3/167—Audio in a user interface, e.g. using voice commands for navigating, audio feedback
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/10—Text processing
- G06F40/166—Editing, e.g. inserting or deleting
- G06F40/186—Templates
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/26—Speech to text systems
Definitions
- This form template FV can now be filled out and processed by the user by means of dictation.
- the user dictates into the speech recognition device 1 for example the following spoken text GT:
- This spoken text GT is converted by the speech recognition means 7 into recognized text ET which is stored in the storage means 8.
- the recognized text ET and the form template FV are subsequently combined to form a single document DK which is stored in the storage means 17 where the reproduction and correction means 18 can access it.
- the document DK has the following content:
- the keyword TEMPLATE TEXT serves as an indication to the speech recognition means 7 and the reproduction and correction means 18 that the text concerned originates from a predefined form and that therefore no audio information is available for it. Any formatting information may follow after the comma; such formatting can of course also be input as a command by means of dictation.
- the keyword TEMPLATE TEXTFIELD indicates that it is a text field which originates from a predefined form, so that no audio information is available.
- the parameters line and column indicate the position of the text field in the form.
- the parameter date [dd.mm.yyyy] provides more detailed information about the text field.
- the keyword DICTATION indicates dictated text.
- the value after the comma indicates the audio position of the respective word (beginning and end in milliseconds relative to the start of dictation).
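Taken together, the keywords above suggest a simple line-oriented document format. The following sketch parses such lines; the keyword names TEMPLATE TEXT, TEMPLATE TEXTFIELD and DICTATION are taken from the description above, while the exact syntax (comma-separated fields) and the function name are illustrative assumptions, not the actual format used by the device.

```python
def parse_document_line(line):
    """Classify one line of the combined document DK (hypothetical syntax)."""
    if line.startswith("TEMPLATE TEXTFIELD"):
        # text field from the predefined form: no audio information available;
        # parameters such as line/column or date [dd.mm.yyyy] may follow
        return {"kind": "textfield",
                "params": line[len("TEMPLATE TEXTFIELD"):].strip(", ")}
    if line.startswith("TEMPLATE TEXT"):
        # static text from the predefined form: no audio information available;
        # anything after the comma is formatting information
        return {"kind": "template_text",
                "formatting": line[len("TEMPLATE TEXT"):].strip(", ")}
    if line.startswith("DICTATION"):
        # dictated text: the values after the comma give the audio position of
        # the word (beginning and end in milliseconds from the dictation start)
        _, word, start, end = line.split(",")
        return {"kind": "dictation", "word": word.strip(),
                "audio_ms": (int(start), int(end))}
    return {"kind": "unknown"}

document = [
    "TEMPLATE TEXT, bold",
    "TEMPLATE TEXTFIELD, line=1, column=2, date [dd.mm.yyyy]",
    "DICTATION, patient, 0, 480",
]
parsed = [parse_document_line(entry) for entry in document]
```

Note that TEMPLATE TEXTFIELD must be tested before TEMPLATE TEXT, since the latter is a prefix of the former.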
Abstract
A speech recognition device (1) for processing a predefined form (2), having speech recognition means (7) which can be fed a spoken text (GT) and transcribe it into a recognized text (ET), has analysis means (12) for analyzing and converting the predefined form (2) into a form template (FV) having a data format which can be processed by the speech recognition means (7), wherein the form template (FV) and the recognized text (ET) can be combined by the speech recognition means (7) to form a document (DK).
Description
Automatic processing of templates with speech recognition
The invention relates to a speech recognition device for processing a predefined form, wherein the speech recognition device has speech recognition means which can be fed a spoken text and transcribe it into a recognized text.
The invention further relates to a speech recognition method for processing a predefined form, comprising the reception of spoken text and the transcribing of the spoken text into a recognized text.
The document WO 98/43181 discloses a system for completing documents or for filling out forms in a text processing program, or for entering data in a database. The term "document" should in this case be understood in the broad sense and comprises any file, created using a text processing program or a database, in which the user intends to perform at least one data entry. The known system for completing documents comprises data input means for inputting data in written or spoken form and comparison means for comparing the input data with stored reference data, wherein the correspondence of input data with given reference data is used to identify the data category to which the input data belong. By means of this connection between input data and the data category thereof, it is possible for the system to complete a document or form created using a text processing program by inputting the input data in an input field associated with the assigned data category, or to fill out a data record by filling out with the assigned input data the individual fields of the database which are associated with a respective data category.
In an arrangement of this known system for completing documents or for filling out forms or entering data in a database, WO 98/43181 proposes a method of producing a data input system for a specific user program, in which an existing document format or data input form used in the user program is analyzed in order to identify and characterize input fields. The result of the analysis is used to generate additional program code which is added to the operating system on which the user program runs, or to the user program itself. The effect of the additional program code is that each data input is compared with the stored reference data in order to identify the data category of the data input, and the input data are then entered in that input field of the document format or data input form whose data category corresponds to that of the input data. To generate the additional program code, use may advantageously be made of ActiveX technology, which is compatible with a large number of more advanced programming languages such as Visual Basic, C++, etc. A description is given of an ActiveX control which automatically carries out the above method steps.
The known system for completing documents and for filling out forms is used in particular to make it easier for unpracticed computer users to input data in input fields of documents created using a text processing program or in input masks of databases. By way of example of unpracticed computer users of this type, mention may be made of doctors and attorneys who are inexperienced at using a computer and who encounter problems even when navigating between input fields within an input mask using a keyboard or mouse. Although the system known from the document WO 98/43181 for completing documents or for filling out forms does provide unpracticed users with support in correctly filling out documents and forms, this known system and method limit the more practiced user when processing documents and forms more than they help him, since he is confined to the predefined or recognized input fields and has no access to the other parts of the document or form. As a result, the known system and method can only be used on forms and input masks with a rigid structure, for example for filling out a patient information sheet which comprises only a few input fields such as, for example, patient name, address, date of birth, name of the physician administering treatment, diagnosis, etc. However, the system does not offer the user the option of navigating outside the predefined input fields in the document and processing it, for example by adding and deleting text or changing the formatting. Furthermore, essential functions of existing speech recognition programs cannot be used with the known system. This includes what is referred to as playback with synchronous "highlighting", i.e. the user's dictation can be played back repeatedly after it has been recorded and recognized by the speech recognition software, wherein the user can hear the spoken text and, in synchronism therewith, follow the corresponding text parts recognized by the speech recognition software by means of colored highlighting on the screen, and make corrections where necessary. This function assumes that the speech recognition software knows the entire content of the document or form in a format that it can process, which is not the case with the system and method for completing documents known from WO 98/43181.
It is an object of the invention to provide a speech recognition device of the type specified in the first paragraph and a speech recognition method of the type specified in the second paragraph, in which the abovementioned disadvantages are avoided. To achieve the abovementioned object, in such a speech recognition device use is made of analysis means for analyzing and converting the predefined form into a form template having a data format which can be processed by the speech recognition means, wherein the form template and the recognized text can be combined by the speech recognition means to form a document. To achieve the abovementioned object, such a speech recognition method comprises the following further method steps: analyzing the predefined form and converting it into a form template having a data format which corresponds to that of the recognized text, and combining the form template and the recognized text to form a document. By means of the features according to the invention, a user wishing to process a form receives said form in the form of a form template which can be processed by him, on which the functions of advanced speech recognition software, such as navigation using voice commands, synchronous playback with highlighting, substitution, etc., can also be used. The term "process" should in this case be understood in the broad sense and comprises, on the one hand, the filling out or amending of defined text fields but, on the other hand, also the amending, supplementing and/or formatting of predefined elements of the form. The proposed solution therefore offers the user all the possibilities for creating individual documents by means of speech recognition, which up to now could only be used if the entire document was created from a dictation since only then was the necessary information available to the speech recognition software. 
The proposed method goes way beyond the conventional methods of filling out documents and inserting data into input masks, as disclosed in the document WO 98/43181, in that it is now possible to configure each document individually.
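The two claimed method steps — analyzing the predefined form into a form template FV in the recognizer's data format, and combining FV with the recognized text ET into a document DK — can be sketched as follows. All function names and the plain-text template representation are assumptions for illustration, not part of the patent.

```python
# Sketch of the claimed method: (1) analyze the predefined form and convert
# it into a form template FV in the same representation as recognized text,
# (2) combine FV with recognized text ET into a single editable document DK.

def analyze_form(raw_form):
    """Convert a predefined form into a form template: one entry per line,
    in the same representation the recognizer uses for its own output."""
    return [line.strip() for line in raw_form.splitlines() if line.strip()]

def combine(template, recognized):
    """Fill the template with recognized text: dictated text is inserted
    after the heading it belongs to; untouched headings remain editable."""
    document = []
    for entry in template:
        document.append(entry)
        if entry in recognized:  # the user dictated text for this heading
            document.append(recognized[entry])
    return document

form = "Patient Name:\nClinical Information:\nSummary:"
dictated = {"Patient Name:": "John Doe",
            "Summary:": "No abnormality detected."}
dk = combine(analyze_form(form), dictated)
```

Because the template and the recognized text share one representation, the resulting document DK can be navigated, edited and formatted as a whole, which is the point of the invention.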
In accordance with the measures of Claims 2 and 7, the advantage is obtained that a large number of forms can be accessed in a simple manner, wherein the computer files may have different formats so that the highest possible degree of flexibility in using the invention is ensured. Just as the computer files may be in different formats which can be automatically recognized and therefore treated correctly, the forms stored in the computer files may also be in different formats which are automatically recognized and processed correctly. In this connection, the word "automatically" is to be understood to mean that no action by the user of the speech recognition device is required.
In accordance with the measures of Claims 3 and 8, the advantage is obtained that, as templates, use can also be made of those documents and working templates which have been created using software generally used in office applications, such as documents created using text processing programs, tables and diagrams created using table calculation programs, or reports drawn up using database programs. These files are automatically recognized and converted into a format which can be processed by the speech recognition device. In accordance with the measures of Claims 4 and 9, the advantage is obtained that use can also be made of those forms which the user has available only in paper form, without said user having to input these forms manually or by dictation.
In accordance with the measures of Claims 5 and 10, the advantage is obtained that forms which comprise text fields as the essential component to be processed by the user can be completed quickly and by means of simple navigation.
The invention will be further described with reference to examples of embodiments shown in the drawings to which, however, the invention is not restricted. Fig. 1 shows a speech recognition device for processing a predefined form.
Fig. 2 shows a form which is to be completed by a user. Fig. 3 shows the form of Fig. 2 after it has been completed by a user. Fig. 4 shows the form of Fig. 3, wherein a user has performed additional formatting.
Fig. 1 shows a speech recognition device 1 for processing a predefined form 2. The speech recognition device 1 may be formed by a computer which implements speech recognition software. The speech recognition device 1 comprises speech recognition means 7, storage means 8, parameter storage means 9, command storage means 10 and an adaptation stage 11. An audio signal A representing spoken text GT can be output via a microphone 5 to an A/D converter 6 which converts the audio signal A into digital audio data AD that can be fed to the speech recognition means 7. The speech recognition means 7 convert the digital audio data AD into recognized text ET that is stored in the storage means 8. For this purpose, parameter information PI that is stored in the parameter storage means 9 is taken into account, said parameter information PI comprising vocabulary information, speech model information and acoustic information.
The vocabulary information comprises all words that can be recognized by the speech recognition means 7, together with phoneme sequences. The speech model information comprises statistical information regarding the sequences of words that are customary in the speech of the spoken text GT. The acoustic information comprises information about the characteristics of the accent of a user of the speech recognition device 1 and about acoustic properties of the microphone 5 and of the A/D converter 6. Speech model information and acoustic information can be configured in a user-specific manner.
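As a minimal sketch, the parameter information PI described above can be modeled as a structure with the three components named in the text; the field layout and the example values are assumptions for illustration only.

```python
# Sketch of the parameter information PI: vocabulary with phoneme sequences,
# statistical speech model information, and user/device-specific acoustic
# information. The concrete structure is an illustrative assumption.
from dataclasses import dataclass, field

@dataclass
class ParameterInfo:
    # word -> phoneme sequence: everything the recognizer can output
    vocabulary: dict = field(default_factory=dict)
    # word-pair statistics: how customary one word is after another
    speech_model: dict = field(default_factory=dict)
    # user accent and microphone / A/D converter characteristics
    acoustic: dict = field(default_factory=dict)

pi = ParameterInfo(
    vocabulary={"patient": ["p", "ey", "sh", "ax", "n", "t"]},
    speech_model={("the", "patient"): 0.02},
    acoustic={"mic_gain_db": 3.0},
)
```

The speech model and acoustic fields are the ones the text says can be configured per user.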
The document US 5,031,113, the disclosure of which is incorporated by way of reference into the disclosure of the present document, discloses the implementation of a speech recognition method taking into account such parameter information PI, and for this reason no more details of this will be given in the present text. As a result of the speech recognition method, the speech recognition means 7 can store text data comprising the recognized text ET in the storage means 8. Furthermore, the spoken text GT can be stored in the storage means 8 in digitized form.
Sequences of words which are recognized as a command by the speech recognition means 7 are stored in the command storage means 10. Such commands comprise, for example, the sequence of words "next word bold" in order to make the next word in the recognized text ET bold. It should be mentioned that commands can be matched in a user-specific manner, so that all users do not need to use the same strict sequence of words. Furthermore, commands can be stored in a document-specific manner, so that a sequence of command words has a fixed reference to a specific document. A predefined form 2 which is to be processed by the user of the speech recognition device 1 can be either in paper form or in the form of a computer file 3. The term "process" comprises in this connection the filling out, formatting, supplementing or deletion of elements of a form. If the form 2 is in the form of a computer file 3, then this computer file 3 can be on any desired storage medium, such as the hard disk of the computer on which the speech recognition software is implemented, on a floppy disk or on a CD-ROM. However, the computer file 3 can also be made available via a computer network, such as the Internet for example. The computer file 3 is read into analysis means 12 which are provided with computer file recognition means 13 that recognize the format of the computer file 3. On the one hand, the computer file 3 can be in a proprietary or standardized format for speech recognition software which can be processed directly by the speech recognition means without reformatting. However, the computer file 3 can also be in one of many formats produced by software generally used in office applications, which may be text documents possibly mixed with image elements, tables, etc. Such formats are recognized by the computer file recognition means 13 and are converted by means of computer file conversion means 14 into a data format which can be processed directly by the speech recognition means 7. In general, the data format produced by the computer file conversion means 14 will be the same as the data format in which the recognized text ET is stored in the storage means 8. If the speech recognition means are configured to process several data formats for the recognized text ET, however, the two formats may also differ from one another. The data format produced by the analysis means 12 from the form 2 is stored as a form template FV in storage means 16.
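The command recognition described earlier — stored word sequences such as "next word bold" that trigger actions, with user-specific phrasing variants allowed — can be sketched as a lookup table. The table contents, action names and function name are illustrative assumptions.

```python
# Sketch of the command storage: recognized word sequences are looked up in
# a command table; several user-specific phrasings may map to one action.
COMMANDS = {
    ("next", "word", "bold"): "BOLD_NEXT_WORD",
    # a user-specific variant mapped to the same action
    ("make", "next", "word", "bold"): "BOLD_NEXT_WORD",
}

def match_command(words):
    """Return the action for a recognized word sequence, or None if the
    sequence is ordinary dictated text rather than a command."""
    return COMMANDS.get(tuple(w.lower() for w in words))
```

A document-specific command table, as mentioned in the text, would simply be a second dictionary consulted only while that document is open.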
If the form 2 to be processed is in paper form, it can be converted using a scanner 4, communicating with the analysis means 12, into a computer image data format BF which is subsequently converted by character recognition means 15, contained in the analysis means 12, into the form template FV which is stored in the storage means 16.
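The analysis means described above can be sketched as a dispatcher that routes a paper scan through character recognition and a computer file through the appropriate format converter. The format signatures and the converter stubs are illustrative assumptions, not the actual implementation.

```python
# Sketch of the analysis means 12: recognize the source of the predefined
# form and route it to the matching conversion path to obtain the form
# template FV. Signatures and stub outputs are illustrative assumptions.

def analyze(source, from_scanner=False):
    """Produce the form template FV from a predefined form 2."""
    if from_scanner:
        # paper form: scanner output goes through character recognition (15)
        return ocr_to_template(source)
    if source.startswith(b"%PDF"):
        # office-application format: route through conversion means (14)
        return convert_office_file(source, "pdf")
    if source.lstrip().startswith(b"<"):
        return convert_office_file(source, "xml")
    # already in the recognizer's native format: no reformatting needed
    return source.decode("utf-8")

def ocr_to_template(image_data):
    # stand-in for the character recognition means 15
    return "<template from scan, %d bytes>" % len(image_data)

def convert_office_file(data, fmt):
    # stand-in for the computer file conversion means 14
    return "<template converted from %s>" % fmt
```

Either path ends in the same place: a form template FV in a format the speech recognition means can process directly.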
The speech recognition means 7 combine the form template FV with the recognized text ET to form a document DK which is stored in storage means 17. This document DK can be processed by the user of the speech recognition device 1 like any other document which has been created directly using the speech recognition device 1. In particular, all the functions of advanced speech recognition software can be used on the document DK. For example, the document DK can be read into reproduction and correction means 18, to which a keyboard 19, a monitor 20 and a loudspeaker 21 are connected. The reproduction and correction means 18 are designed for the visual displaying of the form 2 on the monitor 20 and also for the acoustic reproduction of the spoken text GT and for the synchronous visual marking of the associated recognized text ET in the document DK and of the analyzed elements of the form 2 if the reproduction and correction means 18 are in an activated synchronous reproduction mode of operation. In this reproduction mode of operation, the document DK can be corrected by inputting via the keyboard and also by means of voice commands via the speech recognition device 1 simultaneously.
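Synchronous playback with highlighting, as described above, depends on each dictated word in the document DK carrying its audio position while template text carries none. A minimal sketch, assuming a per-word millisecond-offset representation:

```python
# Sketch of synchronous highlighting: during playback, the dictated word
# whose audio span covers the current playback position is highlighted;
# entries taken from the form template have no audio and are skipped.

def word_at(document, position_ms):
    """Return the dictated word to highlight at the given playback position,
    or None if the position falls outside any dictated word."""
    for entry in document:
        if entry["audio_ms"] is None:
            continue  # originates from the form template, no audio available
        start, end = entry["audio_ms"]
        if start <= position_ms < end:
            return entry["word"]
    return None

dk = [
    {"word": "Summary:", "audio_ms": None},  # heading from the form template
    {"word": "no", "audio_ms": (0, 300)},
    {"word": "abnormality", "audio_ms": (300, 900)},
]
```

This is exactly the function that the prior-art system could not offer, since it never held the whole document in a format the recognizer could process.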
Hereinbelow, a simple example is given of how a user of the speech recognition device 1 can process, according to the invention, the form 2 illustrated in Fig. 2.
The form 2 illustrated in Fig. 2 is the template for a radiology report which is to be completed by a radiologist with the patient name, clinical information and a summary. It should be mentioned that, for the purposes of the present invention, it is not necessary for the form to comprise separate text fields, since the user can navigate at will in the form template FV produced by the speech recognition means 7 from the form and can therefore make inputs at any point in the form template. It is therefore entirely sufficient, for example, if the form comprises only individual headings. However, it is of course also possible to process forms which comprise text fields, such as the text field 22 in the upper right-hand corner of the form 2, said text field being surrounded by an outline and comprising a date. By means of the measures according to the invention, the user can both process the text field 22 (for example by inputting the date when the report was compiled) and process all other elements of the form, that is to say the headings, or add, delete and format any desired elements in the document created from the form.

When the form 2 is to be filled out for the first time, the user can use the analysis means 12 to convert it into a form template FV. If the form 2 is in paper form, the conversion is carried out using the scanner 4 and the character recognition means 15. If the form 2 is in the form of a computer file 3, the conversion is carried out using the computer file recognition means 13 and, where appropriate, the computer file conversion means 14. The form template FV produced by the analysis means 12 is presented to the user on the monitor 20, along with the original form 2 illustrated in Fig. 2, and has for example the following data format:
<ROOT>
<TEMPLATE TEXT, bold, "Radiology">
<TEMPLATE TEXT, bold, "Report">
<TEMPLATE TEXTFIELD, line, column, date [dd.mm.yyyy]>
<TEMPLATE TEXT, newline>
<TEMPLATE TEXT, underline, "Patient">
<TEMPLATE TEXT, underline, "name">
<TEMPLATE TEXT, underline, ":">
<TEMPLATE TEXT, newline>
<TEMPLATE TEXT, "Clinical">
<TEMPLATE TEXT, "information">
<TEMPLATE TEXT, ":">
<TEMPLATE TEXT, newline>
<TEMPLATE TEXT, "Summary">
<TEMPLATE TEXT, ":">
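Assuming each element of the form template FV occupies one line as in the listing above, such a listing can be parsed into structured records with a few lines of code. The grammar below (keyword, optional attributes, optional quoted text) is inferred from the example and is an assumption for illustration, not a format defined by the patent.

```python
import re

# Illustrative parser for one element of the form-template listing above.
# Grammar inferred from the example; quoted text containing commas is not
# handled, which is acceptable for a sketch.

def parse_element(line: str) -> dict:
    m = re.match(r'<(TEMPLATE TEXTFIELD|TEMPLATE TEXT|DICTATION),?\s*(.*?)>$',
                 line.strip())
    if m is None:
        raise ValueError(f"unrecognized element: {line!r}")
    kind, rest = m.groups()
    attrs = [p.strip() for p in rest.split(",")] if rest else []
    text = None
    if attrs and attrs[-1].startswith('"') and attrs[-1].endswith('"'):
        text = attrs.pop()[1:-1]  # strip the surrounding quotes
    return {"kind": kind, "attrs": attrs, "text": text}
```

For example, `parse_element('<TEMPLATE TEXT, bold, "Radiology">')` yields the kind `TEMPLATE TEXT`, the attribute list `["bold"]` and the text `Radiology`.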
This form template FV can now be filled out and processed by the user by means of dictation. For this purpose, the user dictates into the speech recognition device 1 for example the following spoken text GT:
"patient name" (in command mode)
"Henry Schmidt"
"Summary" (in command mode) "bold on" (in command mode)
"Healing fracture mid left femoral diaphysis period"
"bold off' (in command mode)
"Clinical information" (in command mode)
"The fracture fragments are near anatomic alignment. A small amount of periosteal reaction has developed period"
This spoken text GT is converted by the speech recognition means 7 into recognized text ET which is stored in the storage means 8. The recognized text ET and the form template FV are subsequently combined to form a single document DK which is stored in the storage means 17 where the reproduction and correction means 18 can access it. According to the above dictation, the document DK has the following content:
<ROOT>
<TEMPLATE TEXT, bold, "Radiology">
<TEMPLATE TEXT, bold, "Report">
<TEMPLATE TEXTFIELD, line, column, date [dd.mm.yyyy]>
<TEMPLATE TEXT, newline>
<TEMPLATE TEXT, underline, "Patient">
<TEMPLATE TEXT, underline, "name">
<TEMPLATE TEXT, underline, ":">
<DICTATION, 0-2500, "Henry">
<DICTATION, 2500-3800, "Schmidt">
<TEMPLATE TEXT, newline>
<TEMPLATE TEXT, "Clinical">
<TEMPLATE TEXT, "information">
<TEMPLATE TEXT, ":">
<DICTATION, 12200-12700, "The">
...
<DICTATION, 22100-23300, "developed">
<DICTATION, 23300-23800, ".">
<TEMPLATE TEXT, newline>
<TEMPLATE TEXT, "Summary">
<TEMPLATE TEXT, ":">
<DICTATION, 3800-4500, bold, "Healing">
...
<DICTATION, 10500-11800, bold, "diaphysis">
<DICTATION, 11800-12200, bold, ".">
It should be mentioned that the keyword TEMPLATE TEXT serves as an indication to the speech recognition means 7 and the reproduction and correction means 18 that the text concerned originates from a predefined form and that therefore no audio information is available for it. After the comma there may be any formatting information, which can of course also be input as a command by means of dictation. The keyword TEMPLATE TEXTFIELD likewise indicates a text field which originates from a predefined form, so that no audio information is available. The parameters line and column indicate the position of the text field in the form. The parameter date [dd.mm.yyyy] provides more detailed information about the text field: it is a date field which represents a date with two digits for the day, two digits for the month and four digits for the year, for example 21.02.2003. The keyword DICTATION indicates dictated text. The values after the comma indicate the audio position of the respective word (beginning and end, in milliseconds, relative to the start of the dictation).
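The combining walked through above can be pictured as follows: a spoken heading command positions an insertion point after the corresponding heading in the template, and the subsequently dictated words are inserted there as DICTATION elements. The sketch below is a much-simplified illustration of that idea; the function and its single-element heading match are assumptions, not the patent's implementation (in particular, headings spanning several elements, such as "Patient name", are deliberately not handled).

```python
def insert_dictation(template, heading, words):
    """Insert DICTATION elements after the ':' element that closes the
    named heading. `words` is a list of (start_ms, end_ms, word) tuples.
    Simplified stand-in for the combining done by the speech recognition
    means; multi-element headings are not handled."""
    out = list(template)
    for i, el in enumerate(out):
        if f'"{heading}"' in el:
            # advance to the ':' element that terminates the heading
            while i < len(out) and '":"' not in out[i]:
                i += 1
            pos = i + 1
            for start, end, word in words:
                out.insert(pos, f'<DICTATION, {start}-{end}, "{word}">')
                pos += 1
            break
    return out
```

Because the insertion point follows the heading in the template rather than the dictation order, the resulting element sequence keeps the template's order even when, as in the example above, "Summary" is dictated before "Clinical information".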
This results in the document illustrated in Fig. 3.
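The audio positions carried by the DICTATION elements are what make the synchronous visual marking during playback possible: given the current playback position, the element to highlight is found by a simple span lookup. A hedged sketch, assuming the elements have been parsed into dicts with keys `kind`, `attrs` and `text`, where `attrs[0]` holds the `start-end` span in milliseconds (an assumed in-memory representation, not the patent's):

```python
def element_at_audio_pos(doc, ms):
    """Return the DICTATION element whose audio span covers `ms`, i.e. the
    word the reproduction and correction means would highlight during
    synchronous playback. TEMPLATE TEXT elements carry no audio and are
    skipped; None is returned outside all dictated spans."""
    for el in doc:
        if el["kind"] != "DICTATION":
            continue
        start, end = (int(x) for x in el["attrs"][0].split("-"))
        if start <= ms < end:
            return el
    return None
```

A lookup like this, driven by the playback clock, is all that is needed to keep the visual marking in step with the acoustic reproduction.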
Should the radiologist be unsatisfied with the content or the formatting of the document, he can make changes thereto at will. For example, he can dictate the following commands into the speech recognition device 1:
"Patient name" (in command mode)
"italic" (in command mode)
"Clinical information" (in command mode)
"italic underline" (in command mode)
"Summary" (in command mode)
"italic underline" (in command mode)
Once the commands have been processed by the speech recognition means 7, the document DK has the representation shown in Fig. 4. It should be understood that this is only a simple example of the possibilities the invention provides; in fact, all the processing possibilities of advanced speech recognition software are available to the user.
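A formatting command such as "italic" following a heading command can be pictured as tagging every element of the addressed heading with the style. The sketch below is illustrative only; the single-element heading match and the rewriting of raw element strings are assumptions made for this example.

```python
def apply_style(template, heading, style):
    """Add `style` to every TEMPLATE TEXT element of the named heading,
    stopping at the next newline element. Purely illustrative; operates on
    the raw element strings shown in the listings above. DICTATION elements
    are left untouched because the replace only matches the TEMPLATE TEXT
    prefix."""
    out, active = [], False
    for el in template:
        if f'"{heading}"' in el:
            active = True
        elif el == '<TEMPLATE TEXT, newline>':
            active = False
        if active:
            el = el.replace('<TEMPLATE TEXT, ', f'<TEMPLATE TEXT, {style}, ', 1)
        out.append(el)
    return out
```

Applied to the "Summary" heading with the style `italic`, this turns `<TEMPLATE TEXT, "Summary">` into `<TEMPLATE TEXT, italic, "Summary">`, mirroring the effect of the dictated commands above.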
Claims
1. A speech recognition device (1) for processing a predefined form (2), wherein the speech recognition device (1) has speech recognition means (7) which can be fed a spoken text (GT) and transcribe it into a recognized text (ET), characterized by analysis means (12) for analyzing and converting the predefined form (2) into a form template (FV) having a data format which can be processed by the speech recognition means (7), wherein the form template (FV) and the recognized text (ET) can be combined by the speech recognition means (7) to form a document (DK).
2. A speech recognition device as claimed in Claim 1, characterized in that the predefined form (2) can be fed to the analysis means (12) as a computer file (3) and in that the analysis means (12) comprise a computer file recognition means (13).
3. A speech recognition device as claimed in Claim 2, characterized in that the computer file recognition means (13) comprise a computer file conversion means (14).
4. A speech recognition device as claimed in Claim 1, characterized in that the analysis means (12) comprise a scanner (4) and character recognition means (15).
5. A speech recognition device as claimed in Claim 1, characterized in that the combining of the form template (FV) and of the recognized text (ET) by the speech recognition means (7) comprises the filling out of at least one text field (22) in the form template with recognized text.
6. A speech recognition method for processing a predefined form (2), comprising the reception of spoken text (GT) and the transcribing of the spoken text into a recognized text (ET), characterized by the steps of: analyzing the predefined form (2) and converting it into a form template (FV) having a data format which corresponds to that of the recognized text (ET), and combining the form template (FV) and the recognized text (ET) to form a document (DK).
7. A speech recognition method as claimed in Claim 6, characterized in that the analyzing of the predefined form (2) comprises the reading in of the form (2) as a computer file (3) and recognition of the file type of the computer file.
8. A speech recognition method as claimed in Claim 7, characterized in that the analyzing of the predefined form comprises the conversion of the read-in computer file (3) into a different data type.
9. A speech recognition method as claimed in Claim 6, characterized in that the analyzing of the predefined form comprises the scanning of the form and recognition of form text from the data (BF) obtained during the scanning.
10. A speech recognition method as claimed in Claim 6, characterized in that the combining of the form template (FV) and of the recognized text (ET) comprises the filling out of at least one text field (22) in the form template with recognized text.
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
EP03100332 | 2003-02-13 | ||
EP03100332.0 | 2003-02-13 |
Publications (3)
Publication Number | Publication Date |
---|---|
WO2004072846A2 true WO2004072846A2 (en) | 2004-08-26 |
WO2004072846A3 WO2004072846A3 (en) | 2004-10-07 |
WO2004072846A8 WO2004072846A8 (en) | 2004-12-09 |
Family
ID=32865046
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/IB2004/050081 WO2004072846A2 (en) | 2003-02-13 | 2004-02-05 | Automatic processing of templates with speech recognition |
Country Status (1)
Country | Link |
---|---|
WO (1) | WO2004072846A2 (en) |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO1996013009A1 (en) * | 1994-10-25 | 1996-05-02 | Ho Janet Chung Kong | System and method for generating database input forms |
US20020143533A1 (en) * | 2001-03-29 | 2002-10-03 | Mark Lucas | Method and apparatus for voice dictation and document production |
WO2002082318A2 (en) * | 2001-02-22 | 2002-10-17 | Volantia Holdings Limited | System and method for extracting information |
2004-02-05: WO PCT/IB2004/050081 patent/WO2004072846A2/en, active Application Filing
Cited By (33)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US10318871B2 (en) | 2005-09-08 | 2019-06-11 | Apple Inc. | Method and apparatus for building an intelligent automated assistant |
US9865248B2 (en) | 2008-04-05 | 2018-01-09 | Apple Inc. | Intelligent text-to-speech conversion |
US10795541B2 (en) | 2009-06-05 | 2020-10-06 | Apple Inc. | Intelligent organization of tasks items |
US11080012B2 (en) | 2009-06-05 | 2021-08-03 | Apple Inc. | Interface for a virtual digital assistant |
US10706841B2 (en) | 2010-01-18 | 2020-07-07 | Apple Inc. | Task flow identification based on user intent |
US11423886B2 (en) | 2010-01-18 | 2022-08-23 | Apple Inc. | Task flow identification based on user intent |
US10049675B2 (en) | 2010-02-25 | 2018-08-14 | Apple Inc. | User profiling for voice input processing |
US10079014B2 (en) | 2012-06-08 | 2018-09-18 | Apple Inc. | Name recognition system |
US9971774B2 (en) | 2012-09-19 | 2018-05-15 | Apple Inc. | Voice-based media searching |
US9966060B2 (en) | 2013-06-07 | 2018-05-08 | Apple Inc. | System and method for user-specified pronunciation of words for speech synthesis and recognition |
US10904611B2 (en) | 2014-06-30 | 2021-01-26 | Apple Inc. | Intelligent automated assistant for TV user interactions |
US9986419B2 (en) | 2014-09-30 | 2018-05-29 | Apple Inc. | Social reminders |
US10567477B2 (en) | 2015-03-08 | 2020-02-18 | Apple Inc. | Virtual assistant continuity |
US10356243B2 (en) | 2015-06-05 | 2019-07-16 | Apple Inc. | Virtual assistant aided communication with 3rd party service in a communication session |
US9996517B2 (en) | 2015-11-05 | 2018-06-12 | Lenovo (Singapore) Pte. Ltd. | Audio input of field entries |
GB2545320A (en) * | 2015-11-05 | 2017-06-14 | Lenovo Singapore Pte Ltd | Audio input of field entries |
GB2545320B (en) * | 2015-11-05 | 2020-08-05 | Lenovo Singapore Pte Ltd | Audio input of field entries |
WO2017083205A1 (en) * | 2015-11-11 | 2017-05-18 | Microsoft Technology Licensing, Llc | Provide interactive content generation for document |
DK201670539A1 (en) * | 2016-03-14 | 2017-10-02 | Apple Inc | Dictation that allows editing |
US10067938B2 (en) | 2016-06-10 | 2018-09-04 | Apple Inc. | Multilingual word prediction |
US10043516B2 (en) | 2016-09-23 | 2018-08-07 | Apple Inc. | Intelligent automated assistant |
US10553215B2 (en) | 2016-09-23 | 2020-02-04 | Apple Inc. | Intelligent automated assistant |
US10593346B2 (en) | 2016-12-22 | 2020-03-17 | Apple Inc. | Rank-reduced token representation for automatic speech recognition |
US10755703B2 (en) | 2017-05-11 | 2020-08-25 | Apple Inc. | Offline personal assistant |
US10791176B2 (en) | 2017-05-12 | 2020-09-29 | Apple Inc. | Synchronization and task delegation of a digital assistant |
US10410637B2 (en) | 2017-05-12 | 2019-09-10 | Apple Inc. | User-specific acoustic models |
US11405466B2 (en) | 2017-05-12 | 2022-08-02 | Apple Inc. | Synchronization and task delegation of a digital assistant |
US10810274B2 (en) | 2017-05-15 | 2020-10-20 | Apple Inc. | Optimizing dialogue policy decisions for digital assistants using implicit feedback |
US10482874B2 (en) | 2017-05-15 | 2019-11-19 | Apple Inc. | Hierarchical belief states for digital assistants |
US11217255B2 (en) | 2017-05-16 | 2022-01-04 | Apple Inc. | Far-field extension for digital assistant services |
CN110210014A (en) * | 2019-05-31 | 2019-09-06 | 贵州精准医疗电子有限公司 | Intelligent form system |
CN110210014B (en) * | 2019-05-31 | 2023-05-30 | 贵州精准医疗电子有限公司 | Intelligent form system |
CN111243596A (en) * | 2020-01-08 | 2020-06-05 | 中保车服科技服务股份有限公司 | Insurance information acquisition method, device and equipment based on voice recognition and storage medium |
Also Published As
Publication number | Publication date |
---|---|
WO2004072846A8 (en) | 2004-12-09 |
WO2004072846A3 (en) | 2004-10-07 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
WO2004072846A2 (en) | Automatic processing of templates with speech recognition | |
US11586808B2 (en) | Insertion of standard text in transcription | |
US7516070B2 (en) | Method for simultaneously creating audio-aligned final and verbatim text with the assistance of a speech recognition program as may be useful in form completion using a verbal entry method | |
US8046226B2 (en) | System and methods for reporting | |
US7979281B2 (en) | Methods and systems for creating a second generation session file | |
JP2768727B2 (en) | Report creation apparatus and method | |
DE60033106T2 (en) | Correction of operating mode errors, control or dictation, in the speech recognition | |
US8504369B1 (en) | Multi-cursor transcription editing | |
US20060190249A1 (en) | Method for comparing a transcribed text file with a previously created file | |
US6915258B2 (en) | Method and apparatus for displaying and manipulating account information using the human voice | |
US20090037171A1 (en) | Real-time voice transcription system | |
US20020095290A1 (en) | Speech recognition program mapping tool to align an audio file to verbatim text | |
EA004352B1 (en) | Automated transcription system and method using two speech converting instances and computer-assisted correction | |
Lai et al. | MedSpeak: Report creation with continuous speech recognition | |
JPH10507857A (en) | System and method for generating a database input form | |
JP2014013399A (en) | Method and system for processing dictated information | |
US7120581B2 (en) | System and method for identifying an identical audio segment using text comparison | |
US20150293902A1 (en) | Method for automated text processing and computer device for implementing said method | |
US20030097253A1 (en) | Device to edit a text in predefined windows | |
US20070067168A1 (en) | Method and device for transcribing an audio signal | |
JPH11272673A (en) | Method and processor for document processing and record medium where computer program for document processing is recorded | |
JP2004287192A (en) | Device and program for editing synthesized speech | |
WO2001093058A1 (en) | System and method for comparing text generated in association with a speech recognition program | |
Schiavon et al. | Radiological Reporting in the United States | |
JPH08314930A (en) | Proofreading method for japanese sentence |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
AK | Designated states |
Kind code of ref document: A2 Designated state(s): AE AG AL AM AT AU AZ BA BB BG BR BW BY BZ CA CH CN CO CR CU CZ DE DK DM DZ EC EE EG ES FI GB GD GE GH GM HR HU ID IL IN IS JP KE KG KP KR KZ LC LK LR LS LT LU LV MA MD MG MK MN MW MX MZ NA NI NO NZ OM PG PH PL PT RO RU SC SD SE SG SK SL SY TJ TM TN TR TT TZ UA UG US UZ VC VN YU ZA ZM ZW |
|
AL | Designated countries for regional patents |
Kind code of ref document: A2 Designated state(s): BW GH GM KE LS MW MZ SD SL SZ TZ UG ZM ZW AM AZ BY KG KZ MD RU TJ TM AT BE BG CH CY CZ DE DK EE ES FI FR GB GR HU IE IT LU MC NL PT RO SE SI SK TR BF BJ CF CG CI CM GA GN GQ GW ML MR NE SN TD TG |
|
121 | Ep: the epo has been informed by wipo that ep was designated in this application | ||
WR | Later publication of a revised version of an international search report | ||
122 | Ep: pct application non-entry in european phase |