WO2009035842A1 - Speech-to-text transcription for personal communication devices - Google Patents
Speech-to-text transcription for personal communication devices
- Publication number
- WO2009035842A1 (PCT/US2008/074164)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- speech
- personal communication
- text
- communication device
- speech signal
- Prior art date
Classifications
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/28—Constructional details of speech recognition systems
- G10L15/30—Distributed recognition, e.g. in client-server systems, for mobile phones or network applications
Definitions
- the technical field relates generally to personal communication devices and specifically relates to speech-to-text transcription by server resources on behalf of personal communication devices.
- the keypad of a cellular phone typically contains several keys that are multifunctional. Specifically, a single key is used to enter one of three letters, such as A, B, or C.
- the keypad of a personal digital assistant (PDA) provides some improvement by incorporating a QWERTY keyboard wherein individual keys are used for individual letters. Nonetheless, the miniature size of the keys proves to be inconvenient to some users and a severe handicap to others.
- a speech signal is created by speaking a portion of an e-mail, for example, into a personal communications device (PCD).
- the generated speech signal is transmitted to a server.
- the server houses a speech-to-text transcription system, which transcribes the speech signal into a text message that is returned to the PCD.
- the text message is edited on the PCD for correcting any transcription errors and then used in various applications.
- the edited text is transmitted in an e-mail format to an e-mail recipient.
- a speech signal generated by a PCD is received in a server.
- the speech signal is transcribed into a text message by using a speech-to-text transcription system located in the server.
- the text message is then transmitted to the PCD.
- the transcription process includes generating a list of alternative candidates for speech recognition of a spoken word. This list of alternative candidates is transmitted together with a transcribed word, by the server to the PCD.
- Figure 1 shows an exemplary communication system 100 incorporating a speech-to-text transcription system for personal communication devices.
- Figure 2 shows an exemplary sequence of steps for generating text using speech-to-text transcription, the method being implemented on the communication system of Figure 1.
- Figure 3 is a diagram of an exemplary processor for implementing speech-to-text transcription for personal communication devices.
- Figure 4 is a depiction of a suitable computing environment in which speech-to-text transcription for personal communication devices may be implemented.
- a speech-to-text transcription system for personal communication devices is housed in a communications server that is communicatively coupled to one or more mobile devices.
- the speech-to-text transcription system located in the server is feature-rich and efficient because of the availability of extensive, cost-effective storage capacity and computing power in the server.
- a user of the mobile device, which is referred to herein as a personal communications device (PCD), dictates the audio of, for example, an e-mail into the PCD.
- the PCD converts the user's voice into a speech signal that is transmitted to the speech-to-text transcription system located in the server.
- the speech-to-text transcription system transcribes the speech signal into a text message by using speech recognition techniques.
- the text message is then transmitted by the server to the PCD.
- Upon receiving the text message, the user carries out corrections on erroneously transcribed words before using the text message in various applications that utilize text.
- the edited text message is used to form, for example, the body part of an e-mail that is then sent to an e-mail recipient.
- the edited text message is used in a utility such as Microsoft WORD TM.
- the edited text is inserted into a memo.
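The dictate-transcribe-edit round trip described above can be sketched as follows. This is an illustrative outline only: the function names, the byte-string stand-in for digitized audio, and the identity "transcription" are assumptions, not anything specified by the patent.

```python
# Hypothetical sketch of the dictate -> server transcribe -> edit flow.
# All names here are illustrative; the "transcription" is a stand-in.

def transcribe_on_server(speech_signal: bytes) -> str:
    """Stand-in for the server-side speech-to-text transcription system."""
    # A real system would run speech recognition here; we simply decode.
    return speech_signal.decode("utf-8")

def dictate_and_send(spoken_text: str) -> str:
    """PCD side: encode the dictated audio and send it to the server."""
    speech_signal = spoken_text.encode("utf-8")   # stand-in for digitized audio
    return transcribe_on_server(speech_signal)    # transmit, receive text back

def edit_transcript(text: str, corrections: dict) -> str:
    """PCD side: the user replaces erroneously transcribed words."""
    return " ".join(corrections.get(w, w) for w in text.split())

draft = dictate_and_send("pull the rope taught")
email_body = edit_transcript(draft, {"taught": "taut"})
```

The edited result (`email_body`) would then form, for example, the body of an outgoing e-mail.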
- the speech-to-text transcription system located in the server incorporates a cost-effective speech recognition system that provides high word recognition accuracy, typically in the mid-to-high 90% range, in comparison to a more limited speech recognition system housed inside a PCD.
- using the keypad of the PCD for editing a few incorrect words in a text message generated by speech-to-text transcription is more efficient and preferable to entering the entire text of an e-mail message by manually depressing keys on the keypad of the PCD.
- the number of incorrect words would typically be fewer than 10% of the total number of words in the transcribed text message.
- Figure 1 shows an exemplary communication system 100 incorporating a speech-to-text transcription system 130 housed in a server 125 located in cellular base station 120.
- Cellular base station 120 provides cellular communication services to various PCDs, as is known in the art.
- PCDs are communicatively coupled to server 125, either on an as-needed basis or on a continuous basis, for purposes of accessing speech-to-text transcription system 130.
- PCDs include PCD 105, which is a smartphone; PCD 110, which is a personal digital assistant (PDA); and PCD 115, which is a cellular phone having text entry facility.
- PCD 105, the smartphone, combines a cellular phone with a computer, thereby providing voice as well as data communication features including e-mail.
- PCD 110, the PDA, combines a computer for data communication, a cellular phone for voice communication, and a database for storing personal information such as addresses, appointments, calendar, and memos.
- PCD 115, the cellular phone, provides voice communication as well as certain text entry facilities such as short message service (SMS).
- in addition to housing speech-to-text transcription system 130, cellular base station 120 further includes an e-mail server 145 that provides e-mail services to the various PCDs.
- Cellular base station 120 also is communicatively coupled to other network elements such as Public Switched Telephone Network Central Office (PSTN CO) 140 and, optionally, to an Internet Service Provider (ISP) 150.
- the ISP 150 is coupled to an enterprise 152 comprising an email server 162 and the speech-to-text transcription system 130 for handling email and transcription functions.
- Speech-to-text transcription system 130 may be housed in several alternative locations in communication system 100.
- speech-to-text transcription system 130 is housed in a secondary server 135 located in cellular base station 120. Secondary server 135 is communicatively coupled to server 125, which operates as a primary server in this configuration.
- speech-to-text transcription system 130 is housed in a server 155 located in PSTN CO 140.
- speech-to-text transcription system 130 is housed in a server 160 located in a facility of ISP 150.
- speech-to-text transcription system 130 includes a speech recognition system.
- the speech recognition system may be a speaker- independent system or a speaker-dependent system.
- speech-to-text transcription system 130 includes a training feature where a PCD user is prompted to speak several words, either in the form of individual words or in the form of a specified paragraph. These words are stored as a customized template of words for use by this PCD user.
- speech-to-text transcription system 130 may also incorporate, in the form of one or more databases associated with each individual PCD user, one or more of the following: a customized list of vocabulary words that are preferred and generally spoken by the user, a list of e-mail addresses used by the user, and a contact list having personal information of one or more contacts of the user.
- in Step 1, a PCD user dictates an e-mail into PCD 105.
- the dictated audio may be one of several alternative materials pertaining to an e-mail. A few non-exhaustive examples of such materials include: a portion of the body of an e-mail, the entire body of an e-mail, a subject line text, and one or more e-mail addresses.
- the dictated audio is converted into an electronic speech signal in PCD 105, encoded suitably for wireless transmission, and then transmitted to cellular base station 120, where it is routed to speech-to-text transcription system 130.
- Speech-to-text transcription system 130, which typically includes a speech recognition system (not shown) and a text generator (not shown), transcribes the speech signal into text data.
- the text data is encoded suitably for wireless transmission and transmitted, in Step 2, back to PCD 105.
- Step 2 may be implemented in an automatic process, where the text message is automatically sent to PCD 105 without any action being carried out by a user of PCD 105.
- the PCD user has to manually operate PCD 105, by activating certain keys for example, for downloading the text message from speech- to-text transcription system 130 into PCD 105.
- the text message is not transmitted to PCD 105 until this download request has been made by the PCD user.
- the PCD user enunciates material that is desired to be transcribed from speech to text.
- the enunciated text is stored in a suitable storage buffer in the PCD. This may be carried out, for example, by using an analog-to-digital encoder for digitizing the speaker's voice, followed by storing of the digitized data in a digital memory chip. The digitization and storage process is carried out until the PCD user has finished enunciating the entire material.
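The delayed-transmission mode described above — digitize, accumulate in a buffer, and transmit only when the user is done — can be sketched as below. The class and method names are illustrative assumptions; the patent does not specify an API.

```python
# Illustrative sketch of the delayed transmission mode: digitized speech is
# accumulated in a storage buffer and sent as one payload when the user
# activates the "transcribe" key. Names are assumptions for illustration.

class DelayedTranscriptionBuffer:
    def __init__(self):
        self._chunks = []          # digitized audio fragments (bytes)

    def on_audio(self, digitized: bytes) -> None:
        """Called as the analog-to-digital encoder produces data."""
        self._chunks.append(digitized)

    def on_transcribe_key(self) -> bytes:
        """Assemble the whole buffered signal for transmission and clear it."""
        payload = b"".join(self._chunks)
        self._chunks.clear()
        return payload

buf = DelayedTranscriptionBuffer()
buf.on_audio(b"\x01\x02")          # user is still speaking
buf.on_audio(b"\x03")
payload = buf.on_transcribe_key()  # user finished; send everything at once
```

The entire dictation must fit in the buffer before anything is transmitted, which is the storage cost the piecemeal mode below avoids.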
- the PCD user activates a "transcribe" key on the PCD for transmitting the digitized data in the form of a data signal to cellular base station 120, after suitable formatting for wireless transmission.
- the transcribe key may be implemented as a hard key or a soft key, the soft key being displayed for example, in the form of an icon on a display of the PCD.
- the PCD user enunciates material that is transmitted frequently and periodically in data form from PCD 105 to cellular base station 120.
- the enunciated material may be transmitted as a portion of a speech signal whenever the PCD user pauses during his speaking into the PCD. Such a pause may occur at the end of a sentence for example.
- the speech-to-text transcription system 130 may transcribe this particular portion of the speech signal and return the corresponding text message even as the PCD user is speaking the next sentence. Consequently, the transcription process can be carried out faster in this piecemeal transmission mode than in the delayed transmission mode where the user has to completely finish speaking the entire material.
- the piecemeal transmission mode may be selectively combined with the delayed transmission mode.
- a temporary buffer storage is used to store certain portions (larger than a sentence for example) of the enunciated material before intermittent transmission out of PCD 105.
- the buffer storage required for such an implementation may be more modest in comparison with that for a delayed transmission mode where the entire material has to be stored before transmission.
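The piecemeal transmission mode can be sketched by treating sentence-ending punctuation as the pause that triggers transmission of a portion. This is a simplification: a real implementation would detect acoustic pauses in the speech signal, and the function names here are assumptions.

```python
import re

# Minimal sketch of the piecemeal transmission mode: portions of the
# enunciated material are sent at each detected pause (modeled here as
# end-of-sentence punctuation) rather than after the entire dictation.

def split_at_pauses(dictation: str) -> list:
    """Treat sentence-ending punctuation as a pause triggering transmission."""
    parts = re.split(r"(?<=[.!?])\s+", dictation.strip())
    return [p for p in parts if p]

def piecemeal_transmit(dictation: str, transcribe) -> str:
    """Send each portion as it becomes available; join the returned text."""
    return " ".join(transcribe(portion) for portion in split_at_pauses(dictation))

# Identity function stands in for the server's speech recognition.
result = piecemeal_transmit("Hello there. How are you?", lambda s: s)
```

Because each portion is transcribed while the user speaks the next sentence, the server can return text incrementally, and the PCD only needs to buffer one portion at a time.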
- the PCD user activates a "transcription request" key on the PCD.
- the transcription request key may be implemented as a hard key or a soft key, the soft key being displayed for example, in the form of an icon on a display of the PCD.
- a telephone call such as a circuit-switched call (e.g., a standard telephony call) is provided to the server 125 via the cellular base station 120.
- the packet transmission link is used by server 125 to acknowledge to PCD 105 a readiness of the server 125 to receive IP data packets from PCD 105.
- the IP data packets, carrying digital data digitized from material enunciated by the user, are received in server 125 and suitably decoded before being coupled into speech-to-text transcription system 130 for transcription.
- the transcribed text message may be propagated to the PCD in either a delayed transmission mode or a piecemeal transmission mode, again in the form of IP data packets.
- speech-to-text transcription is typically carried out in speech-to-text transcription system 130 by using a speech recognition system.
- the speech recognition system recognizes individual words by assigning a confidence factor to each of several alternative candidates for speech recognition, when such alternative candidates are present. For example, a spoken word "taut" may have several alternative candidates for speech recognition such as "taught," "thought," "tote," and "taut."
- the speech recognition system associates each of these alternative candidates with a confidence factor for recognition accuracy.
- the confidence factors for taught, thought, tote and taut may be 75%, 50%, 25%, and 10% respectively.
- the speech recognition system selects the candidate having the highest confidence factor and uses this candidate for transcribing the spoken word into text. Consequently, in this example, speech-to-text transcription system 130 transcribes the spoken word "taut" into the textual word "taught.”
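The selection rule above — pick the candidate with the highest confidence factor — can be sketched directly, using the "taut" example and the confidence values given in the text:

```python
# Sketch of candidate selection by confidence factor, using the numeric
# confidences from the "taut" example above. Function name is illustrative.

candidates = {"taught": 0.75, "thought": 0.50, "tote": 0.25, "taut": 0.10}

def transcribe_word(candidates: dict) -> str:
    """Pick the recognition candidate with the highest confidence factor."""
    return max(candidates, key=candidates.get)

chosen = transcribe_word(candidates)   # yields "taught", the incorrect word
```

As the text notes, the highest-confidence candidate is not always the word the user spoke, which is why the alternative candidates are retained and sent along for correction.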
- This transcribed word, which is transmitted as part of the transcribed text from cellular base station 120 to PCD 105 in Step 2 of Figure 2, is obviously incorrect.
- the PCD user observes this erroneous word on his PCD 105 and manually edits the word by deleting "taught” and replacing it with "taut", which in this instance is carried out by typing the word "taut” on a keyboard of PCD 105.
- one or more of the alternative candidate words are linked to the transcribed word "taught" by speech-to-text transcription system 130.
- the PCD user observes the erroneous word and selects an alternative candidate word from a menu rather than manually typing in a replacement word.
- the menu may be displayed as a drop-down menu for example, by placing a cursor upon the incorrectly transcribed word "taught".
- the alternative words may be automatically displayed when the cursor is placed upon a transcribed word, or may be displayed by activating an appropriate hardkey or softkey of PCD 105 after placing the cursor on the incorrectly transcribed word.
- alternative sequences of words (phrases) can be automatically displayed, and the user can choose the appropriate phrase.
- the phrases “Rob taught”, “rope taught”, “Rob taut”, and “rope taut” can be displayed, and the user can select the appropriate phrase.
- appropriate phrases can be automatically displayed or withheld from display in accordance with confidence level.
- the system might have a low confidence, based on general patterns of English usage, that the phrases "Rob taut” and “rope taught” are correct, and could withhold those phrases from being displayed.
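Withholding implausible phrases from the correction menu can be sketched as a simple confidence threshold. The numeric scores and the threshold value below are assumptions chosen to mirror the "Rob taut"/"rope taught" example; the patent does not specify how confidences are computed.

```python
# Illustrative sketch: phrases below a confidence threshold (derived from
# general patterns of English usage) are withheld from the drop-down menu.
# Scores and threshold are assumed values for illustration only.

phrase_confidence = {
    "Rob taught": 0.60,
    "rope taut": 0.70,
    "Rob taut": 0.05,      # implausible in general English usage
    "rope taught": 0.05,   # likewise withheld
}

def displayable(phrases: dict, threshold: float = 0.10) -> list:
    """Return only the phrases confident enough to show to the user."""
    return sorted(p for p, c in phrases.items() if c >= threshold)

menu = displayable(phrase_confidence)
```

Only the two plausible phrases survive the threshold and would be offered in the menu.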
- the system can learn from previous selections. For example, the system could learn dictionary words, dictionary phrases, contact names, phone numbers, or the like. Additionally, the text could be predicted based upon previous behavior.
- speech-to-text transcription system 130 may provide word correction facilities by highlighting questionably transcribed words in certain ways, for example, by underlining the questionable word by a red line, or by coloring the text of the questionable word in red.
- the PCD can provide word correction facilities by highlighting questionably transcribed words in certain ways, for example, by underlining the questionable word by a red line, or by coloring the text of the questionable word in red.
- the correction process described above may be further used to generate a customized list of vocabulary words or for creating a dictionary of customized words.
- Either or both the customized list and the dictionary may be stored in either or both of speech-to-text transcription system 130 and PCD 105.
- the customized list of vocabulary words may be used to store certain words that are unique to a particular user. For example, such words may include a person's name or a word in a foreign language.
- the customized dictionary may be created for example, when the PCD user indicates that a certain transcribed word must be automatically corrected in future by a replacement word provided by the PCD user.
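The customized-dictionary behavior — a transcribed word the user has flagged is automatically corrected in future — can be sketched as below. The class name, method names, and whole-word replacement rule are illustrative assumptions.

```python
# Minimal sketch of the customized dictionary: the user flags a transcribed
# word once, and future transcriptions are auto-corrected. Names and the
# whole-word replacement rule are assumptions for illustration.

class CustomDictionary:
    def __init__(self):
        self._replacements = {}

    def learn(self, transcribed: str, replacement: str) -> None:
        """User indicates `transcribed` must be auto-corrected in future."""
        self._replacements[transcribed] = replacement

    def apply(self, text: str) -> str:
        """Auto-correct a newly transcribed text using learned entries."""
        return " ".join(self._replacements.get(w, w) for w in text.split())

d = CustomDictionary()
d.learn("Jon", "John")                     # e.g. a contact name unique to this user
corrected = d.apply("call Jon tomorrow")   # future transcriptions fixed up
```

Such a dictionary could be stored on the server, on the PCD, or both, as the surrounding text describes.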
- Figure 3 is a diagram of an exemplary processor 300 for implementing speech-to-text transcription system 130.
- the processor 300 comprises a processing portion 305, a memory portion 350, and an input/output portion 360.
- the processing portion 305, memory portion 350, and input/output portion 360 are coupled together (coupling not shown in Figure 3) to allow communications therebetween.
- the input/output portion 360 is capable of providing and/or receiving components utilized to perform speech-to-text transcription as described above.
- the input/output portion 360 is capable of providing communicative coupling between a cellular base station and speech-to-text transcription system 130 and/or communicative coupling between a server and speech-to-text transcription system 130.
- the processor 300 can be implemented as a client processor, a server processor, and/or a distributed processor.
- the processor 300 can include at least one processing portion 305 and memory portion 350.
- the memory portion 350 can store any information utilized in conjunction with speech-to-text transcription.
- the memory portion 350 can be volatile (such as RAM) 325, non-volatile (such as ROM, flash memory, etc.) 330, or a combination thereof.
- the processor 300 can have additional features/functionality.
- the processor 300 can include additional storage (removable storage 310 and/or non-removable storage 320) including, but not limited to, magnetic or optical disks, tape, flash, smart cards or a combination thereof.
- Computer storage media such as memory portion 310, 320, 325, and 330, include volatile and nonvolatile, removable and nonremovable media implemented in any method or technology for storage of information such as computer readable instructions, data structures, program modules, or other data.
- Computer storage media include, but are not limited to, RAM, ROM, EEPROM, flash memory or other memory technology, CD-ROM, digital versatile disks (DVD) or other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, universal serial bus (USB) compatible memory, smart cards, or any other medium which can be used to store the desired information and which can be accessed by the processor 300. Any such computer storage media can be part of the processor 300.
- a computer system can be roughly divided into three component groups: the hardware component, the hardware/software interface system component, and the applications programs component (also referred to as the "user component” or “software component”).
- the hardware component may comprise the central processing unit (CPU) 421, the memory (both ROM 464 and RAM 425), the basic input/output system (BIOS) 466, and various input/output (I/O) devices such as a keyboard 440, a mouse 442, a monitor 447, and/or a printer (not shown), among other things.
- the hardware component comprises the basic physical infrastructure for the computer system.
- the applications programs component comprises various software programs including but not limited to compilers, database systems, word processors, business programs, videogames, and so forth.
- Application programs provide the means by which computer resources are utilized to solve problems, provide solutions, and process data for various users (machines, other computer systems, and/or end-users).
- application programs perform the functions associated with speech-to-text transcription for personal communication devices as described above.
- a hardware/software interface system shell (referred to as a "shell") is an interactive end-user interface to a hardware/software interface system.
- a shell may also be referred to as a "command interpreter" or, in an operating system, as an "operating system shell".
- a shell is the outer layer of a hardware/software interface system that is directly accessible by application programs and/or end-users.
- a kernel is a hardware/software interface system's innermost layer that interacts directly with the hardware components.
- an exemplary general purpose computing system includes a conventional computing device 460 or the like, including a central processing unit 421, a system memory 462, and a system bus 423 that couples various system components including the system memory to the processing unit 421.
- the system bus 423 may be any of several types of bus structures including a memory bus or memory controller, a peripheral bus, and a local bus using any of a variety of bus architectures.
- the system memory includes read only memory (ROM) 464 and random access memory (RAM) 425.
- ROM read only memory
- RAM random access memory
- a basic input/output system 466 (BIOS) containing basic routines that help to transfer information between elements within the computing device 460, such as during start up, is stored in ROM 464.
- the drives and their associated computer readable media provide non-volatile storage of computer readable instructions, data structures, program modules, and other data for the computing device 460.
- although the exemplary environment described herein employs a hard disk, a removable magnetic disk 429, and a removable optical disk 431, it should be appreciated by those skilled in the art that other types of computer readable media which can store data that is accessible by a computer, such as magnetic cassettes, flash memory cards, digital video disks, Bernoulli cartridges, random access memories (RAMs), read only memories (ROMs), and the like, may also be used in the exemplary operating environment.
- the exemplary environment may also include many types of monitoring devices such as heat sensors and security or fire alarm systems, and other sources of information.
- a number of program modules can be stored on the hard disk 427, magnetic disk 429, optical disk 431, ROM 464, or RAM 425, including an operating system 435, one or more application programs 436, other program modules 437, and program data 438.
- a user may enter commands and information into the computing device 460 through input devices such as a keyboard 440 and pointing device 442 (e.g., mouse).
- Other input devices may include a microphone, joystick, game pad, satellite dish, scanner, or the like.
- these and other input devices are often connected to the processing unit 421 through a serial port interface 446 that is coupled to the system bus, but may be connected by other interfaces, such as a parallel port, game port, or universal serial bus (USB).
- the computing device 460 may operate in a networked environment using logical connections to one or more remote computers, such as a remote computer 449.
- the remote computer 449 may be another computing device (e.g., personal computer), a server, a router, a network PC, a peer device, or other common network node, and typically includes many or all of the elements described above relative to the computing device 460, although only a memory storage device 450 (floppy drive) has been illustrated in Figure 4.
- the logical connections depicted in Figure 4 include a local area network (LAN) 451 and a wide area network (WAN) 452.
- Such networking environments are commonplace in offices, enterprise wide computer networks, intranets and the Internet.
- When used in a LAN networking environment, the computing device 460 is connected to the LAN 451 through a network interface or adapter 453. When used in a WAN networking environment, the computing device 460 can include a modem 454 or other means for establishing communications over the wide area network 452, such as the Internet.
- the modem 454, which may be internal or external, is connected to the system bus 423 via the serial port interface 446.
- program modules depicted relative to the computing device 460, or portions thereof may be stored in the remote memory storage device. It will be appreciated that the network connections shown are exemplary and other means of establishing a communications link between the computers may be used.
- while speech-to-text transcription for personal communication devices is particularly well-suited for computerized systems, nothing in this document is intended to limit it to such embodiments.
- the term "computer system” is intended to encompass any and all devices capable of storing and processing information and/or capable of using the stored information to control the behavior or execution of the device itself, regardless of whether such devices are electronic, mechanical, logical, or virtual in nature.
- the various techniques described herein can be implemented in connection with hardware or software or, where appropriate, with a combination of both.
- the methods and apparatuses for speech-to-text transcription for personal communication devices can take the form of program code (i.e., instructions) embodied in tangible media, such as floppy diskettes, CD-ROMs, hard drives, or any other machine-readable storage medium, wherein, when the program code is loaded into and executed by a machine, such as a computer, the machine becomes an apparatus for implementing speech-to-text transcription for personal communication devices.
- the program(s) can be implemented in assembly or machine language, if desired.
- the language can be a compiled or interpreted language, and combined with hardware implementations.
- the methods and apparatuses for implementing speech-to-text transcription for personal communication devices also can be practiced via communications embodied in the form of program code that is transmitted over some transmission medium, such as over electrical wiring or cabling, through fiber optics, or via any other form of transmission, wherein, when the program code is received and loaded into and executed by a machine, such as an EPROM, a gate array, a programmable logic device (PLD), a client computer, or the like, the machine becomes an apparatus for implementing speech-to-text transcription for personal communication devices.
- speech-to-text transcription for personal communication devices has been described in connection with the example embodiments of the various figures, it is to be understood that other similar embodiments can be used or modifications and additions can be made to the described embodiments for performing the same functions of speech-to-text transcription for personal communication devices without deviating therefrom. Therefore, speech-to-text transcription for personal communication devices as described herein should not be limited to any single embodiment, but rather should be construed in breadth and scope in accordance with the appended claims.
Priority Applications (4)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
JP2010524907A JP2011504304A (en) | 2007-09-12 | 2008-08-25 | Speech to text transcription for personal communication devices |
BRPI0814418-4A2A BRPI0814418A2 (en) | 2007-09-12 | 2008-08-25 | SPEECH-TO-TEXT TRANSCRIPTION FOR PERSONAL COMMUNICATION DEVICES |
CN200880107047A CN101803214A (en) | 2007-09-12 | 2008-08-25 | The speech-to-text transcription that is used for the personal communication devices |
EP08798590A EP2198527A4 (en) | 2007-09-12 | 2008-08-25 | Speech-to-text transcription for personal communication devices |
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US11/854,523 | 2007-09-12 | ||
US11/854,523 US20090070109A1 (en) | 2007-09-12 | 2007-09-12 | Speech-to-Text Transcription for Personal Communication Devices |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2009035842A1 true WO2009035842A1 (en) | 2009-03-19 |
Family
ID=40432828
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/US2008/074164 WO2009035842A1 (en) | 2007-09-12 | 2008-08-25 | Speech-to-text transcription for personal communication devices |
Country Status (8)
Country | Link |
---|---|
US (1) | US20090070109A1 (en) |
EP (1) | EP2198527A4 (en) |
JP (1) | JP2011504304A (en) |
KR (1) | KR20100065317A (en) |
CN (1) | CN101803214A (en) |
BR (1) | BRPI0814418A2 (en) |
RU (1) | RU2010109071A (en) |
WO (1) | WO2009035842A1 (en) |
Families Citing this family (170)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US8645137B2 (en) | 2000-03-16 | 2014-02-04 | Apple Inc. | Fast, language-independent method for user authentication by voice |
US20170169700A9 (en) * | 2005-09-01 | 2017-06-15 | Simplexgrinnell Lp | System and method for emergency message preview and transmission |
US8677377B2 (en) | 2005-09-08 | 2014-03-18 | Apple Inc. | Method and apparatus for building an intelligent automated assistant |
WO2007121441A2 (en) | 2006-04-17 | 2007-10-25 | Vovision Llc | Methods and systems for correcting transcribed audio files |
US9318108B2 (en) | 2010-01-18 | 2016-04-19 | Apple Inc. | Intelligent automated assistant |
US8073681B2 (en) | 2006-10-16 | 2011-12-06 | Voicebox Technologies, Inc. | System and method for a cooperative conversational voice user interface |
US7818176B2 (en) | 2007-02-06 | 2010-10-19 | Voicebox Technologies, Inc. | System and method for selecting and presenting advertisements based on natural language processing of voice-based input |
US20090234635A1 (en) * | 2007-06-29 | 2009-09-17 | Vipul Bhatt | Voice Entry Controller operative with one or more Translation Resources |
US20110022387A1 (en) * | 2007-12-04 | 2011-01-27 | Hager Paul M | Correcting transcribed audio files with an email-client interface |
US8140335B2 (en) | 2007-12-11 | 2012-03-20 | Voicebox Technologies, Inc. | System and method for providing a natural language voice user interface in an integrated voice navigation services environment |
US10002189B2 (en) | 2007-12-20 | 2018-06-19 | Apple Inc. | Method and apparatus for searching using an active ontology |
US9330720B2 (en) | 2008-01-03 | 2016-05-03 | Apple Inc. | Methods and apparatus for altering audio output signals |
US8996376B2 (en) | 2008-04-05 | 2015-03-31 | Apple Inc. | Intelligent text-to-speech conversion |
US8856003B2 (en) * | 2008-04-30 | 2014-10-07 | Motorola Solutions, Inc. | Method for dual channel monitoring on a radio device |
US9305548B2 (en) | 2008-05-27 | 2016-04-05 | Voicebox Technologies Corporation | System and method for an integrated, multi-modal, multi-device natural language voice services environment |
US20100030549A1 (en) | 2008-07-31 | 2010-02-04 | Lee Michael M | Mobile device having human language translation capability with positional feedback |
US8676904B2 (en) | 2008-10-02 | 2014-03-18 | Apple Inc. | Electronic devices with voice command and contextual data processing capabilities |
US8326637B2 (en) | 2009-02-20 | 2012-12-04 | Voicebox Technologies, Inc. | System and method for processing multi-modal device interactions in a natural language voice services environment |
US10241752B2 (en) | 2011-09-30 | 2019-03-26 | Apple Inc. | Interface for a virtual digital assistant |
US10241644B2 (en) | 2011-06-03 | 2019-03-26 | Apple Inc. | Actionable reminder entries |
US9431006B2 (en) | 2009-07-02 | 2016-08-30 | Apple Inc. | Methods and apparatuses for automatic speech recognition |
US9171541B2 (en) | 2009-11-10 | 2015-10-27 | Voicebox Technologies Corporation | System and method for hybrid processing in a natural language voice services environment |
US10276170B2 (en) | 2010-01-18 | 2019-04-30 | Apple Inc. | Intelligent automated assistant |
US8682667B2 (en) | 2010-02-25 | 2014-03-25 | Apple Inc. | User profiling for selecting user specific voice input processing information |
US8224654B1 (en) | 2010-08-06 | 2012-07-17 | Google Inc. | Editing voice input |
CN102541505A (en) * | 2011-01-04 | 2012-07-04 | 中国移动通信集团公司 | Voice input method and system thereof |
KR101795574B1 (en) | 2011-01-06 | 2017-11-13 | 삼성전자주식회사 | Electronic device controled by a motion, and control method thereof |
KR101858531B1 (en) | 2011-01-06 | 2018-05-17 | 삼성전자주식회사 | Display apparatus controled by a motion, and motion control method thereof |
US8489398B1 (en) * | 2011-01-14 | 2013-07-16 | Google Inc. | Disambiguation of spoken proper names |
US9037459B2 (en) * | 2011-03-14 | 2015-05-19 | Apple Inc. | Selection of text prediction results by an accessory |
AU2014200860B2 (en) * | 2011-03-14 | 2016-05-26 | Apple Inc. | Selection of text prediction results by an accessory |
US9262612B2 (en) | 2011-03-21 | 2016-02-16 | Apple Inc. | Device access using voice authentication |
US10057736B2 (en) | 2011-06-03 | 2018-08-21 | Apple Inc. | Active transport based notifications |
US8417233B2 (en) | 2011-06-13 | 2013-04-09 | Mercury Mobile, Llc | Automated notation techniques implemented via mobile devices and/or computer networks |
KR101457116B1 (en) * | 2011-11-07 | 2014-11-04 | 삼성전자주식회사 | Electronic apparatus and Method for controlling electronic apparatus using voice recognition and motion recognition |
US10134385B2 (en) | 2012-03-02 | 2018-11-20 | Apple Inc. | Systems and methods for name pronunciation |
US9280610B2 (en) | 2012-05-14 | 2016-03-08 | Apple Inc. | Crowd sourcing information to fulfill user requests |
US10417037B2 (en) | 2012-05-15 | 2019-09-17 | Apple Inc. | Systems and methods for integrating third party services with a digital assistant |
US9721563B2 (en) | 2012-06-08 | 2017-08-01 | Apple Inc. | Name recognition system |
US9547647B2 (en) | 2012-09-19 | 2017-01-17 | Apple Inc. | Voice-based media searching |
JP5887253B2 (en) * | 2012-11-16 | 2016-03-16 | 本田技研工業株式会社 | Message processing device |
DE212014000045U1 (en) | 2013-02-07 | 2015-09-24 | Apple Inc. | Voice trigger for a digital assistant |
US20140229180A1 (en) * | 2013-02-13 | 2014-08-14 | Help With Listening | Methodology of improving the understanding of spoken words |
WO2014144579A1 (en) * | 2013-03-15 | 2014-09-18 | Apple Inc. | System and method for updating an adaptive speech recognition model |
US9582608B2 (en) | 2013-06-07 | 2017-02-28 | Apple Inc. | Unified ranking with entropy-weighted information for phrase-based semantic auto-completion |
WO2014197334A2 (en) | 2013-06-07 | 2014-12-11 | Apple Inc. | System and method for user-specified pronunciation of words for speech synthesis and recognition |
WO2014197335A1 (en) | 2013-06-08 | 2014-12-11 | Apple Inc. | Interpreting and acting upon commands that involve sharing information with remote devices |
DE112014002747T5 (en) | 2013-06-09 | 2016-03-03 | Apple Inc. | Apparatus, method and graphical user interface for enabling conversation persistence over two or more instances of a digital assistant |
US10176167B2 (en) | 2013-06-09 | 2019-01-08 | Apple Inc. | System and method for inferring user intent from speech inputs |
US9305551B1 (en) * | 2013-08-06 | 2016-04-05 | Timothy A. Johns | Scribe system for transmitting an audio recording from a recording device to a server |
KR20150024188A (en) * | 2013-08-26 | 2015-03-06 | 삼성전자주식회사 | A method for modifiying text data corresponding to voice data and an electronic device therefor |
US20150081294A1 (en) * | 2013-09-19 | 2015-03-19 | Maluuba Inc. | Speech recognition for user specific language |
US10296160B2 (en) | 2013-12-06 | 2019-05-21 | Apple Inc. | Method for extracting salient dialog usage from live data |
CN104735634B * | 2013-12-24 | 2019-06-25 | 腾讯科技(深圳)有限公司 | Associated payment account management method, mobile terminal, server and system |
US10170123B2 (en) | 2014-05-30 | 2019-01-01 | Apple Inc. | Intelligent assistant for home automation |
US9715875B2 (en) | 2014-05-30 | 2017-07-25 | Apple Inc. | Reducing the need for manual start/end-pointing and trigger phrases |
US9633004B2 (en) | 2014-05-30 | 2017-04-25 | Apple Inc. | Better resolution when referencing to concepts |
US9430463B2 (en) | 2014-05-30 | 2016-08-30 | Apple Inc. | Exemplar-based natural language processing |
EP3149728B1 (en) | 2014-05-30 | 2019-01-16 | Apple Inc. | Multi-command single utterance input method |
US9338493B2 (en) | 2014-06-30 | 2016-05-10 | Apple Inc. | Intelligent automated assistant for TV user interactions |
KR102357321B1 * | 2014-08-27 | 2022-02-03 | 삼성전자주식회사 | Apparatus and method for recognizing voice of speech |
CN105374356B (en) * | 2014-08-29 | 2019-07-30 | 株式会社理光 | Audio recognition method, speech assessment method, speech recognition system and speech assessment system |
US9818400B2 (en) | 2014-09-11 | 2017-11-14 | Apple Inc. | Method and apparatus for discovering trending terms in speech requests |
CN107003996A (en) | 2014-09-16 | 2017-08-01 | 声钰科技 | VCommerce |
US9898459B2 (en) | 2014-09-16 | 2018-02-20 | Voicebox Technologies Corporation | Integration of domain information into state transitions of a finite state transducer for natural language processing |
US10127911B2 (en) | 2014-09-30 | 2018-11-13 | Apple Inc. | Speaker identification and unsupervised speaker adaptation techniques |
US10074360B2 (en) | 2014-09-30 | 2018-09-11 | Apple Inc. | Providing an indication of the suitability of speech recognition |
US9668121B2 (en) | 2014-09-30 | 2017-05-30 | Apple Inc. | Social reminders |
WO2016061309A1 (en) | 2014-10-15 | 2016-04-21 | Voicebox Technologies Corporation | System and method for providing follow-up responses to prior natural language inputs of a user |
CA2869245A1 (en) | 2014-10-27 | 2016-04-27 | MYLE Electronics Corp. | Mobile thought catcher system |
US10614799B2 (en) | 2014-11-26 | 2020-04-07 | Voicebox Technologies Corporation | System and method of providing intent predictions for an utterance prior to a system detection of an end of the utterance |
US10431214B2 (en) | 2014-11-26 | 2019-10-01 | Voicebox Technologies Corporation | System and method of determining a domain and/or an action related to a natural language input |
US10152299B2 (en) | 2015-03-06 | 2018-12-11 | Apple Inc. | Reducing response latency of intelligent automated assistants |
US9886953B2 (en) | 2015-03-08 | 2018-02-06 | Apple Inc. | Virtual assistant activation |
US9721566B2 (en) | 2015-03-08 | 2017-08-01 | Apple Inc. | Competing devices responding to voice triggers |
US10567477B2 (en) | 2015-03-08 | 2020-02-18 | Apple Inc. | Virtual assistant continuity |
US10460227B2 (en) | 2015-05-15 | 2019-10-29 | Apple Inc. | Virtual assistant in a communication session |
US10083688B2 (en) | 2015-05-27 | 2018-09-25 | Apple Inc. | Device voice control for selecting a displayed affordance |
US9578173B2 (en) | 2015-06-05 | 2017-02-21 | Apple Inc. | Virtual assistant aided communication with 3rd party service in a communication session |
US11025565B2 (en) | 2015-06-07 | 2021-06-01 | Apple Inc. | Personalized prediction of responses for instant messaging |
US20160378747A1 (en) | 2015-06-29 | 2016-12-29 | Apple Inc. | Virtual assistant for media playback |
US10747498B2 (en) | 2015-09-08 | 2020-08-18 | Apple Inc. | Zero latency digital assistant |
US10671428B2 (en) | 2015-09-08 | 2020-06-02 | Apple Inc. | Distributed personal assistant |
US10366158B2 (en) | 2015-09-29 | 2019-07-30 | Apple Inc. | Efficient word encoding for recurrent neural network language models |
US11010550B2 (en) | 2015-09-29 | 2021-05-18 | Apple Inc. | Unified language modeling framework for word prediction, auto-completion and auto-correction |
US11587559B2 (en) | 2015-09-30 | 2023-02-21 | Apple Inc. | Intelligent device identification |
US10691473B2 (en) | 2015-11-06 | 2020-06-23 | Apple Inc. | Intelligent automated assistant in a messaging environment |
CN108431889A (en) * | 2015-11-17 | 2018-08-21 | 优步格拉佩股份有限公司 | Asynchronous speech act detection in text-based message |
US10049668B2 (en) | 2015-12-02 | 2018-08-14 | Apple Inc. | Applying neural network language models to weighted finite state transducers for automatic speech recognition |
US10223066B2 (en) | 2015-12-23 | 2019-03-05 | Apple Inc. | Proactive assistance based on dialog communication between devices |
US10446143B2 (en) | 2016-03-14 | 2019-10-15 | Apple Inc. | Identification of voice inputs providing credentials |
CN105869654B (en) | 2016-03-29 | 2020-12-04 | 阿里巴巴集团控股有限公司 | Audio message processing method and device |
US9934775B2 (en) | 2016-05-26 | 2018-04-03 | Apple Inc. | Unit-selection text-to-speech synthesis based on predicted concatenation parameters |
US9972304B2 (en) | 2016-06-03 | 2018-05-15 | Apple Inc. | Privacy preserving distributed evaluation framework for embedded personalized systems |
US11227589B2 (en) | 2016-06-06 | 2022-01-18 | Apple Inc. | Intelligent list reading |
US10249300B2 (en) | 2016-06-06 | 2019-04-02 | Apple Inc. | Intelligent list reading |
US10049663B2 (en) | 2016-06-08 | 2018-08-14 | Apple, Inc. | Intelligent automated assistant for media exploration |
DK179588B1 (en) | 2016-06-09 | 2019-02-22 | Apple Inc. | Intelligent automated assistant in a home environment |
US10192552B2 (en) | 2016-06-10 | 2019-01-29 | Apple Inc. | Digital assistant providing whispered speech |
US10490187B2 (en) | 2016-06-10 | 2019-11-26 | Apple Inc. | Digital assistant providing automated status report |
US10586535B2 (en) | 2016-06-10 | 2020-03-10 | Apple Inc. | Intelligent digital assistant in a multi-tasking environment |
US10067938B2 (en) | 2016-06-10 | 2018-09-04 | Apple Inc. | Multilingual word prediction |
US10509862B2 (en) | 2016-06-10 | 2019-12-17 | Apple Inc. | Dynamic phrase expansion of language input |
DK201670540A1 (en) | 2016-06-11 | 2018-01-08 | Apple Inc | Application integration with a digital assistant |
DK179415B1 (en) | 2016-06-11 | 2018-06-14 | Apple Inc | Intelligent device arbitration and control |
DK179049B1 (en) | 2016-06-11 | 2017-09-18 | Apple Inc | Data driven natural language event detection and classification |
DK179343B1 (en) | 2016-06-11 | 2018-05-14 | Apple Inc | Intelligent task discovery |
WO2018023106A1 (en) | 2016-07-29 | 2018-02-01 | Erik SWART | System and method of disambiguating natural language processing requests |
US10474753B2 (en) | 2016-09-07 | 2019-11-12 | Apple Inc. | Language identification using recurrent neural networks |
US10043516B2 (en) | 2016-09-23 | 2018-08-07 | Apple Inc. | Intelligent automated assistant |
US20180143956A1 (en) * | 2016-11-18 | 2018-05-24 | Microsoft Technology Licensing, Llc | Real-time caption correction by audience |
US11281993B2 (en) | 2016-12-05 | 2022-03-22 | Apple Inc. | Model and ensemble compression for metric learning |
US10593346B2 (en) | 2016-12-22 | 2020-03-17 | Apple Inc. | Rank-reduced token representation for automatic speech recognition |
US11204787B2 (en) | 2017-01-09 | 2021-12-21 | Apple Inc. | Application integration with a digital assistant |
DK201770383A1 (en) | 2017-05-09 | 2018-12-14 | Apple Inc. | User interface for correcting recognition errors |
US10417266B2 (en) | 2017-05-09 | 2019-09-17 | Apple Inc. | Context-aware ranking of intelligent response suggestions |
DK201770439A1 (en) | 2017-05-11 | 2018-12-13 | Apple Inc. | Offline personal assistant |
US10395654B2 (en) | 2017-05-11 | 2019-08-27 | Apple Inc. | Text normalization based on a data-driven learning network |
US10726832B2 (en) | 2017-05-11 | 2020-07-28 | Apple Inc. | Maintaining privacy of personal information |
DK179745B1 (en) | 2017-05-12 | 2019-05-01 | Apple Inc. | SYNCHRONIZATION AND TASK DELEGATION OF A DIGITAL ASSISTANT |
DK201770427A1 (en) | 2017-05-12 | 2018-12-20 | Apple Inc. | Low-latency intelligent automated assistant |
US11301477B2 (en) | 2017-05-12 | 2022-04-12 | Apple Inc. | Feedback analysis of a digital assistant |
DK179496B1 (en) | 2017-05-12 | 2019-01-15 | Apple Inc. | USER-SPECIFIC Acoustic Models |
DK201770431A1 (en) | 2017-05-15 | 2018-12-20 | Apple Inc. | Optimizing dialogue policy decisions for digital assistants using implicit feedback |
DK201770432A1 (en) | 2017-05-15 | 2018-12-21 | Apple Inc. | Hierarchical belief states for digital assistants |
DK179549B1 (en) | 2017-05-16 | 2019-02-12 | Apple Inc. | Far-field extension for digital assistant services |
US10303715B2 (en) | 2017-05-16 | 2019-05-28 | Apple Inc. | Intelligent automated assistant for media exploration |
US10311144B2 (en) | 2017-05-16 | 2019-06-04 | Apple Inc. | Emoji word sense disambiguation |
US10403278B2 (en) | 2017-05-16 | 2019-09-03 | Apple Inc. | Methods and systems for phonetic matching in digital assistant services |
US10657328B2 (en) | 2017-06-02 | 2020-05-19 | Apple Inc. | Multi-task recurrent neural network architecture for efficient morphology handling in neural language modeling |
CN109213971A * | 2017-06-30 | 2019-01-15 | 北京国双科技有限公司 | Method and device for generating court trial notes |
US10445429B2 (en) | 2017-09-21 | 2019-10-15 | Apple Inc. | Natural language understanding using vocabularies with compressed serialized tries |
US10755051B2 (en) | 2017-09-29 | 2020-08-25 | Apple Inc. | Rule-based natural language processing |
US10636424B2 (en) | 2017-11-30 | 2020-04-28 | Apple Inc. | Multi-turn canned dialog |
US10733982B2 (en) | 2018-01-08 | 2020-08-04 | Apple Inc. | Multi-directional dialog |
US10733375B2 (en) | 2018-01-31 | 2020-08-04 | Apple Inc. | Knowledge-based framework for improving natural language understanding |
US10789959B2 (en) | 2018-03-02 | 2020-09-29 | Apple Inc. | Training speaker recognition models for digital assistants |
US10592604B2 (en) | 2018-03-12 | 2020-03-17 | Apple Inc. | Inverse text normalization for automatic speech recognition |
US10818288B2 (en) | 2018-03-26 | 2020-10-27 | Apple Inc. | Natural assistant interaction |
US10909331B2 (en) | 2018-03-30 | 2021-02-02 | Apple Inc. | Implicit identification of translation payload with neural machine translation |
US11145294B2 (en) | 2018-05-07 | 2021-10-12 | Apple Inc. | Intelligent automated assistant for delivering content from user experiences |
US10928918B2 (en) | 2018-05-07 | 2021-02-23 | Apple Inc. | Raise to speak |
US10984780B2 (en) | 2018-05-21 | 2021-04-20 | Apple Inc. | Global semantic word embeddings using bi-directional recurrent neural networks |
DK179822B1 (en) | 2018-06-01 | 2019-07-12 | Apple Inc. | Voice interaction at a primary device to access call functionality of a companion device |
US10892996B2 (en) | 2018-06-01 | 2021-01-12 | Apple Inc. | Variable latency device coordination |
DK201870355A1 (en) | 2018-06-01 | 2019-12-16 | Apple Inc. | Virtual assistant operation in multi-device environments |
US11386266B2 (en) | 2018-06-01 | 2022-07-12 | Apple Inc. | Text correction |
DK180639B1 (en) | 2018-06-01 | 2021-11-04 | Apple Inc | DISABILITY OF ATTENTION-ATTENTIVE VIRTUAL ASSISTANT |
US11076039B2 (en) | 2018-06-03 | 2021-07-27 | Apple Inc. | Accelerated task performance |
US11010561B2 (en) | 2018-09-27 | 2021-05-18 | Apple Inc. | Sentiment prediction from textual data |
US11462215B2 (en) | 2018-09-28 | 2022-10-04 | Apple Inc. | Multi-modal inputs for voice commands |
US11170166B2 (en) | 2018-09-28 | 2021-11-09 | Apple Inc. | Neural typographical error modeling via generative adversarial networks |
US10839159B2 (en) | 2018-09-28 | 2020-11-17 | Apple Inc. | Named entity normalization in a spoken dialog system |
US11475898B2 (en) | 2018-10-26 | 2022-10-18 | Apple Inc. | Low-latency multi-speaker speech recognition |
US11638059B2 (en) | 2019-01-04 | 2023-04-25 | Apple Inc. | Content playback on multiple devices |
US11348573B2 (en) | 2019-03-18 | 2022-05-31 | Apple Inc. | Multimodality in digital assistant systems |
US11126794B2 (en) * | 2019-04-11 | 2021-09-21 | Microsoft Technology Licensing, Llc | Targeted rewrites |
DK201970509A1 (en) | 2019-05-06 | 2021-01-15 | Apple Inc | Spoken notifications |
US11475884B2 (en) | 2019-05-06 | 2022-10-18 | Apple Inc. | Reducing digital assistant latency when a language is incorrectly determined |
US11307752B2 (en) | 2019-05-06 | 2022-04-19 | Apple Inc. | User configurable task triggers |
US11423908B2 (en) | 2019-05-06 | 2022-08-23 | Apple Inc. | Interpreting spoken requests |
US11140099B2 (en) | 2019-05-21 | 2021-10-05 | Apple Inc. | Providing message response suggestions |
DK180129B1 (en) | 2019-05-31 | 2020-06-02 | Apple Inc. | User activity shortcut suggestions |
US11289073B2 (en) | 2019-05-31 | 2022-03-29 | Apple Inc. | Device text to speech |
US11496600B2 (en) | 2019-05-31 | 2022-11-08 | Apple Inc. | Remote execution of machine-learned models |
US11360641B2 (en) | 2019-06-01 | 2022-06-14 | Apple Inc. | Increasing the relevance of new available information |
WO2021056255A1 (en) | 2019-09-25 | 2021-04-01 | Apple Inc. | Text detection using global geometry estimators |
US11386890B1 (en) * | 2020-02-11 | 2022-07-12 | Amazon Technologies, Inc. | Natural language understanding |
US11810578B2 (en) | 2020-05-11 | 2023-11-07 | Apple Inc. | Device arbitration for digital assistant-based intercom systems |
US11657803B1 (en) * | 2022-11-02 | 2023-05-23 | Actionpower Corp. | Method for speech recognition by using feedback information |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
KR20030097347A (en) * | 2002-06-20 | 2003-12-31 | 삼성전자주식회사 | Method for transmitting short message service using voice in mobile telephone |
US20050201540A1 (en) * | 2004-03-09 | 2005-09-15 | Rampey Fred D. | Speech to text conversion system |
KR20060001177A (en) * | 2004-06-30 | 2006-01-06 | 에스케이 텔레콤주식회사 | System and method for message service, mobile communication terminal therefor |
KR20060066764A (en) * | 2004-12-14 | 2006-06-19 | 주식회사 케이티프리텔 | Method and apparatus for transforming voice message into text message and transmitting the same |
US20060217159A1 (en) * | 2005-03-22 | 2006-09-28 | Sony Ericsson Mobile Communications Ab | Wireless communications device with voice-to-text conversion |
Family Cites Families (15)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP3402100B2 (en) * | 1996-12-27 | 2003-04-28 | カシオ計算機株式会社 | Voice control host device |
US6173259B1 (en) * | 1997-03-27 | 2001-01-09 | Speech Machines Plc | Speech to text conversion |
GB2323693B (en) * | 1997-03-27 | 2001-09-26 | Forum Technology Ltd | Speech to text conversion |
US6178403B1 (en) * | 1998-12-16 | 2001-01-23 | Sharp Laboratories Of America, Inc. | Distributed voice capture and recognition system |
JP3795692B2 (en) * | 1999-02-12 | 2006-07-12 | マイクロソフト コーポレーション | Character processing apparatus and method |
US6259657B1 (en) * | 1999-06-28 | 2001-07-10 | Robert S. Swinney | Dictation system capable of processing audio information at a remote location |
US6789060B1 (en) * | 1999-11-01 | 2004-09-07 | Gene J. Wolfe | Network based speech transcription that maintains dynamic templates |
US6532446B1 (en) * | 1999-11-24 | 2003-03-11 | Openwave Systems Inc. | Server based speech recognition user interface for wireless devices |
US7035804B2 (en) * | 2001-04-26 | 2006-04-25 | Stenograph, L.L.C. | Systems and methods for automated audio transcription, translation, and transfer |
US6901364B2 (en) * | 2001-09-13 | 2005-05-31 | Matsushita Electric Industrial Co., Ltd. | Focused language models for improved speech input of structured documents |
DE602004018290D1 (en) * | 2003-03-26 | 2009-01-22 | Philips Intellectual Property | LANGUAGE RECOGNITION AND CORRECTION SYSTEM, CORRECTION DEVICE AND METHOD FOR GENERATING A LEXICON OF ALTERNATIVES |
TWI232431B (en) * | 2004-01-13 | 2005-05-11 | Benq Corp | Method of speech transformation |
GB2427500A (en) * | 2005-06-22 | 2006-12-27 | Symbian Software Ltd | Mobile telephone text entry employing remote speech to text conversion |
CA2527813A1 (en) * | 2005-11-24 | 2007-05-24 | 9160-8083 Quebec Inc. | System, method and computer program for sending an email message from a mobile communication device based on voice input |
WO2007121441A2 (en) * | 2006-04-17 | 2007-10-25 | Vovision Llc | Methods and systems for correcting transcribed audio files |
- 2007
- 2007-09-12 US US11/854,523 patent/US20090070109A1/en not_active Abandoned
- 2008
- 2008-08-25 BR BRPI0814418-4A2A patent/BRPI0814418A2/en not_active IP Right Cessation
- 2008-08-25 EP EP08798590A patent/EP2198527A4/en not_active Withdrawn
- 2008-08-25 JP JP2010524907A patent/JP2011504304A/en active Pending
- 2008-08-25 CN CN200880107047A patent/CN101803214A/en active Pending
- 2008-08-25 RU RU2010109071/07A patent/RU2010109071A/en not_active Application Discontinuation
- 2008-08-25 KR KR1020107004918A patent/KR20100065317A/en not_active Application Discontinuation
- 2008-08-25 WO PCT/US2008/074164 patent/WO2009035842A1/en active Application Filing
Non-Patent Citations (1)
Title |
---|
See also references of EP2198527A4 * |
Cited By (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
DE102008061741A1 (en) * | 2008-09-09 | 2010-03-11 | Avaya Inc. | Sharing electromagnetic signal measurements to provide feedback on the signal quality of the transmission path |
DE102008061741B4 (en) * | 2008-09-09 | 2010-09-30 | Avaya Inc. | Sharing electromagnetic signal measurements to provide feedback on the signal quality of the transmission path |
US8483679B2 (en) | 2008-09-09 | 2013-07-09 | Avaya Inc. | Sharing of electromagnetic-signal measurements for providing feedback about transmit-path signal quality |
US9326160B2 (en) | 2008-09-09 | 2016-04-26 | Avaya Inc. | Sharing electromagnetic-signal measurements for providing feedback about transmit-path signal quality |
WO2010129714A3 (en) * | 2009-05-05 | 2011-02-24 | NoteVault, Inc. | System and method for multilingual transcription service with automated notification services |
US8949289B2 (en) | 2009-05-05 | 2015-02-03 | NoteVault, Inc. | System and method for multilingual transcription service with automated notification services |
JP2014505270A (en) * | 2010-12-16 | 2014-02-27 | ネイバー コーポレーション | Speech recognition client system, speech recognition server system and speech recognition method for processing online speech recognition |
JP2015179287A (en) * | 2010-12-16 | 2015-10-08 | ネイバー コーポレーションNAVER Corporation | Voice recognition client system for processing online voice recognition, voice recognition server system, and voice recognition method |
US9318111B2 (en) | 2010-12-16 | 2016-04-19 | Nhn Corporation | Voice recognition client system for processing online voice recognition, voice recognition server system, and voice recognition method |
Also Published As
Publication number | Publication date |
---|---|
KR20100065317A (en) | 2010-06-16 |
CN101803214A (en) | 2010-08-11 |
JP2011504304A (en) | 2011-02-03 |
US20090070109A1 (en) | 2009-03-12 |
EP2198527A1 (en) | 2010-06-23 |
EP2198527A4 (en) | 2011-09-28 |
BRPI0814418A2 (en) | 2015-01-20 |
RU2010109071A (en) | 2011-09-20 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US20090070109A1 (en) | Speech-to-Text Transcription for Personal Communication Devices | |
US10714091B2 (en) | Systems and methods to present voice message information to a user of a computing device | |
US8019606B2 (en) | Identification and selection of a software application via speech | |
US7818166B2 (en) | Method and apparatus for intention based communications for mobile communication devices | |
US8275618B2 (en) | Mobile dictation correction user interface | |
RU2424547C2 (en) | Word prediction | |
CN100424632C (en) | Semantic object synchronous understanding for highly interactive interface | |
US7962344B2 (en) | Depicting a speech user interface via graphical elements | |
US9251137B2 (en) | Method of text type-ahead | |
WO2007019477A1 (en) | Redictation of misrecognized words using a list of alternatives | |
EP1901534A1 (en) | Method of managing a language information for a text input and method of inputting a text and a mobile terminal | |
MXPA04011787A (en) | Method for entering text. | |
CN1538383A (en) | Distributed speech recognition for mobile communication devices | |
JP2007280364A (en) | Method and device for switching/adapting language model | |
JP4891438B2 (en) | Eliminate ambiguity in keypad text entry | |
Huang et al. | MiPad: A multimodal interaction prototype | |
KR101251697B1 (en) | Dialog authoring and execution framework | |
US20110082685A1 (en) | Provisioning text services based on assignment of language attributes to contact entry | |
US20230040219A1 (en) | System and method for hands-free multi-lingual online communication | |
JP5079259B2 (en) | Language input system, processing method thereof, recording medium, and program | |
JP2005128076A (en) | Speech recognition system for recognizing speech data from terminal, and method therefor | |
WO2013039459A1 (en) | A method for creating keyboard and/or speech - assisted text input on electronic devices | |
Lai et al. | Speech Trumps Finger: Examining Modality Usage in a Mobile 3G Environment | |
KR20050026777A (en) | Method for recognizing and translating scan character in mobile communication terminal |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
WWE | Wipo information: entry into national phase |
Ref document number: 200880107047.0 Country of ref document: CN |
121 | Ep: the epo has been informed by wipo that ep was designated in this application |
Ref document number: 08798590 Country of ref document: EP Kind code of ref document: A1 |
WWE | Wipo information: entry into national phase |
Ref document number: 748/CHENP/2010 Country of ref document: IN |
ENP | Entry into the national phase |
Ref document number: 2010524907 Country of ref document: JP Kind code of ref document: A |
ENP | Entry into the national phase |
Ref document number: 20107004918 Country of ref document: KR Kind code of ref document: A |
WWE | Wipo information: entry into national phase |
Ref document number: 2010109071 Country of ref document: RU |
NENP | Non-entry into the national phase |
Ref country code: DE |
WWE | Wipo information: entry into national phase |
Ref document number: 2008798590 Country of ref document: EP |
ENP | Entry into the national phase |
Ref document number: PI0814418 Country of ref document: BR Kind code of ref document: A2 Effective date: 20100126 |