US20230359812A1 - Digitally aware neural dictation interface
- Publication number
- US20230359812A1 (Application US 18/223,429)
- Authority
- US
- United States
- Prior art keywords
- speech input
- user
- syllable
- processing circuits
- speech
- Prior art date
- Legal status: Pending
Classifications
- G06F40/174—Form filling; Merging
- G06F3/167—Audio in a user interface, e.g. using voice commands for navigating, audio feedback
- G06F40/40—Processing or translation of natural language
- G10L17/00—Speaker identification or verification techniques
- G10L17/02—Preprocessing operations, e.g. segment selection; Pattern representation or modelling, e.g. based on linear discriminant analysis [LDA] or principal components; Feature selection or extraction
- G10L17/06—Decision making techniques; Pattern matching strategies
- G10L15/26—Speech to text systems
- G10L2015/027—Syllables being the recognition units
Definitions
- Embodiments of the present disclosure relate generally to the field of hands-free input modalities and, in particular, to allowing a user of a device to populate a plurality of fields of a form displayed on the device using a voice input.
- When filling out a form on an electronic device (e.g., a laptop, smart phone, etc.), users may receive help through software that pre-fills or auto-completes certain fields of the form (e.g., name, home address, etc.).
- Users with impaired eyesight may use screen readers that read aloud text that appears on the display to help them fill out the form. Such screen readers, however, lack the ability to recognize specific fields of a form. Therefore, improved systems that help users fill out electronic forms faster and more efficiently are desired.
- a first example embodiment relates to a user device configured to provide a conversational electronic form that enables a user to speak in a conversational-like manner to fill out the form.
- the user device includes a processing circuit comprising one or more processors coupled to non-transitory memory.
- the processing circuit is structured to: receive, by a microphone of the user device, a speech input from the user corresponding to a value of a current field of a plurality of fields of an electronic form provided on a display screen of the user device; convert the speech input into the value for the current field; display, on the display screen of the user device, the value in the current field for visual verification by the user; prompt, by a speaker of the user device, the user for information corresponding to a value of a next field of the plurality of fields in response to determining that the current field is populated with the corresponding value; and prompt, by the speaker of the user device, the user to submit the form in response to determining that the electronic form is complete based on the populated fields of the electronic form.
- a conversational electronic form is provided that may be appealing and easy to use for users.
- Another example embodiment relates to a method for providing a conversational electronic form.
- the method includes receiving a speech input from a user corresponding to a first field of a plurality of fields of an electronic form provided on a display screen of a user device; converting the speech input from an audible value into text; displaying, on the display screen of the user device, the text in the first field of the electronic form to allow a visual verification by the user; prompting, via a speaker of the user device, the user for information for a subsequent field in the plurality of fields upon each preceding field being populated with text from converted speech inputs; determining the form is complete and ready for submission based on a set of fields being populated with text in the plurality of fields; and enabling a submission of the completed form.
- Still another example embodiment relates to a method.
- the method includes enabling at least a partial hands-free mode of operation of a user device; determining a characteristic of an electronic form provided on a display screen of the user device based on metadata associated with the electronic form; identifying and navigating to a first field of a plurality of fields of the electronic form based on the metadata; prompting, via a speaker of the user device, the user for information for the first field and a subsequent field in the plurality of fields upon each preceding field being populated with text from a speech input associated with each field; and enabling a submission of the electronic form based on a received vocal command.
- Yet another example embodiment relates to a method for providing a graphical representation via a speech input.
- the method includes: receiving, by a processing circuit of a user device, a speech input from a user selecting an option from a drop down menu of an electronic form; and displaying, on a display screen of the user device, a graphical representation corresponding to the selected option from the drop down menu of the electronic form.
- FIG. 1 is a block diagram of a system for providing a hands-free mode of operation of a user device by a user to populate a plurality of fields of a form using the user device, according to an example embodiment.
- FIG. 2 is a block diagram of the user device of FIG. 1 .
- FIG. 3 is a block diagram of the provider computing system of FIG. 1 .
- FIG. 4A is a display output of the user device during the hands-free mode of operation, according to an example embodiment.
- FIG. 4B is another display output of the user device during the hands-free mode of operation, according to an example embodiment.
- FIG. 5 depicts an output on the display screen of the user device of FIGS. 1 - 2 , according to an example embodiment.
- FIG. 6 is a flowchart of a method of populating a plurality of fields of a form using the user device of FIG. 1 , according to an example embodiment.
- FIG. 7 is a flowchart of a method of providing refinements to speech input samples by the provider computing system of FIG. 1 , according to an example embodiment.
- the present disclosure relates to systems, apparatuses, and methods of facilitating a hands-free mode of operation for a user to use a voice or speech input to populate a plurality of fields of an electronic form.
- Users often fill out different types of forms in their regular day-to-day activities. For example, a user may fill out a form to open a checking account at a bank, or may fill out a form for a membership at a local YMCA, etc. Due to the smaller size of the display screen and keyboard on mobile devices as compared to desktops, laptops, etc., filling out forms is often more tedious and error-prone on these types of devices.
- the system, methods, and apparatuses described herein relate to providing a hands-free mode of operation for a user to use a voice input to fill out a plurality of the fields of a form in a seamless and easy manner.
- the systems, apparatuses, and methods provide the user with an experience of a “conversational form” that prompts the user to populate each field of the form.
- the user may seemingly engage in a conversation with the form, which aids in filling out the form quickly and efficiently.
- users may be able to complete long forms more easily and quickly.
- a conversational form may be consistent with the expectations of busy consumers in the modern world who may prefer the convenience of a hands-free mode of operation, such as using a voice input, to fill out a form during the course of their regular busy day.
- the systems, methods, and apparatuses described herein enable accepting a voice input from a user to populate all of or mostly all of the fields of a form by stepping through the fields of the form, one field at a time, without the necessity of a keyboard.
- the various embodiments of the present disclosure utilize a speech synthesis Application Programming Interface (API) to convert a received user voice input from speech to text (e.g., alpha, numeric, or alphanumeric text).
- Metadata associated with the form is used to determine the characteristics of the form. For example, the metadata provides an indication regarding the total number of fields, the names of the fields, the maximum number of characters allowed in a field, etc.
- the systems, methods, and apparatuses described herein utilize a plug-in structured to populate the field of the form based on the characteristics of the field as determined from the metadata. As the user populates a first field, a prompt is provided to the user to populate a next field until all or a sufficient number of fields are populated for submission.
- the systems, apparatuses, and methods described herein facilitate providing a “conversational form,” whereby there is a continuous interaction between the system and the user based on prompting the user to enter a value for each field of the form until the form is completely or sufficiently populated. This is analogous to a “conversation,” where people may talk back and forth until the topic of discussion is fully/completely discussed.
- the systems and methods described herein move from a field that has been populated to a next field after a predetermined amount of time without necessarily prompting the user (i.e., an automatic movement from one field to the next based on the passage of a predetermined amount of time).
- the user may keep track of the progress of filling out the form through a visual display of the form on the display screen of the device.
- a user may choose to enter the value of a subset of fields of the form through means other than voice input such that the filling of the form may be through a combination of voice-input and manual-input (e.g., typed input).
- the “form” may be any type of form that is presented electronically.
- the “form” may be an application, such as a housing application, a credit card application, an account application, a club membership application (e.g., a gym), and so on.
- the form may also be a survey.
- the form may further include a log-in page for various things, such as an account (e.g., a club account, a financial account, and the like).
- the unifying characteristics are that the form is electronically displayed and includes at least one field that needs to be filled or populated.
- the “field” of the form refers to a box where information is to be populated (e.g., name, date of birth, etc.).
- the “field” refers to a single item belonging to the form where a user input is or may be sought.
- the “value” refers to the characters that actually populate the field (e.g., a number or a string of alpha, numeric, or alphanumeric characters used to populate the field of a form).
- the “hands-free mode of operation” refers to the ability of a user of the device to use the device without or primarily without use of their hands/fingers.
- the hands-free mode of operation refers to the ability of the user to populate the fields of a form without using a keyboard (i.e., without manual entry of the field values).
- manual entry such as via a keyboard, may be used to supplement the hands-free mode of operation to fill out the form.
- the system 100 includes a network 110 , a user 120 associated with a user device 130 , a provider computing system 150 , a speech synthesis API server 170 , and a web server 180 .
- the user 120 , the user device 130 , the provider computing system 150 , the speech synthesis API server 170 , and the web server 180 may be coupled to each other and therefore may communicate through the network 110 .
- the network 110 may include one or more of the Internet, cellular network, Wi-Fi, Wi-Max, a proprietary banking network, or any other type of wired and/or wireless network.
- the user device 130 is a computing device owned by, associated with, or otherwise used by a user 120 .
- the user may be an individual or group of individuals using the user device 130 .
- the user device 130 is structured to provide a hands-free mode of operation for the user 120 so that the user may provide a voice input that is converted to text (e.g., alphanumeric text) to populate a plurality of fields of a form.
- the user device 130 is a mobile device, which includes any type of mobile device including, but not limited to, a phone (e.g., smart phone, etc.), a tablet computer, a personal digital assistant, wearable devices (e.g., glasses), and the like.
- the user device 130 is a primarily non-mobile device, such as a desktop computer.
- the user device 130 is a device that is merely used by, but not owned by, a user.
- the device may be an automated teller machine (ATM) that is equipped with a microphone, a speaker, and a display.
- the user device 130 is structured as a smart phone.
- the user device 130 is shown to include a processing circuit 133 having one or more processors 134 and a memory 135 , a network interface circuit 131 , and an input/output circuit 132 .
- the memory 135 is shown to include or store a client application 136 . In this regard, the memory 135 may contain instructions belonging to the client application 136 , which can be executed by the one or more processors 134 of the user device 130 .
- the network interface circuit 131 is structured to enable the user device 130 to exchange information over the network 110 .
- the input/output circuit 132 is structured to facilitate the exchange of information with the user 120 .
- An input device of or coupled to (depending on the embodiment) the input/output circuit 132 may allow the user to provide information to the user device 130 , and may include, for example, a mechanical keyboard, a touchscreen, a microphone, a camera, a fingerprint scanner, and so on.
- An output device of or coupled to (depending on the embodiment) the input/output circuit 132 allows the user to receive information from the user device 130 , and may include a display device (e.g., a display screen such as a touchscreen), a speaker, illuminating icons, LEDs, and so on. Each of these components are explained more fully herein with respect to FIG. 2 .
- the speech synthesis API server 170 is a computing system that is coupled through the network 110 to the user device 130 and the other systems/components of FIG. 1 .
- the speech synthesis API server 170 may be a back-end server or computing system comprising one or more processors, memory devices, network interfaces, and computing components as described herein that facilitate and enable various operations.
- the speech synthesis API server 170 is structured to provide a speech synthesis API.
- the speech synthesis API is structured to recognize a voice input from a user 120 , and to convert the voice input into text, such as alphanumeric text (and, in some embodiments, vice versa—from text into an audible noise).
- the user device 130 lacks built-in support for a speech synthesis API.
- the user device 130 utilizes the speech synthesis API provided by the speech synthesis API server 170 to convert the user's 120 voice input into text and vice versa.
- the user device 130 includes a speech synthesis API (which may be different from that provided by the server) that converts the voice input into text.
- the speech synthesis API server 170 is also shown to include a speech recognition circuit 172 and a speech translation circuit 174 .
- the speech recognition circuit 172 is structured to recognize and convert the user's 120 voice input into text.
- the user's voice may be received via a microphone of the user device 130 , which converts the voice into data and transmits the data to the speech synthesis API server 170 .
- the speech recognition circuit 172 breaks down the user's 120 voice input (i.e., the data) into syllables.
- the speech recognition circuit 172 compares the syllables of the user's 120 voice input with known syllables stored in the non-transitory memory of the speech recognition circuit 172 to identify a plurality of syllables in the voice input.
- the speech recognition circuit 172 may then convert the plurality of syllables into characters through, for example, a look-up table maintained in the non-transitory memory of the speech recognition circuit 172 to complete the conversion of the user's 120 voice input into text.
- the converted text produced by the speech recognition circuit 172 is used to populate a relevant field of an electronic form. In other embodiments, a different process may be used to convert a user's voice input into alphanumeric text.
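- As an illustrative sketch only (the disclosure provides no source code), the lookup-table conversion described above might be expressed in JavaScript, the language noted later for the DANDI plug-in. The table contents and function names here are assumptions:

```javascript
// Illustrative sketch of the syllable lookup-table conversion described
// above. Table contents and names are assumptions, not from the disclosure.
const SYLLABLE_TABLE = new Map([
  ['jh-aa-n', 'john'],
  ['s-m-ih-th', 'smith'],
]);

// Map each recognized syllable sequence to known characters; unmatched
// syllables are flagged for the enhancement step described later.
function syllablesToText(syllableSequences) {
  return syllableSequences
    .map((seq) => SYLLABLE_TABLE.get(seq) ?? `[?${seq}]`)
    .join(' ');
}

console.log(syllablesToText(['jh-aa-n', 's-m-ih-th'])); // "john smith"
```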
- the speech synthesis API server 170 is further structured to translate text (for example, text that is retrieved from an earlier populated field value of the form) into a voice output so that the value of the field may be read aloud back to the user.
- the speech synthesis API server 170 may be structured to read out the field values of the form, which enables the verification of the field values for users 120 with visual impairments, thereby enabling assistive technology support for such users 120 .
- the speech translation circuit 174 is structured to convert the field value retrieved from the electronic form into an audible output.
- the speech translation circuit 174 may access a look-up table in the non-transitory memory of the speech translation circuit 174 to identify syllables in the alphanumeric text based on the text in the field.
- the speech translation circuit 174 then sends the identified syllables to the user device 130 to read out the determined voice output.
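- In a browser setting, the read-back behavior attributed to the speech translation circuit 174 can be approximated with the standard speechSynthesis interface. The following is a minimal sketch of that approach, not the server-side implementation described above:

```javascript
// Sketch: read a populated field value aloud with the browser's standard
// speechSynthesis interface (an approximation of the server-side read-back
// performed by the speech translation circuit 174).
function readFieldAloud(fieldElement) {
  const utterance = new SpeechSynthesisUtterance(
    `${fieldElement.name} is ${fieldElement.value}`
  );
  utterance.rate = 0.9; // slightly slower playback to ease verification
  window.speechSynthesis.speak(utterance);
}
```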
- the web server 180 is a computing system that provides and hosts webpages/websites that are reachable by the user devices 130 via the network 110 .
- the web server 180 may be a back-end server or computing system comprising one or more processors, memory devices, network interfaces, and computing components as described herein that facilitate and enable various operations.
- the web server 180 is structured to respond to requests from clients such as the user device 130 to access a webpage identified by a particular Internet address.
- the web server 180 provides the contents of the requested webpage in response to a request for the web page from the user device 130 .
- the web server 180 includes a web page response circuit 182 .
- the web page response circuit 182 retrieves from the non-transitory memory of the web server 180 relevant information pertaining to a particular webpage requested by the user device 130 .
- the relevant information includes the metadata associated with a webpage hosted by the web server 180 , which has been requested to be downloaded by or provided to the user device 130 .
- the provider computing system 150 is owned by, managed/operated by, or otherwise associated with a provider institution.
- the provider institution may be a financial institution that offers one or more financial products and services (e.g., banking and banking applications such as mobile banking, lending products, payment and money transfer products and services, etc.).
- the provider institution is an entity that facilitates and enables, at least partly, operation of the hands-free input modality for a user to populate the fields of an electronic form in the system 100 .
- the provider computing system 150 is structured to facilitate the download of processing logic (in the form of a plug-in) to the user device 130 that enables an electronic form to be populated via a voice input.
- the provider computing system 150 includes a processing circuit 154 including a processor 155 and a memory 156 , a network interface circuit 151 structured to couple the system 150 to the other components of FIG. 1 through the network 110 , a voice authentication circuit 153 , and a provider enhancement circuit 157 .
- the processor 155 may be implemented as one or more application specific integrated circuits (ASICs), field programmable gate arrays (FPGAs), a group of processing components, or other suitable electronic processing components.
- the memory 156 may be one or more devices (e.g., RAM, ROM, Flash memory, hard disk storage) for storing data and/or computer code for completing and/or facilitating the various processes described herein.
- the memory 156 may be or include non-transient volatile memory, non-volatile memory, and non-transitory computer storage media.
- the memory 156 may include database components, object code components, script components, or any other type of information structure for supporting the various activities and information structures described herein.
- the memory 156 may be communicably coupled to the processor 155 and include computer code or instructions for executing one or more processes described herein. More details about the various components of the provider computing system 150 are provided below with respect to FIG. 3 .
- the user device 130 includes a network interface circuit 131 and an input/output circuit 132 .
- the network interface circuit 131 is structured to establish, enable, and maintain a connection with other components of the system 100 via the network 110 .
- the network interface circuit 131 is structured to enable the user device 130 to exchange information (e.g., data) over the network 110 .
- the network interface circuit 131 includes logic and hardware components that facilitate a connection of the user device 130 to the network 110 .
- the network interface circuit 131 may include a cellular modem, a Bluetooth transceiver, a Wi-Fi router, a radio-frequency identification (RFID) transceiver, and/or a near-field communication (NFC) transmitter. Further, in some arrangements, the network interface circuit 131 includes cryptography capabilities to establish a secure or relatively secure communication session with certain components, such as the provider computing system 150 .
- the input/output circuit 132 is structured to enable the exchange of communication(s) (e.g., data, information, instructions, etc.) with a user of the user device 130 .
- the input/output circuit 132 is structured to exchange data, communications, instructions, etc., with an input/output component of the user device 130 .
- the input/output circuit 132 includes one or more input/output devices, such as a display screen 233 (or display), a microphone 234 , and a speaker 235 .
- the input/output circuit 132 may include machine-readable media for facilitating the exchange of information between the input/output device and the components of the input/output circuit 132 .
- the input/output circuit 132 may include any combination of hardware components (e.g., input/output components such as a touchscreen) and machine-readable media.
- the input/output circuit 132 is machine-readable media executable by the one or more processors 134 and, in turn, coupled to the input/output devices (e.g., display screen 233 , microphone 234 , and speaker 235 ).
- the display screen 233 is structured to present visual displays (e.g., graphical user interfaces) to a user 120 .
- the display screen 233 is structured to provide and present an electronic form for the user 120 to fill.
- the display screen 233 may present prompts, notifications, and confirmations to the user 120 .
- the display screen 233 is structured as a touchscreen display device.
- the microphone 234 is structured to receive a voice input from the user 120 to fill a value of a field of the electronic form displayed by the display screen 233 .
- the microphone 234 may have any type of typical structure included with a user device, such as the smart phone user device structure.
- the speaker 235 is structured to provide an audible output.
- the audible output or noise may include a prompt, a notification, and a confirmation to the user 120 during the process of populating the fields of an electronic form.
- the speaker 235 may have any type of typical structure included with a user device, such as the smart phone user device structure.
- the speaker 235 and the microphone 234 may be the same physical device/component of the user device.
- the user device 130 includes a client application 136 .
- the client application 136 is a computer program that executes or runs on the user device 130 .
- the client application 136 may be implemented as a thin client application 239 or a native application 243 .
- a thin client application 239 is a computer program that typically executes on a networked computer with limited resources of its own (i.e., not locally on the user device). Thus, a thin client application fulfils or obtains its computational needs by using the resources of a backend server.
- the server is the provider computing system 150 . In other embodiments, the server is a third-party server.
- the native application 243 is a computer program that uses the computation power of the device in which it resides.
- the user device may be an ATM.
- the native application may be hard coded into the non-transitory memory of processor(s) of the ATM.
- the client application 136 is incorporated into an existing application, such as a mobile banking application.
- the client application 136 includes an API and/or a software development kit (SDK) that facilitates the integration of other components with the client application 136 .
- the client application 136 is a separate application implemented on the user device 130 .
- the client application 136 may be downloaded by the user device 130 prior to its usage, hard coded into the non-transitory memory of the user device 130 (i.e., a native application), or be a web-based application.
- the user 120 may have to log onto the user device 130 and access the web-based interface before using the client application 136 .
- the client application 136 may be a web browser application 241 (e.g., Google Chrome).
- the web browser application 241 is structured to include a speech synthesis API for converting alphanumeric text to speech, and vice versa. In other embodiments, this functionality is lacking.
- the client application 136 is shown to include a digitally aware neural dictation interface (DANDI) plug-in 237 .
- the DANDI plug-in 237 (e.g., add-in, add-on, extension, etc.) is a program that adds additional features to the client application 136 .
- the DANDI plug-in 237 is implemented as program code in the JavaScript programming language.
- the DANDI plug-in may be a different structure (e.g., constructed with a different programming language).
- the DANDI plug-in 237 may be a downloadable component, which can be an add-on to an existing application such as a web browser application (e.g., Google Chrome).
- the DANDI plug-in 237 may include one or more APIs and/or SDKs that facilitate integration of the DANDI plug-in 237 into the client application 136 .
- the functionality of the DANDI plug-in 237 described herein may be hardcoded in the non-transitory memory accessible to a processor of the device (e.g., user device).
- the DANDI plug-in 237 is a native feature on the device.
- the DANDI plug-in 237 may be hard-coded into the client application such that the plug-in is not a “plug-in”; rather, the features and functionalities described herein are embedded as part of the client application.
- this implementation embodiment is not meant to be limiting as the present disclosure contemplates various other structural implementations.
- the DANDI plug-in 237 is structured to integrate with the client application 136 .
- the DANDI plug-in 237 is a downloadable software component, which integrates with the client application 136 after being downloaded.
- the DANDI plug-in 237 may be an add-on to the web browser application 241 .
- the DANDI plug-in 237 is hard coded into the client application 136 (as opposed to being downloadable).
- the DANDI plug-in 237 may be hard coded into the non-transitory memory for execution by the processor(s) of the ATM. In this way, the DANDI plug-in 237 is not downloaded to the ATM, but is ready for use upon running/using the ATM.
- the DANDI plug-in 237 is structured to enable the client application 136 to determine and identify the characteristics of each field in the plurality of fields of an electronic form.
- the DANDI plug-in 237 is structured to perform an analysis of the metadata associated with the electronic form.
- the metadata may be provided by the web server 180 to the DANDI plug-in 237 .
- the DANDI plug-in 237 may extract the metadata from the webpage hosting the form (or, from the form itself when it is not hosted by a web page, such as a PDF form).
- the metadata associated with an electronic form defines the characteristics of each field in the plurality of fields of the electronic form.
- the characteristics may include, but are not limited to, an indication regarding a total number of fields of the form, the names of the fields, the data types of each of the fields, the maximum number of characters allowed in a field, the range of acceptable values for a field, etc.
- the data type for a “date” field of the electronic form is in MM-DD-YYYY format, with the value of the MM field being a 2-digit number, and the acceptable range of values for the MM field being from 01 through 12.
- the DANDI plug-in is structured to analyze the metadata to determine various characteristics of the form and, in particular, each of the fields of the form.
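- For an HTML-hosted form, the metadata analysis described above could be sketched as follows; the attribute choices (e.g., using the pattern attribute to encode the 01 through 12 month range) are assumptions for illustration:

```javascript
// Sketch: derive per-field characteristics from an HTML form's markup,
// analogous to the metadata analysis described above.
function describeForm(formElement) {
  return Array.from(formElement.elements)
    .filter((el) => el.name)
    .map((el) => ({
      name: el.name,
      type: el.type,                          // e.g., "text", "date"
      maxLength: el.maxLength > 0 ? el.maxLength : null,
      required: el.required,
      pattern: el.pattern || null,            // e.g., "(0[1-9]|1[0-2])" for MM
    }));
}
```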
- the DANDI plug-in 237 is also structured to convert the voice input for each field of the electronic form into a value for the field.
- the speech synthesis API function is provided within the client application 136 (e.g., web browser application 241 ).
- the DANDI plug-in 237 causes execution of the speech synthesis API of the web browser application 241 to convert the user's 120 voice input for a specific field into alphanumeric text.
- the DANDI plug-in 237 is coupled to the speech synthesis API of the client application 136 for converting the voice input into alphanumeric text. The DANDI plug-in 237 may then cause a populating of the text into a field in the electronic form.
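- In browsers, speech-to-text is exposed through the SpeechRecognition interface of the Web Speech API (which the disclosure groups under the "speech synthesis API" label). A minimal sketch of populating the current field from one utterance:

```javascript
// Sketch: capture one spoken value with the browser's SpeechRecognition
// interface and place the transcript into the current field.
const Recognition =
  window.SpeechRecognition || window.webkitSpeechRecognition;

function fillFieldBySpeech(fieldElement) {
  const recognizer = new Recognition();
  recognizer.lang = 'en-US';
  recognizer.onresult = (event) => {
    fieldElement.value = event.results[0][0].transcript;
  };
  recognizer.start();
}
```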
- the speech synthesis API is not included with the client application 136 .
- the client application 136 may be a native application 243 (e.g., a client application executing on an ATM) that lacks support for a web browser to execute the speech synthesis API.
- the speech synthesis API is provided by the speech synthesis API server 170 .
- the DANDI plug-in is structured to interface with and access the speech synthesis API server 170 over the network 110 . The DANDI plug-in may then transmit the voice input to the speech synthesis API server 170 over the network 110 to convert the user's 120 voice input into text.
- the DANDI plug-in 237 itself may include the speech synthesis API.
- the speech synthesis API server provides the speech synthesis API that is integrated into the DANDI plug-in 237 .
- the client application via the DANDI plug-in 237 itself is structured to receive a voice input, determine the characteristics of the voice input, and convert the voice input into text that is used as the value to populate the fields of the electronic form.
- the DANDI plug-in 237 is further structured to navigate through the fields of the electronic form using the characteristic of the fields of the electronic form.
- the DANDI plug-in 237 is structured to determine and identify the characteristic of fields in the form by analyzing the metadata associated with the form from the web server 180 .
- the web server 180 is coupled to the client application and DANDI plug-in 237 .
- the metadata analysis function may be included with the DANDI plug-in 237 . This arrangement may be used on devices that may lack this feature, such as potentially certain ATMs. Irrespective, the DANDI plug-in 237 may analyze the metadata on the display that is providing the form through a variety of techniques.
- the DANDI plug-in parses the metadata that is implemented in one of a variety of languages like XML, HTML, etc. that describe the various fields of the form to determine the number of fields in the form, the range of values that are associated with each field of the form, and so on.
- the DANDI plug-in 237 also determines or identifies the fields that make up the electronic form, and identifies the relative positions of the fields on the electronic form.
- the DANDI plug-in 237 uses the acquired knowledge of the relative positions of the fields to determine a priority order of navigating to and populating the fields of the form. In one embodiment, a determined order for populating the fields is vertical (i.e., top most field to bottom most field).
- a determined order is left to right and top to bottom in an analogous manner to left-to-right reading.
- the determined order is based on the characteristics of the fields based on the metadata. For example, only three of the depicted six fields may be required to be populated in order to enable/allow submission of the electronic form. However, the three fields are randomly dispersed on the form (e.g., first, fourth and sixth fields when reading left-to-right and top-to-bottom, etc.). In this situation, the determined order is these three fields first, in a reading manner (left-to-right and top-to-bottom).
- a prompt may be provided to the user indicating that all the required fields are populated and inquiring whether the user would like to submit/review the form or populate the optional fields.
- navigation to the fields may be strategic rather than a rote left-to-right or top-to-bottom manner. Such a process may save time and improve efficiency.
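- A sketch of the strategic ordering described above, assuming the required/optional flag and on-screen positions have already been extracted from the metadata:

```javascript
// Sketch: visit required fields first, each group in left-to-right,
// top-to-bottom reading order, per the strategy described above.
function fieldOrder(fields) {
  const readingOrder = (a, b) => {
    const ra = a.getBoundingClientRect();
    const rb = b.getBoundingClientRect();
    return ra.top - rb.top || ra.left - rb.left;
  };
  const required = fields.filter((f) => f.required).sort(readingOrder);
  const optional = fields.filter((f) => !f.required).sort(readingOrder);
  return [...required, ...optional];
}
```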
- the DANDI plug-in 237 is structured to navigate to a next field of the form via various different processes. For example, after a field is populated, the DANDI plug-in 237 using a timer function may automatically navigate to the next field according to the defined order (described above) after the passage of a predetermined amount of time (e.g., 1.5 seconds, 3 seconds, etc.). In another example, the DANDI plug-in 237 may receive a vocal command that instructs movement to the next field (e.g., “Please move to the next field”).
- the client application via the DANDI plug-in 237 knows the information that is expected for a particular field (e.g., date of birth) such that when a command is received, the command is easily differentiated from the information used to populate the field.
- a manual input from the user may be used to cause the movement from field-to-field (e.g., on the touchscreen, the user may touch the field he/she wants to fill next).
- a prompt is provided by the DANDI plug-in 237 (e.g., using a speaker of the user device) to request information for the next field in the form according to the determined order of populating the fields.
- a prompt may be provided to user inquiring him/her to submit the form and/or review their answers before submission.
- the form may then be submitted (e.g., by clicking submit or via a voice command). In other embodiments, any combination of these examples may be used.
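- A sketch of the timer-based advancement and submission prompt described above; the 1.5-second default mirrors the example in the text, while the speak helper and field names are assumptions:

```javascript
// Sketch: after a field is populated, wait a predetermined interval and
// then either prompt for the next field or offer submission.
function advanceAfterDelay(orderedFields, index, delayMs = 1500) {
  setTimeout(() => {
    const next = orderedFields[index + 1];
    if (next) {
      speak(`Please provide a value for ${next.name}.`);
      next.focus();
    } else {
      speak('All required fields are populated. Say "submit" to submit the form.');
    }
  }, delayMs);
}

// Hypothetical helper wrapping the read-back sketch shown earlier.
function speak(text) {
  window.speechSynthesis.speak(new SpeechSynthesisUtterance(text));
}
```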
- one or more functions associated with the DANDI plug-in may be performed by the provider computing system 150 .
- the DANDI plug-in via the user device may transmit relevant data or information to the system 150 for processing with specific processing capabilities described below.
- the provider computing system 150 of FIG. 1 is shown according to an example embodiment.
- the provider computing system 150 is shown to include a network interface circuit 151 , a plug-in download circuit 152 , a voice authentication circuit 153 , a processing circuit 154 , and a provider enhancement circuit 157 .
- the processing circuit 154 includes one or more processors 155 and a non-transitory memory 156 . The processing circuit 154 is described above.
- the network interface circuit 151 (or, network interface) is structured to enable the provider computing system 150 to establish connections with other components of the system 100 via the network 110 .
- the network interface circuit 151 is structured to enable the provider computing system 150 to exchange information over the network 110 (e.g., with the user device 130 ).
- the network interface circuit 151 includes program logic that facilitates connection of the provider computing system 150 to the network 110 .
- the network interface circuit 151 supports communications between the provider computing system 150 and other systems, such as the user device 130 .
- the network interface circuit 151 may include a cellular modem, a Bluetooth transceiver, a Bluetooth beacon, a radio-frequency identification transceiver, and a near-field communication transmitter.
- the network interface circuit 151 may include the hardware and machine-readable media sufficient to support communication over multiple channels of data communication.
- the network interface circuit 151 includes cryptography capabilities to establish a secure or relatively secure communication session with the user device 130 .
- the plug-in download circuit 152 is structured to create, maintain, and provide the DANDI plug-in 237 for download to the user device 130 .
- the user device 130 may request the download of the DANDI plug-in 237 from the plug-in download circuit 152 .
- the plug-in download circuit 152 upon receiving a request from a user device 130 for download of the DANDI plug-in, causes the DANDI plug-in 237 to be downloaded to the user device 130 .
- this embodiment is used when the DANDI plug-in 237 functionality is not hardcoded into either the device or the client application: i.e., when the described functionality is being added to an existing application (e.g., a web browser).
- the voice authentication circuit 153 is structured to authenticate a voice of a user received via the network interface circuit 151 from a user device 130 .
- authentication of a user 120 may be required to use the hands-free mode of operation via the DANDI plug-in 237 based on the requirements of the form (e.g., a credit card application provided by the provider institution).
- the voice authentication circuit 153 is structured to facilitate authentication/verifying a user's voice.
- the voice authentication circuit 153 upon receiving a voice input from the user device 130 , compares the voice input with known voice samples of the user's speech stored in the provider database 365 (described herein) for a match or a substantial match.
- the voice authentication circuit 153 then notifies the user device 130 about the result of the match.
- the user device 130 may skip the step of requiring the user 120 to log in with authentication credentials, since the user 120 is recognized/authenticated through the user's 120 voice.
- the voice authentication circuit 153 is structured to provide the benefit of facilitating the continuation of a user's 120 session without the necessity of the user being forced to provide log-in authentication credentials during the middle of using a commercial banking application. This feature may be advantageous for forms that require sensitive information to be provided (e.g., credit card applications, forms that require personal identifying information, etc.).
- the user may be authenticated into their device and then subsequently authenticated via their voice to use the hands-free mode of operation to fill out the form.
- the DANDI plug-in 237 via the client application may automatically transmit the user's voice to the voice authentication circuit 153 for an additional authentication analysis to be performed. This adds an extra layer of security that is not typical for most forms that are populated.
- the provider computing system 150 further includes a provider enhancement circuit 157 that includes a speech enhancement circuit 359 , a user-specific auto-complete circuit 361 , a provider command dictionary 363 , and a provider database 365 .
- the provider database 365 is structured to hold, store, categorize, and/or otherwise serve as a repository for information regarding the user (e.g., the user's historical voice inputs).
- the provider database 365 is structured to store and selectively provide access to the stored information.
- the provider database 365 may have any one of a variety of computing structures. Although shown as being a separate component of the provider computing system 150 , in some embodiments, the provider database 365 may be part of the memory 156 .
- the speech enhancement circuit 359 is structured to enhance the quality of the input voice samples received from a user device 130 for storage in the provider database 365 .
- the enhancement in the quality of the input voice samples may be based on the removal of undesirable noise from the samples (e.g., the input voice samples may include undesirable noise from a potentially noisy surrounding of the user 120 due to the user 120 being in a busy marketplace, using public transportation, etc.).
- the client application 136 of the user device 130 may transmit samples of a user's 120 voice input to the speech enhancement circuit 359 .
- the speech enhancement circuit 359 digitally enhances the user's voice samples by applying filtering and digital processing techniques in order to obtain better quality samples of the user's original voice input.
- the speech enhancement circuit 359 is structured to provide the ability to mitigate distortions or irregularities in the user's voice input due to the presence of an accent in the voice, or a temporary condition (for example, a cold) affecting the user's voice, thus enhancing the quality of the voice input.
- the speech enhancement circuit 359 extrapolates the missing or distorted syllables in the user's 120 voice input based on comparing the current voice input of the user 120 with past voice inputs received from the particular user 120 stored in the provider database 365 .
- the speech enhancement circuit 359 executes artificial intelligence based machine learning algorithms to compare the identified syllables in the user's voice input to a database of syllables stored in the provider database 365 .
- the algorithms find the closest match for any distorted or otherwise irregular syllables in the user's voice input in the provider database 365 , and cause such syllables to be replaced by the corresponding matching syllables in the provider database 365 .
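- The disclosure describes AI-based matching against the provider database; as a toy illustration of the "closest match" idea only, a simple edit-distance scorer could be used:

```javascript
// Toy sketch of "closest match" syllable replacement using Levenshtein
// distance. The disclosure's AI/ML matching is far richer; this only
// illustrates the idea. Assumes knownSyllables is non-empty.
function closestSyllable(distorted, knownSyllables) {
  const distance = (a, b) => {
    const d = Array.from({ length: a.length + 1 }, (_, i) =>
      Array.from({ length: b.length + 1 }, (_, j) =>
        i === 0 ? j : j === 0 ? i : 0
      )
    );
    for (let i = 1; i <= a.length; i++) {
      for (let j = 1; j <= b.length; j++) {
        d[i][j] = Math.min(
          d[i - 1][j] + 1,                                   // deletion
          d[i][j - 1] + 1,                                   // insertion
          d[i - 1][j - 1] + (a[i - 1] === b[j - 1] ? 0 : 1)  // substitution
        );
      }
    }
    return d[a.length][b.length];
  };
  return knownSyllables.reduce((best, s) =>
    distance(distorted, s) < distance(distorted, best) ? s : best
  );
}
```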
- the speech enhancement circuit 359 is structured to translate a user's 120 voice input from a first language to a second language (e.g., from a foreign language to the English language), such that the translated voice input may be used to populate the fields of a form in the second language.
- the speech enhancement circuit 359 is, thus, structured to provide an advantage in that the form is able to be populated in English even when the voice input is in a different language.
- providers of the electronic form need not translate their form(s) into various languages to accommodate the various languages of the world. Rather, a translation provided by the circuit 359 may occur to enable a wide usage.
- the provider command dictionary 363 is structured to provide a dictionary of commands recognized by the hands-free voice input system.
- the provider command dictionary 363 receives a voice input representing a user command from the client application 136 of the user device 130 .
- a user may provide a voice command to the user device 130 to modify the value of an earlier populated field.
- a user 120 may issue voice commands for the initiation and termination of the hands-free mode of operation.
- a user 120 may issue a command to read out all the fields of a form populated so far (the command may be issued in the middle of populating the form).
- commands described herein are non-limiting in nature, and the provider command dictionary 363 is structured to support a much larger set of commands than the examples provided. Thus, rather than using the voice input to just populate the electronic form, the commands are used to provide additional functionality that may enhance the user experience.
- the client application 136 of the user device 130 communicates with the provider command dictionary 363 through the network interface circuit 151 of the provider computing system 150 to leverage the increased capability of command recognition in the provider command dictionary 363 .
- the provider command dictionary 363 thus expands the command recognition capability built into the client application 136 of the user device 130 via the DANDI plug-in 237 .
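- A minimal sketch of a command dictionary of the kind described above; the phrases and handler names are illustrative assumptions:

```javascript
// Sketch: dictionary mapping recognized command phrases to handlers.
// Phrases and handler names are illustrative, not from the disclosure.
const COMMANDS = new Map([
  ['next field', (ctx) => ctx.moveToNextField()],
  ['read back', (ctx) => ctx.readPopulatedFields()],
  ['stop dictation', (ctx) => ctx.disableHandsFree()],
]);

// Returns true when the transcript was a command; otherwise the caller
// treats the transcript as a field value.
function dispatchCommand(transcript, ctx) {
  const handler = COMMANDS.get(transcript.trim().toLowerCase());
  if (handler) {
    handler(ctx);
    return true;
  }
  return false;
}
```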
- the user-specific auto-complete circuit 361 is structured to provide auto-complete suggestions for a particular user 120 .
- the user-specific auto-complete circuit 361 receives voice inputs from the client application 136 of the user device 130 .
- the user-specific auto-complete circuit 361 then stores the voice inputs in the provider database 365 on a per-user basis, thus accumulating user 120 provided voice inputs for multiple fields of multiple forms.
- the user may be a customer of the provider institution, in which case the provider computing system 150 may store various other information regarding the user (e.g., name, date of birth, address, ethnicity of the user, etc.).
- the user-specific auto-complete circuit 361 utilizes the voice inputs stored in the provider database 365 for a specific user 120 to perform a multi-field analysis of user's stored voice inputs to determine auto-complete suggestions.
- the user-specific auto-complete circuit 361 executes algorithms to recognize patterns in a user's 120 voice inputs across multiple fields of multiple forms stored for the user 120 in the provider database 365 to provide specific auto-complete suggestions that are tailored to the particular user 120 . This may speed up the filling of the form.
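- As a simplified stand-in for the multi-field analysis described above, a per-user history of converted values could drive suggestions roughly as follows (the data shapes are assumptions):

```javascript
// Sketch: suggest a value for a field based on what this user has entered
// into same-named fields of earlier forms; the most frequent value wins.
function suggestValue(fieldName, history) {
  const past = history
    .filter((entry) => entry.fieldName === fieldName)
    .map((entry) => entry.value);
  if (past.length === 0) return null;
  const counts = new Map();
  for (const v of past) counts.set(v, (counts.get(v) ?? 0) + 1);
  return [...counts.entries()].sort((a, b) => b[1] - a[1])[0][0];
}
```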
- a user 120 interacts with the user device 130 to initiate the process of a hands-free mode of operation for populating a form using the user device 130 .
- the form may be an application (e.g., credit card application, account application, gym membership application, etc.), a survey, and any other form that is provided electronically.
- the user provides an authentication credential to access the user device 130 (e.g., a biometric, a passcode, etc.).
- the user 120 clicks on a portion of a form or a part of the display screen 233 of the user device 130 to initiate the hands-free mode of operation.
- a manual input (e.g., clicking on a DANDI icon) is implemented to initiate the hands-free mode of operation.
- the user 120 may provide a specific voice command (e.g., initiate “DANDI”).
- the client application 136 running on the user device 130 recognizes the command to initiate the hands-free mode of voice input to subsequently enable the populating of the fields using a voice input.
- the user device 130 may be structured to provide the indication to the user 120 that the hands-free input modality is active.
- the user device 130 is structured to provide an animation on the display screen 233 of the user device 130 regarding the indication (e.g., an illuminating icon, a graphic, etc.).
- the user device 130 may prompt the user 120 through the speaker 235 to provide a voice input for the value of a field of the form that the user wants to populate once the hands-free mode of operation is enabled.
- the DANDI plug-in 237 is structured to navigate to various fields of an electronic form, and populate the values in the fields of the form.
- the form may be submitted (e.g., the application submitted, the PDF determined to be complete and then saved, access to an account provided, etc.).
- the DANDI plug-in may include the speech synthesis API.
- the DANDI plug-in 237 is structured to receive a voice input and then convert the voice input to alphanumeric text.
- the speech synthesis API of the DANDI plug-in may be disabled to reduce the processing requirements of the DANDI plug-in.
- the voice-to-text conversion may be done by the speech synthesis API of the client application and then provided to the DANDI plug-in 237 .
- the DANDI plug-in 237 may include the speech synthesis API and the metadata analysis function.
- the DANDI plug-in 237 may itself be structured to convert a voice input to alphanumeric text, navigate between the fields of the form, and ultimately enable the user to populate the form.
- certain of the functions, such as speech enhancement, voice authentication, and translation described above of the provider computing system 150 may also be included with the DANDI plug-in 237 .
- the local processing requirements for running the plug-in 237 may be reduced to improve the processing speed.
- the DANDI plug-in 237 includes the speech synthesis API and metadata analysis feature. Further, the voice authentication, translation, and enhancement features are provided by the provider computing system to alleviate the size of the plug-in 237 . That said, when the plug-in 237 is used with an application that already has a built in speech to text conversion feature, the plug-in 237 may use the output of that feature to reduce duplicative features. Alternatively, the plug-in 237 may use the output of that feature for comparison purposes to the determined text by the plug-in 237 . This may be used to help the plug-in 237 “learn” and become more refined over time.
- the provider computing system 150 may include artificial intelligence or deep learning capabilities structured to optimize operation of the plug-in 237 over time (hence, digitally aware neural dictation interface).
- processing circuit 154 may include a convolutional neural network associated with one or more of the circuits, such as the speech enhancement circuit 359 .
- the circuit 359 receives multiple samples of the user's voice (inputs). Convolution layers and programming are used by the circuit 359 to identify the syllables in the user's voice, patterns of speech, and other characteristics of the user's voice. This may include referencing other users' voice samples. This node processing results in a plurality of layers.
- the circuit 359 Using a learning a processing (e.g., back-propagating), the circuit 359 begins to readily determine and identify the features associated with the user's voice as falling within defined categories or classes (e.g., typically used words such as a “the” and “next” may form a class, nouns may form a class, and other ways to group voice inputs may form additional classes). As more learning is performed, the circuit 359 may more quickly determine a user's voice input to be a certain letter, word, or phrase. This may result in the circuit 359 developing a list that correlates the user's voice samples to these known outputs. As such and then in operation, these letters, words, or phrases may be more quickly determined by the plug-in 237 locally moving forward which enhances operation of the plug-in. In other embodiments, different neural network, machine-learning, or artificial intelligence processes may be used.
- in FIG. 4A, a display output 400 on the display screen 233 of a user device during a hands-free mode of operation for populating a form is shown, according to an example embodiment.
- an animation 402 is displayed on the display screen 233 , which is an indication to the user that the at least partial hands-free mode of operation of the user device is enabled.
- the animation 402 provides a clear indication on the display screen 233 to the user 120 that the system is ready for a voice input to populate individual fields of the form.
- the voice input provides a value of a field of the form.
- the voice input may also be a voice command to the user device to perform a specific operation.
- the user 120 issues a command to the client application 136 to modify the value of an earlier populated field pointed to by reference numeral 404 .
- upon receiving an input value for the field to be modified, the client application 136 is structured to revert back to the next field where it was last awaiting a user voice input to populate the value of the field, which is pointed to by reference numeral 406 .
- the populated values of the fields of the form are available for visual verification by the user 120 on the display screen 233 of the user device 130 .
- in FIG. 4B, another display output 450 on the display screen 233 of the user device 130 during a hands-free mode of operation for filling out a form is shown, according to an example embodiment.
- there is no equivalent of the animation 402 of FIG. 4A that is displayed because the client application can only be in a listening mode waiting for the user's 120 voice input, or in the prompting mode (prompting the user for the value of the next field of the form, for example).
- the absence of the animation is an indication to the user 120 that the at least partial hands-free mode of operation of the user device 130 by the user 120 is currently disabled.
- a pop-up display 460 is provided on the display screen 233 of the user device 130 (or the user 120 is prompted audibly through the speaker of the user device 130 ).
- the previous and newly populated values 455 are available for visual verification by the user 120 on the display screen 233 of the user device 130 .
- FIG. 5 depicts a bar graph 504 as an example data visualization surfaced through a user's 120 voice input.
- a display output 502 on the display screen 233 of the user device 130 is shown based on the user's 120 selection of an option from a drop-down menu to display the underlying data as a bar graph (as opposed to other possible menu options, such as a pie chart, a scatter diagram, or a time series graph, etc.), during a hands-free mode of operation.
- the user's 120 voice input is used for purposes other than to provide voice input for the value of a field of a form, or to provide voice input for a recognized command.
- the user's voice input is used to select an option from a drop-down menu displayed on the display screen 233 of the user device 130 , in order to visualize the associated data.
- the client application 136 executing on the user device 130 is structured to process different types of metadata and processing logic during the hands-free operation of the user device 130 , to provide user experiences which go beyond just populating the plurality of fields of a form by using voice input.
- hands-free operation is applicable to a Virtual Reality (VR) system that may include at least a headset or a visor, and a microphone.
- conventionally, a user 120 engaged in a VR experience who has to fill out a form would have to take the visor off (i.e., get out of the VR experience) and provide inputs for form-filling using a keyboard (or equivalent).
- the hands-free mode of operation enabled by the present disclosure may be used for form-filling when the VR headset either supports a client application capable of downloading the DANDI plug-in or it is hard-coded into the computer, thereby alleviating the need for the user to disengage from the VR experience (e.g., without having to take the VR headset or visor off).
- in FIG. 6, a flowchart depicting a method 600 of populating a plurality of fields of a form and providing a conversational electronic form using the user device of FIG. 1 is shown, according to an example embodiment. Because the method 600 may be implemented with the components of FIG. 1 , reference may be made to various components of the system 100 to aid explanation of the method 600 .
- an electronic form-filling voice function is provided.
- a plug-in and, in particular, the DANDI plug-in 237 is provided by the provider computing system 150 .
- process 602 is described as providing the DANDI plug-in 237 which provides the electronic form-filling voice functionality.
- the functionality of the DANDI plug-in is already included with an application, such as a web browser application 241 .
- the DANDI plug-in 237 includes or utilizes a speech synthesis API that converts a user's voice input into alphanumeric text.
- the DANDI plug-in 237 includes a metadata analysis feature whereby metadata associated with an electronic form is received by the DANDI plug-in 237 , and then analyzed to determine the characteristics of the fields of the form.
- the DANDI plug-in 237 is structured to determine, through metadata analysis, characteristics of the form such as the total number of fields, the names of the fields, the data types of each of the fields, the maximum number of characters allowed in a field, the range of acceptable values for a field, etc.
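For an HTML form, much of this metadata is exposed as element attributes, so a minimal sketch of the analysis (the `FieldCharacteristics` shape is a hypothetical, not the patented format) might be:

```typescript
// Sketch: derive field characteristics from an HTML form's own metadata.
interface FieldCharacteristics {
  name: string;
  dataType: string;        // from the input's `type` attribute
  maxLength: number | null;
  required: boolean;
  min: string | null;      // range of acceptable values, when declared
  max: string | null;
}

function analyzeForm(form: HTMLFormElement): FieldCharacteristics[] {
  const inputs = Array.from(form.querySelectorAll<HTMLInputElement>("input"));
  return inputs.map((input) => ({
    name: input.name || input.id,
    dataType: input.type,
    maxLength: input.maxLength >= 0 ? input.maxLength : null, // -1 means unset
    required: input.required,
    min: input.min || null,
    max: input.max || null,
  }));
}

// Usage: the total number of fields and each field's declared constraints.
const form = document.querySelector<HTMLFormElement>("form");
if (form) console.table(analyzeForm(form));
```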
- a partial hands-free operation of the user device 130 is enabled.
- an electronic form may be displayed by the user device 130 .
- the user 120 may then click on a portion of a webpage of the user device 130 , or the user 120 may issue a specific voice command or request that may be recognized by the client application 136 as the initiation of the hands-free mode of operation of the user device 130 by the user 120 in order to populate the fields of the form.
- the client application 136 may process the voice command to initiate the hands-free mode of operation, or it may pass on the command to the provider computing system 150 through the network interface circuit 131 .
- the provider enhancement circuit 156 in the provider computing system 150 may interpret the voice command to initiate the at least partial hands-free mode of operation by the user 120 of the user device 130 , and inform the client application 136 of the initiation of the at least partial hands-free mode of operation where “partial” means that the user is still able to provide manual inputs if desired.
- a speech or voice input is received.
- the user 120 may provide a speech input regarding a field.
- the microphone 234 of the user device 130 may receive the speech input from the user 120 , which corresponds to a value of a current field of a plurality of fields of the electronic form.
- the input is transmitted to the DANDI plug-in 237 and client application.
- the client application, via the speaker 235 , may prompt the user for information associated with a particular field in order to provide a conversational form (e.g., the metadata analysis may determine what information is required and the client application may audibly request the user to provide this specific information via the speaker).
- the speech input is converted into text (e.g., alpha, numeric, or alphanumeric text).
- the client application 136 via the DANDI plug-in 237 , converts the speech input received at process 606 from speech into text (e.g., alphanumeric text).
- the client application 136 accesses a speech synthesis API residing on a speech synthesis API server 170 to convert the user speech input to alphanumeric text.
- the client application 136 may include in-built support for a speech synthesis API that facilitates the synthesis of speech, i.e., conversion from speech-to-alphanumeric, and from alphanumeric-to-speech.
- a field of the form is populated by the value corresponding to the text.
- the value refers to the characters (e.g., alphanumeric text) that are placed in the field based on the conversion of the speech input to alphanumeric text.
- the client application 136 via the DANDI plug-in 237 , first determines which field of the form is to be populated by analyzing the metadata describing the plurality of fields of the form. Then the client application 136 , via the DANDI plug-in 237 , populates the appropriate field of the form with the converted alphanumeric text.
- the client application 136 may provide a display on the user device 130 to permit a visual verification by the user 120 that the field value was populated correctly.
- the value may be provided on a display screen 233 of the user device 130 . This permits visual verification by the user 120 that the value has been entered into the correct field of the form, and that the value corresponds to the speech input provided by the user 120 .
- navigation to a next field in the form is performed.
- the client application, via the DANDI plug-in 237 , may determine whether there are additional fields in the form based on the metadata and whether various fields are populated or complete.
- the client application via the plug-in 237 determines the priority order of the fields (i.e., the fields that should be filled first, such as the fields that require population before submission is allowed).
- the client application via the plug-in 237 determines that additional fields need to be populated and navigates to those fields for prompting the user to fill according to the determined order.
- determining when to move or navigate from field to field may be done in a variety of different ways.
- the field may be determined to be populated and a prompt for information for the next field provided via the speaker to the user according to the determined order of fields.
- an affirmative input/confirmation such as a click or vocal command from the user is received that indicates the field is complete (a manual or verbal confirmation).
- a prompt, generated from instructions from the plug-in and provided via a speaker of the user device, may ask the user for information specific to the next field.
- this situation facilitates a conversational form whereby the plug-in navigates from field-to-field conversationally with the user until all or a sufficient number of fields are populated. Throughout this navigation, the converted speech-to-text may be displayed in each field for visual verification by the user.
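A minimal sketch of that conversational loop, assuming the browser's Web Speech API for both prompting (`speechSynthesis`) and recognition, could look as follows; the helper names and prompt wording are illustrative:

```typescript
// Sketch of a conversational field-to-field loop: prompt, listen, populate,
// advance. Assumes the (possibly prefixed) Web Speech API is available.
function listenOnce(): Promise<string> {
  return new Promise((resolve) => {
    const Recognition =
      (window as any).SpeechRecognition ?? (window as any).webkitSpeechRecognition;
    const recognizer = new Recognition();
    recognizer.onresult = (e: any) => resolve(e.results[0][0].transcript);
    recognizer.start();
  });
}

async function conversationalFill(fields: HTMLInputElement[]): Promise<void> {
  for (const field of fields) {
    if (field.value) continue; // already populated; navigate onward
    const label = field.labels?.[0]?.textContent ?? field.name;
    // Audibly prompt the user for this specific field's information.
    speechSynthesis.speak(new SpeechSynthesisUtterance(`What is your ${label}?`));
    field.value = await listenOnce(); // displayed for visual verification
  }
  speechSynthesis.speak(
    new SpeechSynthesisUtterance("The form is complete. Would you like to submit?"),
  );
}
```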
- a user may notice one or more field values that the user desires to change or modify. Accordingly, the microphone may receive a voice command from the user to modify a previously populated field value, and subsequently receive a speech input from the user to override the value of the previously populated field.
- a completeness of the form is determined.
- the client application 136 via the DANDI plug-in 237 , analyzes the metadata describing the plurality of fields of the form. Based on the analysis, the client application 136 determines whether a sufficient number of fields have been populated (a “completeness”).
- the “sufficient number of fields” may be all the fields or a predefined subset of the fields based on the metadata. For example, some fields may be optional and need not be filled in order for the form to be determined to be “complete.” In this regard, a certain subset of fields may be required to be populated before the form is allowed to be submitted. If the form is complete, then the client application 136 proceeds to process 616 . But if the determination is that there are more fields to populate, then the client application 136 , via the DANDI plug-in 237 , reverts back to process 612 to prompt the user 120 for the value of the next field to be populated.
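A sketch of such a completeness check, under the assumption that the required subset is flagged by the fields' `required` metadata:

```typescript
// Sketch: a form is "complete" when every required field is populated;
// optional fields may remain empty.
function isComplete(fields: HTMLInputElement[]): boolean {
  return fields
    .filter((field) => field.required)
    .every((field) => field.value.trim().length > 0);
}

// Find the next field still needing population, in document order.
function nextUnfilledRequired(
  fields: HTMLInputElement[],
): HTMLInputElement | undefined {
  return fields.find((f) => f.required && f.value.trim().length === 0);
}
```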
- a prompt is provided to review the populated fields and/or to submit the form.
- an audible prompt via the speaker from the DANDI plug-in and client application may be provided: “The form is complete. Would you like to submit the form?”
- the client application 136 provides a display on the user device 130 to prompt the user to indicate whether the user 120 wants to review the fields of the form, or whether the user 120 wants to submit the form.
- an indication to submit the form is received.
- the indication to submit the form may be provided vocally (e.g., as a vocal command as described above).
- the indication may be provided manually (e.g., clicking on a submit button on the form, clicking save on a PDF form, etc.).
- the client application 136 proceeds to process 620 .
- the client application 136 via the DANDI plug-in 237 , reverts back to process 612 , and prompts the user 120 to re-enter or accept the existing values in each of the plurality of fields of the form.
- a termination or disablement of the hands-free operation of the user device 130 by the user 120 is accomplished.
- this step is performed by the client application 136 of the user device 130 . Further, this step may be performed automatically upon submission of the form. Or, an explicit input from the user may be provided (e.g., a vocal command or a manual entry), which disables the hands-free or at least partial hands-free mode of operation.
- the functionality provided by the DANDI plug-in may always be on. In this regard, one need not enable or disable the hands-free mode of operation. Rather, a user may simply click on a DANDI icon to initiate use of the functionality of the DANDI plug-in with a form.
- a user may navigate to a web page that hosts a form and the functionality described herein with respect to the DANDI plug-in may be automatically initiated or semi-automatically initiated (e.g., “Please confirm you would like to use DANDI” may be provided as a prompt to the user upon reaching the form on the web page).
- Method 600 provides the technical advantage of being able to navigate and populate the plurality of fields of an electronic form in a hands-free manner by interacting using a user's voice.
- the user's voice is used both to receive commands and to obtain the values of the fields of the form.
- functionality implemented on the user device (e.g., analysis of the metadata, providing a speech synthesis API, etc.) may be offloaded to backend servers. This provides the technical advantage of less computational load on the processor(s) of the user device.
- Method 600 also provides several user benefits, such as the ability to populate entire forms in a hands-free manner by carrying out a conversation with the user device.
- the speaker of the user device based on instructions from the plug-in may prompt the user for a value of a next field of the plurality of fields without an affirmative input that the current field is populated in order to provide a conversational electronic form.
- like a conversation, there is a free flow of movement from one field (one conversation topic) to another field (another conversation topic). This reduces the friction typically experienced when filling out electronic forms.
- the processes of method 600 recognize the difference between user commands and user speech input, which makes populating the fields of the form easy for the user: even when the user makes some mistakes, the user may navigate back to previously populated fields through voice commands.
- Method 600 also provides the benefit of error checking in at least two ways.
- the client application 136 via the DANDI plug-in 237 , provides a visual depiction of the filled field as the field is filled.
- the client application 136 via the DANDI plug-in 237 , provides a visual depiction of the completed form prior to submission of the form.
- the client application 136 via the DANDI plug-in 237 , may read aloud the filled field as the field is populated.
- the client application 136 via the DANDI plug-in 237 may identify an error with respect to a provided field value (alternatively, as described below, the provider computing system 150 may determine one or more errors).
- the DANDI plug-in 237 may expect certain values for each field based on the metadata analysis. If the provided information does not match the expected values (e.g., expecting a phone number and a home address value is provided), the client application 136 via the DANDI plug-in 237 may prompt the user to confirm their submission or request different information. Accordingly and during population of the fields, errors may be determined by the DANDI plug-in 237 with respect to one or more field values. The client application via the DANDI plug-in may compare the received speech input (or converted text) to the expected value for the field(s) of the form and where the expected values do not match the speech input (or converted text), an error is flagged.
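One hedged way to sketch that expectation check is with per-type patterns; the patterns below are illustrative assumptions rather than the disclosed matching logic:

```typescript
// Sketch: flag an error when converted text does not match the field's
// expected shape. The per-type patterns are illustrative assumptions.
const expectedPatterns: Record<string, RegExp> = {
  tel: /^\+?[\d\s\-().]{7,}$/, // phone number
  date: /^\d{4}-\d{2}-\d{2}$/, // ISO date, e.g., date-of-birth
  number: /^-?\d+(\.\d+)?$/,
};

function flagMismatch(field: HTMLInputElement, convertedText: string): boolean {
  const pattern = expectedPatterns[field.type];
  if (!pattern) return false; // no expectation recorded for this field type
  const error = !pattern.test(convertedText.trim());
  if (error) {
    // e.g., expecting a phone number but a home address was provided
    speechSynthesis.speak(
      new SpeechSynthesisUtterance(
        `That does not look like a valid ${field.type}. Please confirm or try again.`,
      ),
    );
  }
  return error;
}
```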
- the DANDI plug-in 237 may then prompt the user to confirm the field values or to change the field values with determined errors.
- the client application 136 via the DANDI plug-in 237 , may read aloud each field and each field value. In either situation, a user may receive an auditory and visual indication of the populated field and/or fields of the form. Accordingly, an ability to check for errors is provided by the DANDI plug-in 237 to the user.
- form validation and error messaging may be provided.
- the client application 136 via the DANDI plug-in 237 may provide a message to the user to confirm the value of one or more fields of the form.
- the field may be date-of-birth yet the value depicts the name of the user.
- the DANDI plug-in 237 may compare the expected field value to the actual field value to determine that a potential error exists.
- for example, the expected field values may be numerical in nature yet the received converted text consists of alpha characters.
- the client application 136 via the DANDI plug-in 237 may provide a message, such as an audible question, to the user: “The date-of-birth field includes a name and not a date. Would you like to return to this field to change the value?”
- the error checking may include automatic correction.
- the client application via the DANDI plug-in 237 may recognize that the value provided by the user is their name and the field is date-of-birth. Rather than populating the date-of-birth field, the client application via the DANDI plug-in 237 , using the metadata analysis of the form, locates and populates the name field with the user's name.
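A sketch of that automatic rerouting, under the same illustrative per-type patterns as above (redeclared here so the snippet stands alone), might be:

```typescript
// Sketch of automatic correction: route a value that fails the current
// field's expectation to a field whose expectation it does satisfy.
const patterns: Record<string, RegExp> = {
  tel: /^\+?[\d\s\-().]{7,}$/, // assumed phone-number shape
  date: /^\d{4}-\d{2}-\d{2}$/, // assumed date shape (date-of-birth)
};

function routeValue(
  fields: HTMLInputElement[],
  current: HTMLInputElement,
  value: string,
): HTMLInputElement {
  const expected = patterns[current.type];
  if (!expected || expected.test(value)) {
    current.value = value; // the value fits where the user currently is
    return current;
  }
  // Otherwise look for a field the value does satisfy (e.g., a free-text
  // name field when a name was spoken into date-of-birth).
  const better = fields.find((f) => {
    const p = patterns[f.type];
    return p ? p.test(value) : f !== current && !f.value;
  });
  const destination = better ?? current;
  destination.value = value;
  return destination;
}
```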
- the client application via the DANDI plug-in audibly prompts the user for their date-of-birth for the date-of-birth field.
- a smart form-filling aspect is provided via the DANDI plug-in 237 .
- the circuit 361 provides information regarding the user's date of birth.
- the client application via the DANDI plug-in may disregard the user's voice input in favor of the date-of-birth information from the circuit 361 because this information matches the required field value.
- a proactive error correction feature may be provided.
- a form validation feature may also be provided. This validation provides a holistic error examination process.
- the form is sent to the provider computer system 150 rather than the end recipient computing system.
- using stored submitted forms associated with the user, the system 150 compares the field values to previously-submitted field values. The system 150 may then identify potential errors and either fix them before submitting the form or transmit a message back to the user for potential correction (e.g., a verbal prompt, a written message such as a push notification, etc.). If the system 150 determines that the form appears to be correctly filled (e.g., by matching the field values with the required information for each field to ensure a match), then the system 150 transmits the form to the end recipient.
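As an assumption about how such a comparison against previously-submitted values could be structured (the provider database schema is not disclosed), a sketch might be:

```typescript
// Sketch: backend-style validation against previously-submitted field values.
// `priorSubmissions` stands in for the provider database; its shape is assumed.
type FormValues = Record<string, string>;

function findPotentialErrors(
  submitted: FormValues,
  priorSubmissions: FormValues[],
): string[] {
  const suspects: string[] = [];
  for (const [fieldName, value] of Object.entries(submitted)) {
    const priorValues = priorSubmissions
      .map((form) => form[fieldName])
      .filter((v): v is string => v !== undefined);
    // Flag a field whose value never matches anything the user sent before.
    if (priorValues.length > 0 && !priorValues.includes(value)) {
      suspects.push(fieldName);
    }
  }
  return suspects;
}

// Usage: a changed home address is flagged for confirmation, not auto-fixed.
const issues = findPotentialErrors(
  { homeAddress: "12 New St" },
  [{ homeAddress: "34 Old Ave" }],
);
console.log(issues); // ["homeAddress"]
```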
- an audible prompt may be provided to the user to check the form prior to submission via the client application. If the user responds in the affirmative, then the form-to-be-submitted is transmitted to the system 150 for validation. This may be beneficial for long and complex forms, such as mortgage forms, where additional analysis is desired to ensure there are no, or likely no, errors.
- a carbon copy of the form and the populated fields is provided to the system by the client application during population of the form. This may enable simultaneous error-checking of the form by the system 150 .
- These validation procedures show the potential involvement of the system 150 in attempting to mitigate errors in filling of the form. This may be used when such functionality is not included with the DANDI plug-in 237 .
- artificial intelligence may be used by the processing circuit of the provider computing system 150 .
- artificial intelligence, such as the above-described convolutional neural networks, may also be used in the error-checking process of the electronic form prior to submission.
- a learning by the system 150 of the user's typical responses (e.g., home address, favorite pet, etc.) and voice characteristics via artificial intelligence may enable a quicker filling of forms with less likelihood for errors.
- FIG. 7 is a flowchart depicting a method 700 of providing refinements to speech input samples by the provider computing system 150 , according to an example embodiment.
- the speech input received from a user 120 may be distorted, garbled, attenuated, or irregular in some manner (for example, there are unexpected gaps in the speech input).
- the irregularity in the speech input is due to an accent in the speech, or due to the speech input being in a foreign language.
- a speech input is received.
- a speech input for filling out a field of a plurality of fields of a form is received.
- the client application 136 via the DANDI plug-in 237 , processes the speech input by passing the received speech input to the speech synthesis API.
- an irregularity in the speech input is determined.
- the client application 136 via the DANDI plug-in 237 , fails to recognize the syllables in the speech input after processing the speech input through the speech synthesis API. Due to the failure in recognizing the syllables in the received speech input, the client application 136 classifies the speech input as irregular. The client application 136 then forwards the speech input to the provider computing system 150 for refinement of the quality of the speech input. Some of the reasons for the speech input to be irregular may be due to attenuation of the speech input, or due to the presence of background noise, or due to an accent that is hard to recognize.
- the provider computing system 150 may determine that the irregularity is a non-English language speech input. This may be identified by the client application via the DANDI plug-in in that the speech input is not recognized, which causes the client application to transmit the speech input to the provider computing system 150 .
- the processing circuit of the system 150 may then determine that speech input is a non-English language speech input (e.g., via the speech enhancement circuit 359 ).
- the provider computing system 150 may translate (e.g., via the speech enhancement circuit 359 ) the non-English language speech input into the English language as part of the refinement. Because of the storage capacity of the system 150 , a translation may be readily accomplished with minimal time by the system 150 versus the client application and DANDI plug-in 237 .
- the speech enhancement circuit 359 of the provider computing system 150 processes the received speech input with artificial intelligence (AI) smart algorithms to refine the speech input samples.
- the AI smart algorithms look up the historical voice inputs for a user 120 in the provider database 365 to identify a pattern in the user's 120 speech input, and then use extrapolation to refine the current speech input samples received in irregular form.
- the speech enhancement circuit 359 digitally enhances the speech input samples through filtering and digital processing techniques to obtain better quality samples of the user's 120 speech input to improve the reliability of the recognition of the speech input.
- the speech enhancement circuit 359 of the provider computing system 150 leverages stored information for the specific user 120 in a provider database 365 by analyzing patterns in the user's past speech inputs. Based on the patterns, various refinements to the speech input may be performed. For example, based on known pronunciations of the user's home address, this information may be used to determine that a speech input is regarding the user's home address.
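A hedged sketch of pattern-based refinement is to snap an irregular transcript to the closest stored phrase for the user by edit distance; the threshold and phrase store are assumptions, not the disclosed algorithm:

```typescript
// Sketch: refine an irregular transcript by snapping it to the closest
// stored phrase for this user (e.g., a known home-address pronunciation).
function editDistance(a: string, b: string): number {
  // Standard Levenshtein distance via dynamic programming.
  const dp = Array.from({ length: a.length + 1 }, (_, i) =>
    Array.from({ length: b.length + 1 }, (_, j) => (i === 0 ? j : j === 0 ? i : 0)),
  );
  for (let i = 1; i <= a.length; i++) {
    for (let j = 1; j <= b.length; j++) {
      dp[i][j] = Math.min(
        dp[i - 1][j] + 1,                                   // deletion
        dp[i][j - 1] + 1,                                   // insertion
        dp[i - 1][j - 1] + (a[i - 1] === b[j - 1] ? 0 : 1), // substitution
      );
    }
  }
  return dp[a.length][b.length];
}

function refine(transcript: string, knownPhrases: string[]): string {
  let best = transcript;
  let bestScore = Infinity;
  for (const phrase of knownPhrases) {
    const score = editDistance(transcript.toLowerCase(), phrase.toLowerCase());
    if (score < bestScore) {
      best = phrase;
      bestScore = score;
    }
  }
  // Only accept the refinement when it is close enough to be plausible.
  return bestScore <= Math.ceil(transcript.length * 0.3) ? best : transcript;
}

console.log(refine("123 mane streat", ["123 Main Street", "456 Oak Avenue"]));
// "123 Main Street"
```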
- a refined speech output based on the refined speech input is provided.
- the speech enhancement circuit 359 of the provider computing system 150 is structured to provide the refined speech output back to the client application 136 of the user device 130 .
- the processing circuit 154 may convert the refined speech input into text (e.g., alpha, numeric, alphanumeric text).
- the system 150 then provides the text converted from the refined speech input to the user device.
- the client application 136 uses the text from the refined speech output to populate the value of a field in order to fill an electronic form in accord with method 600 and the other disclosure contained herein.
- a “circuit” may include hardware structured to execute the functions described herein.
- each respective “circuit” may include machine-readable media for configuring the hardware to execute the functions described herein.
- the circuit may be embodied as one or more circuitry components including, but not limited to, processing circuitry, network interfaces, peripheral devices, input devices, output devices, sensors, etc.
- a circuit may take the form of one or more analog circuits, electronic circuits (e.g., integrated circuits (IC), discrete circuits, system on a chip (SOCs) circuits, etc.), telecommunication circuits, hybrid circuits, and any other type of “circuit.”
- the “circuit” may include any type of component for accomplishing or facilitating achievement of the operations described herein.
- a circuit as described herein may include one or more transistors, logic gates (e.g., NAND, AND, NOR, OR, XOR, NOT, XNOR, etc.), resistors, multiplexers, registers, capacitors, inductors, diodes, wiring, and so on.
- the “circuit” may also include one or more processors communicatively coupled to one or more memory or memory devices.
- the one or more processors may execute instructions stored in the memory or may execute instructions otherwise accessible to the one or more processors.
- the one or more processors may be embodied in various ways.
- the one or more processors may be constructed in a manner sufficient to perform at least the operations described herein.
- the one or more processors may be shared by multiple circuits (e.g., circuit A and circuit B may comprise or otherwise share the same processor which, in some example arrangements, may execute instructions stored, or otherwise accessed, via different areas of memory).
- the one or more processors may be structured to perform or otherwise execute certain operations independent of one or more co-processors.
- two or more processors may be coupled via a bus to enable independent, parallel, pipelined, or multi-threaded instruction execution.
- Each processor may be implemented as one or more general-purpose processors, application specific integrated circuits (ASICs), field programmable gate arrays (FPGAs), digital signal processors (DSPs), or other suitable electronic data processing components structured to execute instructions provided by memory.
- the one or more processors may take the form of a single core processor, multi-core processor (e.g., a dual core processor, triple core processor, quad core processor, etc.), microprocessor, etc.
- the one or more processors may be external to the apparatus, for example the one or more processors may be a remote processor (e.g., a cloud based processor). Alternatively, or additionally, the one or more processors may be internal and/or local to the apparatus. In this regard, a given circuit or components thereof may be disposed locally (e.g., as part of a local server, a local computing system, etc.) or remotely (e.g., as part of a remote server such as a cloud based server). To that end, a “circuit” as described herein may include components that are distributed across one or more locations.
- An exemplary system for implementing the overall system or portions of the arrangements might include general purpose computing devices in the form of computers, including a processing unit, a system memory, and a system bus that couples various system components including the system memory to the processing unit.
- Each memory device may include non-transient volatile storage media, non-volatile storage media, non-transitory storage media (e.g., one or more volatile and/or non-volatile memories), etc.
- the non-volatile media may take the form of ROM, flash memory (e.g., flash memory such as NAND, 3D NAND, NOR, 3D NOR, etc.), EEPROM, MRAM, magnetic storage, hard discs, optical discs, etc.
- the volatile storage media may take the form of RAM, TRAM, ZRAM, etc. Combinations of the above are also included within the scope of machine-readable media.
- machine-executable instructions comprise, for example, instructions and data which cause a general purpose computer, special purpose computer, or special purpose processing machines to perform a certain function or group of functions.
- Each respective memory device may be operable to maintain or otherwise store information relating to the operations performed by one or more associated circuits, including processor instructions and related data (e.g., database components, object code components, script components, etc.), in accordance with the example arrangements described herein.
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Health & Medical Sciences (AREA)
- Physics & Mathematics (AREA)
- Human Computer Interaction (AREA)
- Multimedia (AREA)
- General Health & Medical Sciences (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Acoustics & Sound (AREA)
- Computational Linguistics (AREA)
- Artificial Intelligence (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Business, Economics & Management (AREA)
- Game Theory and Decision Science (AREA)
- Machine Translation (AREA)
- User Interface Of Digital Computer (AREA)
Abstract
Systems and methods for populating the elements of content are disclosed. One method includes determining a plurality of elements of a document and receiving a first speech input from a user to enable a mode of operation. The method further includes authenticating the user by comparing the first speech input with at least one voice sample of the user and enabling the mode of operation. The method further includes receiving, in the mode of operation, a second speech input for filling out a first element of the document and determining an irregularity or distortion in the second speech input based on the first element and identifying a missing syllable or a distorted syllable. The method further includes refining the second speech input into at least one matching syllable, converting the refined second speech input, and providing the text to populate the first element with the text.
Description
- This application is a continuation of U.S. patent application Ser. No. 16/600,656, titled “Digitally Aware Neural Dictation Interface,” filed Oct. 22, 2019, which is a continuation of U.S. patent application Ser. No. 16/600,242, titled “Digitally Aware Neural Dictation Interface,” filed Oct. 11, 2019, both of which are incorporated herein by reference in their entireties and for all purposes.
- Embodiments of the present disclosure relate generally to the field of hands-free input modalities and, in particular, to allowing a user of a device to populate a plurality of fields of a form displayed on the device using a voice input.
- Traditionally, filling or populating an electronic form using an electronic device (e.g., laptop, smart phone, etc.) required users to manually type in the values of the fields of the form using a keyboard. To improve efficiency and save time, users may receive help filling out the form through software that pre-fills or auto-completes certain fields of the form (e.g., name, home address, etc.). Further, users with impaired eyesight may use screen readers that read aloud text that appears on the display to help them fill out the form. But, such screen readers lack the ability to recognize specific fields of a form. Therefore, improved systems that help users fill out electronic forms faster and more efficiently are desired.
- A first example embodiment relates to a user device configured to enable a conversational electronic form that enables a user to speak in a conversational-like manner to fill out an electronic form. The user device includes a processing circuit comprising one or more processors coupled to non-transitory memory. The processing circuit is structured to: receive, by a microphone of the user device, a speech input from the user corresponding to a value of a current field of a plurality of fields of an electronic form provided on a display screen of the user device; convert the speech input into the value for the current field; display, on a display screen of the user device, the value in the current field for visual verification by the user; prompt, by the speaker of the user device, the user for information corresponding to a value of a next field of the plurality of fields in response to determining that the current field is populated with the corresponding value; and prompt, by the speaker of the user device, the user to submit the form in response to determining that the electronic form is complete based on the populated fields of the electronic form. Beneficially, by moving field-to-field based on a verbal input and output (e.g., a prompt for specific information regarding the next field and a user's voice input in response to the prompt), a conversational electronic form is provided that may be appealing and easy-to-use for users.
- Another example embodiment relates to a method for providing a conversational electronic form. The method includes receiving a speech input from a user corresponding to a first field of a plurality of fields of an electronic form provided on a display screen of a user device; converting the speech input from an audible value into text; displaying, on the display screen of the user device, the text in the first field of the electronic form to allow a visual verification by the user; prompting, via a speaker of the user device, the user for information for a subsequent field in the plurality of fields upon each preceding field being populated with text from converted speech inputs; determining the form is complete and ready for submission based on a set of fields being populated with text in the plurality of fields; and enabling a submission of the completed form.
- Still another example embodiment relates to a method. The method includes enabling at least a partial hands-free mode of operation of a user device; determining a characteristic of an electronic form provided on a display screen of the user device based on metadata associated with the electronic form; identifying and navigating to a first field of a plurality fields of the electronic form based on the metadata; prompting, via a speaker of the user device, the user for information for the first field and a subsequent field in the plurality of fields upon each preceding field being populated with text from a speech input associated with each field; and enabling a submission of the electronic form based on a received vocal command.
- Yet another example embodiment relates to a method for providing a graphical representation via a speech input. The method includes: receiving, by a processing circuit of a user device, a speech input from a user selecting an option from a drop down menu of an electronic form; receiving, by the processing circuit, a speech input from the user selecting an option from the drop down menu; and displaying, on a display screen of the user device, a graphical representation corresponding to the selected option from the drop down menu of the electronic form.
- These and other features, together with the organization and manner of operation thereof, will become apparent from the following detailed description when taken in conjunction with the accompanying drawings.
- FIG. 1 is a block diagram of a system for providing a hands-free mode of operation of a user device by a user to populate a plurality of fields of a form using the user device, according to an example embodiment.
- FIG. 2 is a block diagram of the user device of FIG. 1.
- FIG. 3 is a block diagram of the provider computing system of FIG. 1.
- FIG. 4A is a display output of the user device during the hands-free mode of operation, according to the example embodiment.
- FIG. 4B is another display output of the user device during the hands-free mode of operation, according to the example embodiment.
- FIG. 5 depicts an output on the display screen of the user device of FIGS. 1-2, according to an example embodiment.
- FIG. 6 is a flowchart of a method of populating a plurality of fields of a form using the user device of FIG. 1, according to an example embodiment.
- FIG. 7 is a flowchart of a method of providing refinements to speech input samples by the provider computing system of FIG. 1, according to an example embodiment.
- In the following detailed description, reference is made to the accompanying drawings, which form a part hereof. In the drawings, similar symbols typically identify similar components, unless dictated otherwise. The illustrative embodiments described in the detailed description, drawings, and claims are not meant to be limiting. Other embodiments may be utilized, and other changes may be made, without departing from the spirit or scope of the present disclosure. It will be readily understood that the aspects of the present disclosure, as generally described herein, and illustrated in the figures, can be arranged, substituted, combined, and designed in a wide variety of different configurations, all of which are explicitly contemplated and made part of this disclosure. It should be noted that the terms “voice input” and “speech input” are used interchangeably throughout the disclosure.
- The present disclosure relates to systems, apparatuses, and methods of facilitating a hands-free mode of operation for a user to use a voice or speech input to populate a plurality of fields of an electronic form. Users often fill out different types of forms in their regular day-to-day activities. For example, a user may fill out a form to open a checking account at a bank, or may fill out a form for a membership at a local YMCA, etc. Due to the smaller size of the display screen and keyboard on mobile devices as compared to desktops, laptops, etc., filling out forms is often more tedious and error-prone on these types of devices. Irrespective of whether a user uses a mobile device or some other form of a device to fill out the form, the system, methods, and apparatuses described herein relate to providing a hands-free mode of operation for a user to use a voice input to fill out a plurality of the fields of a form in a seamless and easy manner. Beneficially, the systems, apparatuses, and methods provide the user with an experience of a “conversational form” that prompts the user to populate each field of the form. In this regard, the user may seemingly engage in a conversation with the form to aid the filling out of the form quickly and efficiently. As a result, users may be able to complete long forms easier and quicker. Further, a conversational form may be consistent with the expectations of busy consumers in the modern world who may prefer the convenience of a hands-free mode of operation, such as using a voice input, to fill out a form during the course of their regular busy day.
- Existing technologies lack the capability to populate electronic forms in this manner. For example, Siri, Alexa, or other virtual assistants that enable audible commands to be implemented (e.g., changing the volume of a speaker, making a purchase, etc.) do not enable a filling out of an electronic form in this manner. In this regard, a user may command these virtual assistants to navigate a web browser to an Internet page that displays a form and then the user must revert to conventional manual entry of the fields of the form. This results in a noticeable inconvenience in the usage of these virtual assistants.
- The systems, methods, and apparatuses described herein enable accepting a voice input from a user to populate all of or mostly all of the fields of a form by stepping through the fields of the form, one field at a time, without the necessity of a keyboard. The various embodiments of the present disclosure utilize a speech synthesis Application Programmer Interface (API) to convert a received user voice input from speech-to-text (e.g., alpha, numeric, or alphanumeric text). Metadata associated with the form is used to determine the characteristics of the form. For example, the metadata provides an indication regarding the total number of fields, the names of the fields, the maximum number of characters allowed in a field, etc. The systems, methods, and apparatuses described herein utilize a plug-in structured to populate the field of the form based on the characteristics of the field as determined from the metadata. As the user populates a first field, a prompt is provided to the user to populate a next field until all or a sufficient number of fields are populated for submission. Thus, the systems, apparatuses, and methods described herein facilitate providing a “conversational form,” whereby there is a continuous interaction between the system and the user based on prompting the user to enter a value for each field of the form until the form is completely or sufficiently populated. This is analogous to a “conversation,” where people may talk back and forth until the topic of discussion is fully/completely discussed.
- It should be understood that not all embodiments require the user to be prompted to enter a voice input for the value of a field of a form. For example, in some embodiments, the systems and methods described herein move from a field that has been populated to a next field after a predetermined amount of time without necessarily prompting the user (i.e., an automatic movement from one field to the next based on the passage of a predetermined amount of time). In this instance, the user may keep track of the progress of filling out the form through a visual display of the form on the display screen of the device. Further and in some embodiments, a user may choose to enter the value of a subset of fields of the form through means other than voice input such that the filling of the form may be through a combination of voice-input and manual-input (e.g., typed input). Thus, those of ordinary skill in the art will recognize various natural and logical alternatives to the systems, methods, and apparatuses of the present disclosure with all such alternatives intended to fall within the spirit and scope of the present disclosure.
- The “form” may be any type of form that is presented electronically. Thus, the “form” may be an application, such as a housing application, a credit card application, an account application, a club membership application (e.g., a gym), and so on. The form may also be a survey. The form may further include a log-in page for various things, such as an account (e.g., a club account, a financial account, and the like). The unifying characteristics are that the form is electronically displayed and includes at least one field that needs to be filled or populated. The “field” of the form refers to a box where information is to be populated (e.g., name, date of birth, etc.). In other words, the “field” refers to a single item belonging to the form where a user input is or may be sought. The “value” refers to the characters that actually populate the field (e.g., a number or a string of alpha, numeric, or alphanumeric characters used to populate the field of a form).
- Referring now to
FIG. 1 , asystem 100 that facilitates and enables a hands-free mode of operation of a device for receiving a voice input to populate the fields of a form is shown, according to an example embodiment. The “hands-free mode of operation” refers to the ability of a user of the device to use the device without or primarily without use of their hands/fingers. In particular and as primarily used herein, the hands-free mode of operation refers to the ability of the user to populate the fields of a form without using a keyboard (i.e., without manual entry of the field values). However, if desired, manual entry, such as via a keyboard, may be used to supplement the hands-free mode of operation to fill out the form. As shown, thesystem 100 includes anetwork 110, a user 120 associated with auser device 130, aprovider computing system 150, a speechsynthesis API server 170, and aweb server 180. The user 120, theuser device 130, theprovider computing system 150, the speechsynthesis API server 170, and theweb server 180 may be coupled to each other and therefore communication through thenetwork 110. Thenetwork 110 may include one or more of the Internet, cellular network, Wi-Fi, Wi-Max, a proprietary banking network, or any other type of wired and/or wireless network. - The
user device 130 is a computing device owned by, associated with, or otherwise used by a user 120. The user may be an individual or group of individuals using theuser device 130. Theuser device 130 is structured to provide a hands-free mode of operation for the user 120 so that the user may provide a voice input that is converted to text (e.g., alphanumeric text) to populate a plurality of fields of a form. In some embodiments, theuser device 130 is a mobile device, which includes any type of mobile device including, but not limited to, a phone (e.g., smart phone, etc.), a tablet computer, a personal digital assistant, wearable devices (e.g., glasses), and the like. In other embodiments, theuser device 130 is a primarily non-mobile device, such as a desktop computer. In some other embodiments, theuser device 130 is a device that is only used by a user. For example, in this scenario, the device may be an automated teller machine (ATM) that is equipped with a microphone, a speaker, and a display. In the example shown, theuser device 130 is structured as a smart phone. - The
user device 130 is shown to include aprocessing circuit 133 having one ormore processors 134 and amemory 135, anetwork interface circuit 131, and an input/output circuit 132. Thememory 135 is shown to include or store aclient application 136. In this regard, thememory 135 may contain instructions belonging to theclient application 136, which can be executed by the one ormore processors 134 of theuser device 130. Thenetwork interface circuit 131 is structured to enable theuser device 130 to exchange information over thenetwork 110. The input/output circuit 132 is structured to facilitate the exchange information with the user 120. An input device of or coupled to (depending on the embodiment) the input/output circuit 132 may allow the user to provide information to theuser device 130, and may include, for example, a mechanical keyboard, a touchscreen, a microphone, a camera, a fingerprint scanner, and so on. An output device of or coupled to (depending on the embodiment) the input/output circuit 132 allows the user to receive information from theuser device 130, and may include a display device (e.g., a display screen such as a touchscreen), a speaker, illuminating icons, LEDs, and so on. Each of these components are explained more fully herein with respect toFIG. 2 . - The speech
synthesis API server 170 is a computing system that is coupled through thenetwork 110 to theuser device 130 and the other systems/components ofFIG. 1 . The speechsynthesis API server 170 may be a back-end server or computing system comprising one or more processors, memory devices, network interfaces, and computing components as described herein that facilitate and enable various operations. The speechsynthesis API server 170 is structured to provide a speech synthesis API. The speech synthesis API is structured to recognize a voice input from a user 120, and to convert the voice input into text, such as alphanumeric text (and, in some embodiments, vice versa—from text into an audible noise). In some embodiments, theuser device 130 lacks built-in support for a speech synthesis API. In such embodiments, theuser device 130 utilizes the speech synthesis API provided by the speechsynthesis API server 170 to convert the user's 120 voice input into text and vice versa. In other embodiments, theuser device 130 includes a speech synthesis API (may be different than that provided by the server) that converts the voice input into text. The speechsynthesis API server 170 is also shown to include aspeech recognition circuit 172 and aspeech translation circuit 174. - The
speech recognition circuit 172 is structured to recognize and convert the user's 120 voice input into text. In operation, the user's voice may be received via a microphone of theuser device 130, which converts the voice into data and transmits the data to the speechsynthesis API server 170. Thespeech recognition circuit 172 breaks down the user's 120 voice input (i.e., the data) into syllables. Thespeech recognition circuit 172 then compares the syllables of the user's 120 voice input with known syllables stored in the non-transitory memory of thespeech recognition circuit 172 to identify a plurality of syllables in the voice input. Thespeech recognition circuit 172 may then convert the plurality of syllables into characters through, for example, a look-up table maintained in the non-transitory memory of thespeech recognition circuit 172 to complete the conversion of the user's 120 voice input into text. The converted text produced by thespeech recognition circuit 172 is used to populate a relevant field of an electronic form. In other embodiments, a different process may be used to convert a user's voice input into alphanumeric text. - The speech
synthesis API server 170 is further structured to translate text (for example, text that is retrieved from an earlier populated field value of the form) into a voice output so that the value of the field may be read aloud back to the user. In some embodiments, the speechsynthesis API server 170 may be structured to read-out the field values of the form, which enables the verification of the field values for users 120 with visual impairments thereby enabling assistive technology support for such users 120. - The
speech translation circuit 174 is structured to convert the field value retrieved from the electronic form into an audible output. Thespeech translation circuit 174 may access a look-up table in the non-transitory memory of thespeech translation circuit 174 to identify syllables in the alphanumeric text based on the text in the field. Thespeech translation circuit 174 then sends the identified syllables to theuser device 130 to read out the determined voice output. - The
web server 180 is a computing system that provides and hosts webpages/websites that are reachable by theuser devices 130 via thenetwork 110. Theweb server 180 may be a back-end server or computing system comprising one or more processors, memory devices, network interfaces, and computing components as described herein that facilitate and enable various operations. Theweb server 180 is structured to respond to requests from clients such as theuser device 130 to access a webpage identified by a particular Internet address. Theweb server 180 provides the contents of the requested webpage in response to a request for the web page from theuser device 130. Theweb server 180 includes a webpage response circuit 182. The webpage response circuit 182 retrieves from the non-transitory memory of theweb server 180 relevant information pertaining to a particular webpage requested by theuser device 130. In some embodiments, the relevant information includes the metadata associated with a webpage hosted by theweb server 180, which has been requested to be downloaded by or provided to theuser device 130. - The
provider computing system 150 is owned by, managed/operated by, or otherwise associated with a provider institution. The provider institution may be a financial institution that offers one or more financial products and services (e.g., banking and banking applications such as mobile banking, lending products, payment and money transfer products and services, etc.). Further and additionally, the provider institution is an entity that facilitates and enables, at least partly, operation of the hands-free input modality for a user to populate the fields of an electronic form in thesystem 100. As described herein and in some embodiments, theprovider computing system 150 is structured to facilitate the download of processing logic (in the form of a plug-in) to theuser device 130 that enables an electronic form to be populated via a voice input. - As shown, the
provider computing system 150 includes aprocessing circuit 154 including aprocessor 155 and amemory 156, anetwork interface circuit 151 structured to couple thesystem 150 to the other components ofFIG. 1 through thenetwork 110, avoice authentication circuit 152, and aprovider enhancement circuit 156. Theprocessor 155 may be implemented as one or more application specific integrated circuits (ASICs), field programmable gate arrays (FPGAs), a group of processing components, or other suitable electronic processing components. Thememory 156 may be one or more devices (e.g., RAM, ROM, Flash memory, hard disk storage) for storing data and/or computer code for completing and/or facilitating the various processes described herein. Thememory 156 may be or include non-transient volatile memory, non-volatile memory, and non-transitory computer storage media. Thememory 156 may include database components, object code components, script components, or any other type of information structure for supporting the various activities and information structures described herein. Thememory 156 may be communicably coupled to theprocessor 155 and include computer code or instructions for executing one or more processes described herein. More details about the various components of theprovider computing system 150 are provided during with respect toFIG. 3 . - Referring now to
FIG. 2 , the details of theuser device 130 are shown, according to an example embodiment. As mentioned above, theuser device 130 includes anetwork interface circuit 131 and an input/output circuit 132. Thenetwork interface circuit 131 is structured to establish, enable, and maintain a connection with other components of thesystem 100 via thenetwork 110. In this regard, thenetwork interface circuit 131 is structured to enable theuser device 130 to exchange information (e.g., data) over thenetwork 110. Thenetwork interface circuit 131 includes logic and hardware components that facilitate a connection of theuser device 130 to thenetwork 110. For example, thenetwork interface circuit 131 may include a cellular modem, a Bluetooth transceiver, a Wi-Fi router, a radio-frequency identification (RFID) transceiver, and/or a near-field communication (NFC) transmitter. Further, in some arrangements, thenetwork interface circuit 131 includes cryptography capabilities to establish a secure or relatively secure communication session with certain components, such as theprovider computing system 150. - The input/
- The input/output circuit 132 is structured to enable the exchange of communication(s) (e.g., data, information, instructions, etc.) with a user of the user device 130. In this regard, the input/output circuit 132 is structured to exchange data, communications, instructions, etc. with an input/output component of the user device 130. Accordingly, in one embodiment, the input/output circuit 132 includes one or more input/output devices, such as a display screen 233 (or display), a microphone 234, and a speaker 235. In another embodiment, the input/output circuit 132 may include machine-readable media for facilitating the exchange of information between the input/output device and the components of the input/output circuit 132. In still another embodiment, the input/output circuit 132 may include any combination of hardware components (e.g., input/output components such as a touchscreen) and machine-readable media. In the example shown, the input/output circuit 132 is machine-readable media executable by the one or more processors 134 and, in turn, coupled to the input/output devices (e.g., the display screen 233, the microphone 234, and the speaker 235).
- The display screen 233 is structured to present visual displays (e.g., graphical user interfaces) to a user 120. In particular, the display screen 233 is structured to provide and present an electronic form for the user 120 to fill. The display screen 233 may present prompts, notifications, and confirmations to the user 120. In the example shown, the display screen 233 is structured as a touchscreen display device.
- The microphone 234 is structured to receive a voice input from the user 120 to fill a value of a field of the electronic form displayed by the display screen 233. The microphone 234 may be any typical microphone structure included with a user device, such as a smartphone microphone.
- The speaker 235 is structured to provide an audible output. The audible output may include a prompt, a notification, and a confirmation to the user 120 during the process of populating the fields of an electronic form. The speaker 235 may be any typical speaker structure included with a user device, such as a smartphone speaker. In some embodiments, the speaker 235 and the microphone 234 may be the same physical device/component of the user device.
- Referring still to FIG. 2, the user device 130 includes a client application 136. The client application 136 is a computer program that executes or runs on the user device 130. The client application 136 may be implemented as a thin client application 239 or a native application 243. A thin client application 239 is a computer program that typically executes on a networked backend computer rather than locally on the user device, which may have limited resources of its own. Thus, a thin client application fulfills or obtains its computational needs by using the resources of a backend server. In some embodiments, the server is the provider computing system 150. In other embodiments, the server is a third-party server. In contrast, the native application 243 is a computer program that uses the computational power of the device on which it resides. For example, as mentioned above, the user device may be an ATM, in which case the native application may be hard coded into the non-transitory memory of the processor(s) of the ATM.
- In some embodiments, the client application 136 is incorporated into an existing application, such as a mobile banking application. In this regard, the client application 136 includes an API and/or a software development kit (SDK) that facilitates the integration of other components with the client application 136. In other embodiments, the client application 136 is a separate application implemented on the user device 130. The client application 136 may be downloaded by the user device 130 prior to its usage, hard coded into the non-transitory memory of the user device 130 (i.e., a native application), or be a web-based application. In some implementations, the user 120 may have to log onto the user device 130 and access the web-based interface before using the client application 136.
- As an example, the client application 136 may be a web browser application 241 (e.g., Google Chrome). In some embodiments, the web browser application 241 is structured to include a speech synthesis API for converting alphanumeric text to speech, and vice versa. In other embodiments, this functionality is lacking from the web browser application 241.
- The client application 136 is shown to include a digitally aware neural dictation interface (DANDI) plug-in 237. The DANDI plug-in 237 (e.g., add-in, add-on, extension, etc.) is a program that adds additional features to the client application 136. In one embodiment, the DANDI plug-in 237 is implemented as program code in the JavaScript programming language. In other embodiments, the DANDI plug-in may have a different structure (e.g., be constructed with a different programming language). The DANDI plug-in 237 may be a downloadable component, which can be an add-on to an existing application such as a web browser application (e.g., Google Chrome). Thus, the DANDI plug-in 237 may include one or more APIs and/or SDKs that facilitate integration of the DANDI plug-in 237 into the client application 136. In other embodiments, the functionality of the DANDI plug-in 237 described herein may be hardcoded in the non-transitory memory accessible to a processor of the device (e.g., the user device). In this instance, the DANDI plug-in 237 is a native feature of the device. In yet other embodiments, the DANDI plug-in 237 may be hard-coded into the client application such that the plug-in is not strictly a "plug-in"; rather, the features and functionalities described herein are embedded as part of the client application. Thus, while described herein as a plug-in, it should be understood that this implementation embodiment is not meant to be limiting, as the present disclosure contemplates various other structural implementations.
- The DANDI plug-in 237 is structured to integrate with the client application 136. In the embodiment shown, the DANDI plug-in 237 is a downloadable software component that integrates with the client application 136 after being downloaded. For example, the DANDI plug-in 237 may be an add-on to the web browser application 241. In another embodiment, the DANDI plug-in 237 is hard coded into the client application 136 (as opposed to being downloadable). For example, in the case of the client application 136 being a native application 243 and the device being an ATM, the DANDI plug-in 237 may be hard coded into the non-transitory memory for execution by the processor(s) of the ATM. In this way, the DANDI plug-in 237 is not downloaded to the ATM, but is ready for use upon running/using the ATM.
- The DANDI plug-in 237 is structured to enable the client application 136 to determine and identify the characteristics of each field in the plurality of fields of an electronic form. In this regard, the DANDI plug-in 237 is structured to perform an analysis of the metadata associated with the electronic form. The metadata may be provided by the web server 180 to the DANDI plug-in 237. Alternatively, the DANDI plug-in 237 may extract the metadata from the webpage hosting the form (or from the form itself when it is not hosted by a webpage, such as a PDF form). The metadata associated with an electronic form defines the characteristics of each field in the plurality of fields of the electronic form. The characteristics may include, but are not limited to, an indication regarding the total number of fields of the form, the names of the fields, the data types of each of the fields, the maximum number of characters allowed in a field, the range of acceptable values for a field, etc. For example, in one embodiment, the data type for a "date" field of the electronic form is in MM-DD-YYYY format, with the value of the MM field being a 2-digit number and the acceptable range of values for the MM field being from 01 through 12. The DANDI plug-in is structured to analyze the metadata to determine various characteristics of the form and, in particular, of each of the fields of the form.
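- By way of a non-limiting illustration, and consistent with the JavaScript implementation of the plug-in mentioned above, such a metadata analysis over an HTML-hosted form might resemble the following sketch (the helper name describeFormFields is hypothetical and offered only as an example):

```javascript
// Illustrative sketch: deriving field characteristics from the markup of a
// web-hosted form. Field names, data types, maximum lengths, required flags,
// and acceptable ranges are read from the HTML metadata of each element.
function describeFormFields(form) {
  return Array.from(form.elements)
    .filter((el) => el.tagName === 'INPUT' || el.tagName === 'SELECT')
    .map((el) => ({
      name: el.name,                                      // field name
      type: el.type,                                      // data type (text, date, number, etc.)
      maxLength: el.maxLength > 0 ? el.maxLength : null,  // max characters allowed
      required: el.required,                              // must be filled before submission
      min: el.min || null,                                // lower bound of acceptable range
      max: el.max || null,                                // upper bound of acceptable range
    }));
}

// Example: describeFormFields(document.querySelector('form')) might yield
// [{ name: 'dob', type: 'date', required: true, ... }, ...].
```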
- The DANDI plug-in 237 is also structured to convert the voice input for each field of the electronic form into a value for the field. As described above and in one embodiment, the speech synthesis API function is provided within the client application 136 (e.g., the web browser application 241). In this regard and in some embodiments, the DANDI plug-in 237 causes execution of the speech synthesis API of the web browser application 241 to convert the user's 120 voice input for a specific field into alphanumeric text. In this case, the DANDI plug-in 237 is coupled to the speech synthesis API of the client application 136 for converting the voice input into alphanumeric text. The DANDI plug-in 237 may then cause a populating of the text into a field in the electronic form.
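- As one hedged illustration of this browser-provided conversion, a recognizer built on the standard Web Speech API could be wired along the following lines (the fillField helper is a hypothetical stand-in for the plug-in's field-population logic):

```javascript
// Illustrative sketch: converting a spoken input into text and placing it
// into the currently active form field using the browser's speech API.
const Recognition = window.SpeechRecognition || window.webkitSpeechRecognition;
const recognizer = new Recognition();
recognizer.lang = 'en-US';
recognizer.interimResults = false;

recognizer.onresult = (event) => {
  const transcript = event.results[0][0].transcript; // the converted text
  fillField(document.activeElement, transcript);     // hypothetical helper
};

recognizer.start(); // begin listening for the value of the current field

function fillField(field, text) {
  field.value = text;
}
```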
- In another embodiment, the speech synthesis API is not included with the client application 136. For example, the client application 136 may be a native application 243 (e.g., a client application executing on an ATM) that lacks support for a web browser to execute the speech synthesis API. Rather, the speech synthesis API is provided by the speech synthesis API server 170. In such embodiments, the DANDI plug-in is structured to interface with and access the speech synthesis API server 170 over the network 110. The DANDI plug-in may then transmit the voice input to the speech synthesis API server 170 over the network 110 to convert the user's 120 voice input into text.
- In still another embodiment, the DANDI plug-in 237 itself may include the speech synthesis API. In this regard, the speech synthesis API server provides the speech synthesis API that is integrated into the DANDI plug-in 237. In this situation, the client application, via the DANDI plug-in 237, is itself structured to receive a voice input, determine the characteristics of the voice input, and convert the voice input into text that is used as the value to populate the fields of the electronic form.
- The DANDI plug-in 237 is further structured to navigate through the fields of the electronic form using the characteristics of the fields of the electronic form. The DANDI plug-in 237 is structured to determine and identify the characteristics of the fields in the form by analyzing the metadata associated with the form from the web server 180.
Thus, the web server 180 is coupled to the client application and the DANDI plug-in 237. In another embodiment, the metadata analysis function may be included with the DANDI plug-in 237. This arrangement may be used on devices that may otherwise lack this feature, such as potentially certain ATMs. Irrespective, the DANDI plug-in 237 may analyze the metadata of the display that is providing the form through a variety of techniques. For example, in one embodiment, the DANDI plug-in parses the metadata, which is implemented in one of a variety of languages (e.g., XML, HTML, etc.) that describe the various fields of the form, to determine the number of fields in the form, the range of values associated with each field of the form, and so on. During the metadata analysis, the DANDI plug-in 237 also determines or identifies the fields that make up the electronic form, and identifies the relative positions of the fields on the electronic form. The DANDI plug-in 237 uses the acquired knowledge of the relative positions of the fields to determine a priority order for navigating to and populating the fields of the form. In one embodiment, a determined order for populating the fields is vertical (i.e., top-most field to bottom-most field). In another embodiment, a determined order is left-to-right and top-to-bottom, in a manner analogous to left-to-right reading. In still another embodiment, the determined order is based on the characteristics of the fields per the metadata. For example, only three of six depicted fields may be required to be populated in order to enable/allow submission of the electronic form. However, the three required fields may be randomly dispersed on the form (e.g., the first, fourth, and sixth fields when reading left-to-right and top-to-bottom, etc.). In this situation, the determined order is these three fields first, in a reading manner (left-to-right and top-to-bottom). At this point, a prompt may be provided to the user indicating that all the required fields are populated and inquiring whether the user would like to submit/review the form or populate the optional fields. In this manner, navigation to the fields may be strategic rather than rote left-to-right or top-to-bottom. Such a process may save time and improve efficiency.
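- A minimal sketch of one such priority ordering, assuming required fields are prompted first and each group is visited in reading order (the fieldOrder helper is hypothetical), is as follows:

```javascript
// Illustrative sketch: order fields so that required fields come first,
// with each group sorted top-to-bottom, then left-to-right (reading order).
function fieldOrder(fields) {
  const byReadingOrder = (a, b) => {
    const ra = a.getBoundingClientRect();
    const rb = b.getBoundingClientRect();
    return ra.top - rb.top || ra.left - rb.left;
  };
  const required = fields.filter((f) => f.required).sort(byReadingOrder);
  const optional = fields.filter((f) => !f.required).sort(byReadingOrder);
  return required.concat(optional); // required fields are prompted first
}
```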
- Navigating through each field of the form may occur in a variety of ways. In this regard, the DANDI plug-in 237 is structured to navigate to a next field of the form via various different processes. For example, after a field is populated, the DANDI plug-in 237, using a timer function, may automatically navigate to the next field according to the defined order (described above) after the passage of a predetermined amount of time (e.g., 1.5 seconds, 3 seconds, etc.). In another example, the DANDI plug-in 237 may receive a vocal command that instructs movement to the next field (e.g., "Please move to the next field"). Based on the metadata analysis, the client application, via the DANDI plug-in 237, knows the information that is expected for a particular field (e.g., date of birth) such that when a command is received, the command is easily differentiated from the information used to populate the field. In still another example, a manual input from the user may be used to cause the movement from field to field (e.g., on the touchscreen, the user may touch the field he/she wants to fill next). In yet another embodiment, a prompt is provided by the DANDI plug-in 237 (e.g., using a speaker of the user device) to request information for the next field in the form according to the determined order of populating the fields. An example is as follows:
"[Field 1: Name] 'Please provide your name.' [Field 2: Date of birth] 'Thank you. Please provide your date of birth.' [Field 3: Address] 'Please provide your address.'" In this example, once information is received from the user and populated into the form, a confirmation is provided (e.g., "thank you") and the information for the next field is requested. Contemporaneously, the field for which information is currently sought may be highlighted on the screen. This enables two forms of indication to the user (i.e., the audible prompt for certain information and the visual highlighting on the form). In this embodiment, a conversational form is provided. Once all of the fields, or at least the required fields, are populated, a prompt may be provided to the user inquiring whether he/she would like to submit the form and/or review the answers before submission. The form may then be submitted (e.g., by clicking submit or via a voice command). In other embodiments, any combination of these examples may be used.
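- One simplified, non-limiting sketch of such a conversational prompt, using the browser's standard speechSynthesis interface (the promptForField helper and the dandi-highlight style class are hypothetical), might be:

```javascript
// Illustrative sketch: audibly prompt for each field in the determined
// order while highlighting the field currently awaiting input.
function promptForField(field, promptText) {
  field.classList.add('dandi-highlight');          // visual indication
  const utterance = new SpeechSynthesisUtterance(promptText);
  utterance.onend = () => {
    // After the audible prompt ends, listening for the user's value
    // (e.g., via a speech recognizer) would begin here.
  };
  window.speechSynthesis.speak(utterance);         // audible indication
}

// Example: promptForField(nameField, 'Please provide your name.');
```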
- In some embodiments, one or more functions associated with the DANDI plug-in may be performed by the provider computing system 150. Thus, the DANDI plug-in, via the user device, may transmit relevant data or information to the system 150 for processing with the specific processing capabilities described below.
- Referring now to FIG. 3, the provider computing system 150 of FIG. 1 is shown according to an example embodiment. The provider computing system 150 is shown to include a network interface circuit 151, a plug-in download circuit 152, a voice authentication circuit 153, a processing circuit 154, and a provider enhancement circuit 157. The processing circuit 154 includes one or more processors 155 and a non-transitory memory 156. The processing circuit 154 is described above.
- The network interface circuit 151 (or network interface) is structured to enable the provider computing system 150 to establish connections with other components of the system 100 via the network 110. The network interface circuit 151 is structured to enable the provider computing system 150 to exchange information over the network 110 (e.g., with the user device 130). The network interface circuit 151 includes program logic that facilitates connection of the provider computing system 150 to the network 110. The network interface circuit 151 supports communications between the provider computing system 150 and other systems, such as the user device 130. For example, the network interface circuit 151 may include a cellular modem, a Bluetooth transceiver, a Bluetooth beacon, a radio-frequency identification transceiver, and a near-field communication transmitter. Thus, the network interface circuit 151 may include the hardware and machine-readable media sufficient to support communication over multiple channels of data communication. Further, in some arrangements, the network interface circuit 151 includes cryptography capabilities to establish a secure or relatively secure communication session with the user device 130.
- The plug-in download circuit 152 is structured to create, maintain, and provide the DANDI plug-in 237 for download to the user device 130. For example, the user device 130 may request the download of the DANDI plug-in 237 from the plug-in download circuit 152. In some embodiments, upon receiving a request from a user device 130 for download of the DANDI plug-in, the plug-in download circuit 152 causes the DANDI plug-in 237 to be downloaded to the user device 130. Thus, this embodiment is used when the DANDI plug-in 237 functionality is not hardcoded into either the device or the client application, i.e., when the described functionality is being added to an existing application (e.g., a web browser).
- The voice authentication circuit 153 is structured to authenticate a voice of a user received via the network interface circuit 151 from a user device 130. In some embodiments, authentication of a user 120 may be required to use the hands-free mode of operation via the DANDI plug-in 237 based on the requirements of the form (e.g., a credit card application provided by the provider institution). The voice authentication circuit 153 is structured to facilitate authenticating/verifying a user's voice. In some embodiments, the voice authentication circuit 153, upon receiving a voice input from the user device 130, compares the voice input with known voice samples of the user's speech stored in the provider database 365 (described herein) for a match or a substantial match. The voice authentication circuit 153 then notifies the user device 130 of the result. In the case of a match, the user device 130 may skip the step of requiring the user 120 to log in with authentication credentials, since the user 120 is recognized/authenticated through the user's 120 voice. Thus, in some embodiments, the voice authentication circuit 153 is structured to provide the benefit of facilitating the continuation of a user's 120 session without the user being forced to provide log-in authentication credentials in the middle of using a commercial banking application. This feature may be advantageous for forms that require sensitive information to be provided (e.g., credit card applications, forms that require personal identifying information, etc.). In operation, the user may be authenticated into their device and then subsequently authenticated via their voice to use the hands-free mode of operation to fill out the form. In this regard and based on the metadata analysis, when predefined sensitive information is determined to be required for the form, the DANDI plug-in 237, via the client application, may automatically transmit the user's voice to the voice authentication circuit 153 for an additional authentication analysis to be performed. This adds an extra layer of security that is not typical for most forms that are populated.
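- The particular biometric comparison is not prescribed herein; one simplified sketch, assuming the voice inputs have already been reduced to fixed-length feature vectors by a preprocessing step not shown, is a cosine-similarity match against stored samples (the helper names and the 0.9 threshold are illustrative only):

```javascript
// Illustrative sketch: compare a feature vector derived from the incoming
// voice input against stored sample vectors; report a match when any
// similarity exceeds a threshold.
function cosine(a, b) {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb));
}

function voiceMatches(inputVector, storedVectors, threshold = 0.9) {
  return storedVectors.some((sample) => cosine(inputVector, sample) >= threshold);
}
```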
- The provider computing system 150 further includes a provider enhancement circuit 157 that includes a speech enhancement circuit 359, a user-specific auto-complete circuit 361, a provider command dictionary 363, and a provider database 365. The provider database 365 is structured to hold, store, categorize, and/or otherwise serve as a repository for information regarding the user (e.g., the user's historical voice inputs). The provider database 365 is structured to store and selectively provide access to the stored information. The provider database 365 may have any one of a variety of computing structures. Although shown as a separate component of the provider computing system 150, in some embodiments, the provider database 365 may be part of the memory 156.
- The speech enhancement circuit 359 is structured to enhance the quality of the input voice samples received from a user device 130 for storage in the provider database 365. In some embodiments, the enhancement in the quality of the input voice samples may be based on the removal of undesirable noise from the samples (e.g., the input voice samples may include undesirable noise from a noisy surrounding of the user 120 due to the user 120 being in a busy marketplace, using public transportation, etc.). The client application 136 of the user device 130 may transmit samples of a user's 120 voice input to the speech enhancement circuit 359. The speech enhancement circuit 359 digitally enhances the user's voice samples by applying filtering and digital processing techniques in order to obtain better quality samples of the user's original voice input. In some embodiments, the speech enhancement circuit 359 is structured to mitigate distortions or irregularities in the user's voice input due to the presence of an accent in the voice or a temporary condition (for example, a cold) affecting the user's voice, thus enhancing the quality of the voice input. For example, in one embodiment, the speech enhancement circuit 359 extrapolates the missing or distorted syllables in the user's 120 voice input by comparing the current voice input of the user 120 with past voice inputs received from the particular user 120 stored in the provider database 365. In some embodiments, the speech enhancement circuit 359 executes artificial intelligence based machine learning algorithms to compare the identified syllables in the user's voice input to a database of syllables stored in the provider database 365. The algorithms find the closest match in the provider database 365 for any distorted or otherwise irregular syllables in the user's voice input, and cause such syllables to be replaced by the corresponding matching syllables from the provider database 365.
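- A minimal sketch of such a closest-match substitution, assuming syllables have been segmented and featurized by upstream processing not shown here (the repairSyllables helper and the maxDistance parameter are hypothetical), might be:

```javascript
// Illustrative sketch: replace distorted syllables with the closest match
// from a database of known syllables, using a distance over feature vectors.
function distance(a, b) {
  let sum = 0;
  for (let i = 0; i < a.length; i++) sum += (a[i] - b[i]) ** 2;
  return Math.sqrt(sum);
}

function repairSyllables(inputSyllables, knownSyllables, maxDistance) {
  return inputSyllables.map((syl) => {
    let best = null, bestDist = Infinity;
    for (const known of knownSyllables) {
      const d = distance(syl.features, known.features);
      if (d < bestDist) { bestDist = d; best = known; }
    }
    // Substitute only when the syllable is close enough to a known one;
    // otherwise keep the original (possibly distorted) syllable.
    return bestDist <= maxDistance ? best : syl;
  });
}
```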
- In some embodiments, the speech enhancement circuit 359 is structured to translate a user's 120 voice input from a first language to a second language (e.g., from a foreign language to the English language), such that the translated voice input may be used to populate the fields of a form in the second language. The speech enhancement circuit 359 is thus structured to provide an advantage in that the form is able to be populated in English even when the voice input is in a different language. Thus, providers of the electronic form need not translate their form(s) into various languages to accommodate the various languages of the world. Rather, a translation provided by the circuit 359 may occur to enable wide usage.
- The provider command dictionary 363 is structured to provide a dictionary of commands recognized by the hands-free voice input system. In some embodiments, the provider command dictionary 363 receives a voice input representing a user command from the client application 136 of the user device 130. For example, a user may provide a voice command to the user device 130 to modify the value of an earlier populated field. As another example, a user 120 may issue voice commands for the initiation and termination of the hands-free mode of operation. As still another example, a user 120 may issue a command to read out all the fields of a form populated so far (the command may be issued in the middle of populating the form). It should be understood that the examples of commands described herein are non-limiting in nature, and the provider command dictionary 363 is structured to support a much larger set of commands than the examples provided. Thus, rather than using the voice input only to populate the electronic form, the commands are used to provide additional functionality that may enhance the user experience.
- In some embodiments, the client application 136 of the user device 130 communicates with the provider command dictionary 363 through the network interface circuit 151 of the provider computing system 150 to leverage the increased capability of command recognition in the provider command dictionary 363. The provider command dictionary 363 thus expands the command recognition capability built into the client application 136 of the user device 130 via the DANDI plug-in 237.
- The user-specific auto-complete circuit 361 is structured to provide auto-complete suggestions for a particular user 120. In some embodiments, the user-specific auto-complete circuit 361 receives voice inputs from the client application 136 of the user device 130. The user-specific auto-complete circuit 361 then stores the voice inputs in the provider database 365 on a per-user basis, thus accumulating user 120 provided voice inputs for multiple fields of multiple forms. Further, the user may be a customer of the provider institution, in which case the provider computing system 150 may store various other information regarding the user (e.g., name, date of birth, address, ethnicity of the user, etc.). In some embodiments, the user-specific auto-complete circuit 361 utilizes the voice inputs stored in the provider database 365 for a specific user 120 to perform a multi-field analysis of the user's stored voice inputs to determine auto-complete suggestions. For example, in one embodiment, the user-specific auto-complete circuit 361 executes algorithms to recognize patterns in a user's 120 voice inputs across multiple fields of multiple forms stored for the user 120 in the provider database 365 to provide auto-complete suggestions that are tailored to the particular user 120. This may speed up the filling of the form.
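- One illustrative sketch of such a per-user suggestion, assuming stored inputs are available as records of field names and values (the suggestValue helper is hypothetical), is:

```javascript
// Illustrative sketch: suggest the value the user has most often supplied
// for a field of the same name across previously stored form inputs.
function suggestValue(fieldName, storedInputs) {
  const counts = new Map();
  for (const entry of storedInputs) {
    if (entry.fieldName !== fieldName) continue;
    counts.set(entry.value, (counts.get(entry.value) || 0) + 1);
  }
  let best = null, bestCount = 0;
  for (const [value, count] of counts) {
    if (count > bestCount) { bestCount = count; best = value; }
  }
  return best; // e.g., the user's usual home address for an "address" field
}
```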
- In operation, a user 120 interacts with the user device 130 to initiate the process of a hands-free mode of operation for populating a form using the user device 130. The form may be an application (e.g., a credit card application, an account application, a gym membership application, etc.), a survey, or any other form that is provided electronically. In some implementations, the user provides an authentication credential to access the user device 130 (e.g., a biometric, a passcode, etc.). In one embodiment, the user 120 clicks on a portion of a form or a part of the display screen 233 of the user device 130 to initiate the hands-free mode of operation. For example, a manual input (e.g., clicking on a DANDI icon) is implemented to initiate the hands-free mode of operation. In another example, the user 120 may provide a specific voice command (e.g., "initiate DANDI"). In either situation, the client application 136 running on the user device 130 recognizes the command to initiate the hands-free mode of voice input to subsequently enable the populating of the fields using a voice input. Upon enabling the hands-free mode of operation, the user device 130 may be structured to provide an indication to the user 120 that the hands-free input modality is active. For example, in one embodiment, the user device 130 is structured to provide an animation on the display screen 233 of the user device 130 as the indication (e.g., an illuminating icon, a graphic, etc.).
- As described above, the user device 130 may prompt the user 120 to provide, via the microphone 234, a voice input for the value of a field of the form that the user wants to populate once the hands-free mode of operation is enabled. As alluded to before, in one embodiment, the DANDI plug-in 237 is structured to navigate to the various fields of an electronic form and populate the values in the fields of the form. As described above, once all or a sufficient number of the fields are populated, the form may be submitted (e.g., the application submitted, the PDF determined to be complete and then saved, access to an account provided, etc.).
- While certain functions are described above separately with respect to the DANDI plug-in 237, in some embodiments, various functions may be included with the DANDI plug-in 237. For example and as described above, the DANDI plug-in may include the speech synthesis API. In this regard and when running, the DANDI plug-in 237 is structured to receive a voice input and then convert the voice input to alphanumeric text. In other embodiments and when the client application includes a similar function, the speech synthesis API of the DANDI plug-in may be disabled to reduce the processing requirements of the DANDI plug-in. In this scenario, the voice-to-text conversion may be done by the speech synthesis API of the client application and then provided to the DANDI plug-in 237. Accordingly, in one embodiment, the DANDI plug-in 237 may include the speech synthesis API and the metadata analysis function. As a result, the DANDI plug-in 237 may itself be structured to convert a voice input to alphanumeric text, navigate between the fields of the form, and ultimately enable the user to populate the form. As still another example, certain of the functions described above with respect to the provider computing system 150, such as speech enhancement, voice authentication, and translation, may also be included with the DANDI plug-in 237.
In some embodiments, by keeping these functions accessible via the plug-in but not a part of the plug-in 237, the local processing requirements for running the plug-in 237 may be reduced to improve the processing speed.
- In the example shown, the DANDI plug-in 237 includes the speech synthesis API and the metadata analysis feature. Further, the voice authentication, translation, and enhancement features are provided by the provider computing system to reduce the size of the plug-in 237. That said, when the plug-in 237 is used with an application that already has a built-in speech-to-text conversion feature, the plug-in 237 may use the output of that feature to avoid duplicative features. Alternatively, the plug-in 237 may use the output of that feature for comparison against the text determined by the plug-in 237. This may be used to help the plug-in 237 "learn" and become more refined over time.
- In one specific example, the
provider computing system 150 may include artificial intelligence or deep learning capabilities structured to optimize operation of the plug-in 237 over time (hence, a digitally aware neural dictation interface). For example, the processing circuit 154 may include a convolutional neural network associated with one or more of the circuits, such as the speech enhancement circuit 359. In operation, the circuit 359 receives multiple samples of the user's voice (inputs). Convolution layers and programming are used by the circuit 359 to identify the syllables in the user's voice, patterns of speech, and other characteristics of the user's voice. This may include referencing other users' voice samples. This node processing results in a plurality of layers. Using a learning process (e.g., back-propagation), the circuit 359 begins to readily determine and identify the features associated with the user's voice as falling within defined categories or classes (e.g., typically used words such as "the" and "next" may form a class, nouns may form a class, and other ways of grouping voice inputs may form additional classes). As more learning is performed, the circuit 359 may more quickly determine a user's voice input to be a certain letter, word, or phrase. This may result in the circuit 359 developing a list that correlates the user's voice samples to these known outputs. As such, in operation, these letters, words, or phrases may be more quickly determined locally by the plug-in 237 moving forward, which enhances operation of the plug-in. In other embodiments, different neural network, machine-learning, or artificial intelligence processes may be used.
- Referring now to FIG. 4A, a display output 400 on the display screen 233 of a user device during a hands-free mode of operation for populating a form is shown, according to an example embodiment. In the embodiment of FIG. 4A, an animation 402 is displayed on the display screen 233, which is an indication to the user that the at least partial hands-free mode of operation of the user device is enabled. Thus, the animation 402 provides a clear indication on the display screen 233 to the user 120 that the system is ready for a voice input to populate individual fields of the form. The voice input provides a value of a field of the form. The voice input may also be a voice command to the user device to perform a specific operation. In the example of FIG. 4A, the user 120 issues a command to the client application 136 to modify the value of an earlier populated field pointed to by reference numeral 404. In some implementations, upon receiving an input value for the field to be modified, the client application 136 is structured to revert back to the next field where it was last awaiting a user voice input to populate the value of the field, which is pointed to by reference numeral 406. The populated values of the fields of the form are available for visual verification by the user 120 on the display screen 233 of the user device 130.
- Referring now to FIG. 4B, another display output 450 on the display screen 233 of the user device 130 during a hands-free mode of operation for filling out a form is shown, according to an example embodiment. In the embodiment of FIG. 4B, there is no equivalent of the animation 402 of FIG. 4A displayed, because the client application can only be in a listening mode waiting for the user's 120 voice input or in a prompting mode (e.g., prompting the user for the value of the next field of the form). The absence of the animation is an indication to the user 120 that the at least partial hands-free mode of operation of the user device 130 by the user 120 is currently disabled. In some embodiments, a pop-up display 460 is provided on the display screen 233 of the user device 130 (or the user 120 is prompted through the speaker (not shown) of the user device 130). The previous and newly populated values 455 are available for visual verification by the user 120 on the display screen 233 of the user device 130.
- Referring now to FIG. 5, an embodiment 500 of a display output 502 of data visualization through voice input is shown, according to an example embodiment. FIG. 5 depicts a bar graph 504 as an example data visualization surfaced through a user's 120 voice input. In the example embodiment, a display output 502 on the display screen 233 of the user device 130 is shown based on the user's 120 selection of an option from a drop-down menu to display the underlying data as a bar graph (as opposed to other possible menu options, such as a pie chart, a scatter diagram, a time series graph, etc.) during a hands-free mode of operation. In the example embodiment of FIG. 5, the user's 120 voice input is used for purposes other than to provide a voice input for the value of a field of a form or to provide a voice input for a recognized command. In the example embodiment, the user's voice input is used to select an option from a drop-down menu displayed on the display screen 233 of the user device 130 in order to visualize the associated data. In operation, the client application 136 executing on the user device 130 is structured to process different types of metadata and processing logic during the hands-free operation of the user device 130 to provide user experiences that go beyond just populating the plurality of fields of a form by using voice input.
- It should be understood that providing voice input by a user 120 and surfacing data visualization through voice input (as explained above in the discussion of FIG. 5) are non-limiting examples of hands-free operation of a user device 130 by a user 120 via the DANDI plug-in 237. In some embodiments, other example uses of hands-free operation are envisaged by, and fall under the scope of, the present disclosure. In an example embodiment, hands-free operation is applicable to a virtual reality (VR) system that may include at least a headset or a visor and a microphone. Conventionally, a user 120 engaged in a VR experience who has to fill out a form would have to take the visor off (i.e., exit the VR experience) and provide inputs for form-filling using a keyboard (or equivalent). But the hands-free mode of operation enabled by the present disclosure may be used for form-filling when the VR headset either supports a client application capable of downloading the DANDI plug-in or has the functionality hard-coded into its computer, thereby alleviating the need for the user to disengage from the VR experience (e.g., without having to take the VR headset or visor off).
- Referring now to FIG. 6, a flowchart depicting a method 600 of populating a plurality of fields of a form and providing a conversational electronic form using the user device of FIG. 1 is shown, according to an example embodiment. Because the method 600 may be implemented with the components of FIG. 1, reference may be made to various components of the system 100 to aid explanation of the method 600.
- At process 602, an electronic form-filling voice function is provided. In one embodiment, a plug-in and, in particular, the DANDI plug-in 237 is provided by the provider computing system 150. In this regard, process 602 is described as providing the DANDI plug-in 237, which provides the electronic form-filling voice functionality. In another embodiment, the functionality of the DANDI plug-in is already included with an application, such as a web browser application 241. The DANDI plug-in 237 includes or utilizes a speech synthesis API that converts a user's voice input into alphanumeric text. The DANDI plug-in 237 includes a metadata analysis feature whereby metadata associated with an electronic form is received by the DANDI plug-in 237 and then analyzed to determine the characteristics of the fields of the form. For example, the DANDI plug-in 237 is structured to determine, through metadata analysis, the total number of fields, the names of the fields, the data types of each of the fields, the maximum number of characters allowed in a field, the range of acceptable values for a field, etc.
- At process 604, a partial hands-free operation of the user device 130 is enabled. For example, an electronic form may be displayed by the user device 130. The user 120 may then click on a portion of a webpage of the user device 130, or the user 120 may issue a specific voice command or request that may be recognized by the client application 136 as the initiation of the hands-free mode of operation of the user device 130 by the user 120 in order to populate the fields of the form. The client application 136 may process the voice command to initiate the hands-free mode of operation, or it may pass the command on to the provider computing system 150 through the network interface circuit 131. In the latter embodiment, the provider enhancement circuit 157 in the provider computing system 150 may interpret the voice command to initiate the at least partial hands-free mode of operation by the user 120 of the user device 130, and inform the client application 136 of the initiation of the at least partial hands-free mode of operation, where "partial" means that the user is still able to provide manual inputs if desired.
- At process 606, a speech or voice input is received. For example, the user 120 may provide a speech input regarding a field. The microphone 234 of the user device 130 may receive the speech input from the user 120, which corresponds to a value of a current field of a plurality of fields of the electronic form. The input is transmitted to the DANDI plug-in 237 and the client application. The client application, via the speaker 235, may prompt the user for information associated with a particular field in order to provide a conversational form (e.g., the metadata analysis may determine what information is required, and the client application may audibly request the user to provide this specific information via the speaker).
- At process 608, the speech input is converted into text (e.g., alpha, numeric, or alphanumeric text). The client application 136, via the DANDI plug-in 237, converts the speech input received at process 606 from speech into text (e.g., alphanumeric text). In another embodiment, the client application 136 accesses a speech synthesis API residing on a speech synthesis API server 170 to convert the user speech input to alphanumeric text. In other embodiments, the client application 136 may include built-in support for a speech synthesis API that facilitates the synthesis of speech, i.e., conversion from speech to alphanumeric text and from alphanumeric text to speech.
- At process 610, a field of the form is populated with the value corresponding to the text. The value refers to the characters (e.g., alphanumeric text) that are placed in the field based on the conversion of the speech input to alphanumeric text. In one embodiment, the client application 136, via the DANDI plug-in 237, first determines which field of the form is to be populated by analyzing the metadata describing the plurality of fields of the form. Then the client application 136, via the DANDI plug-in 237, populates the appropriate field of the form with the converted alphanumeric text. Finally, the client application 136 may provide a display on the user device 130 to permit a visual verification by the user 120 that the field value was populated correctly. For example, the value may be provided on the display screen 233 of the user device 130. This permits visual verification by the user 120 that the value has been entered into the correct field of the form, and that the value corresponds to the speech input provided by the user 120.
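- A brief, non-limiting sketch of this population step for a web-hosted form (the populateField helper is hypothetical) might be:

```javascript
// Illustrative sketch: write the converted text into the target field and
// fire the events a web form typically listens for, so the populated value
// is visible on screen for the user's verification.
function populateField(field, text) {
  field.value = text;
  field.dispatchEvent(new Event('input', { bubbles: true }));
  field.dispatchEvent(new Event('change', { bubbles: true }));
  field.scrollIntoView({ block: 'center' }); // keep the field visible
}
```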
- At process 612, a navigation to a next field in the form is performed. In this regard, the client application, via the DANDI plug-in 237, may determine whether there are additional fields in the form based on the metadata and whether the various fields are populated or complete. The client application, via the plug-in 237, determines the priority order of the fields (i.e., the fields that should be filled first, such as the fields that require population before submission is allowed). At this point, the client application, via the plug-in 237, determines that additional fields need to be populated and navigates to those fields to prompt the user to fill them according to the determined order. As described above, determining when to move or navigate from field to field may be done in a variety of different ways. For example, after the passage of a predetermined amount of time, the field may be determined to be populated and a prompt for information for the next field provided via the speaker to the user according to the determined order of fields. As another example, an affirmative input/confirmation, such as a click or vocal command from the user, is received that indicates the field is complete (a manual or verbal confirmation). As still another example, a prompt from instructions from the plug-in to a speaker of the user device may ask the user for the information specific to the next field. With regard to this last example of providing an audible prompt for the required information for the next field, this situation facilitates a conversational form whereby the plug-in navigates from field to field conversationally with the user until all or a sufficient number of the fields are populated. Throughout this navigation, the converted speech-to-text may be displayed in each field for visual verification by the user.
- In some embodiments, a user may notice one or more field values that the user desires to change or modify. Accordingly, the microphone may receive a voice command from the user to modify a previously populated field value, and subsequently receive a speech input from the user to override the value of the previously populated field.
- At
process 614, a completeness of the form is determined. The client application 136, via the DANDI plug-in 237, analyzes the metadata describing the plurality of fields of the form. Based on the analysis, the client application 136 determines whether a sufficient number of fields have been populated (a "completeness"). The "sufficient number of fields" may be all the fields or a predefined subset of the fields based on the metadata. For example, some fields may be optional and not necessary to fill in order for the form to be determined "complete." In this regard, a certain subset of fields may be required to be populated before the form is allowed to be submitted. If the form is complete, the client application 136 proceeds to process 616. But if the determination is that there are more fields to populate, then the client application 136, via the DANDI plug-in 237, reverts back to process 612 to prompt the user 120 for the value of the next field to be populated.
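- A minimal sketch of such a completeness determination, assuming the required flags come from the metadata analysis described above (the isComplete helper is hypothetical), is:

```javascript
// Illustrative sketch: the form is "complete" when every field marked as
// required by the metadata holds a non-empty value.
function isComplete(fields) {
  return fields
    .filter((f) => f.required)
    .every((f) => f.value.trim().length > 0);
}
```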
- At process 616, upon the form being determined to be complete, a prompt is provided to review the populated fields and/or to submit the form. For example, an audible prompt via the speaker, from the DANDI plug-in and client application, may be provided: "The form is complete. Would you like to submit the form?" The client application 136 provides a display on the user device 130 to prompt the user to indicate whether the user 120 wants to review the fields of the form or whether the user 120 wants to submit the form.
- At process 618, an indication to submit the form is received. Thus, submission of the form is enabled. The indication to submit the form may be provided vocally (e.g., as a vocal command as described above). The indication may also be provided manually (e.g., clicking on a submit button on the form, clicking save on a PDF form, etc.). Upon the user 120 providing an indication to submit the form, the client application 136 proceeds to process 620. On the other hand, if the user 120 indicates at process 616 that the user 120 wants to review the populated values of the plurality of fields of the form, then the client application 136, via the DANDI plug-in 237, reverts back to process 612 and prompts the user 120 to re-enter or accept the existing values in each of the plurality of fields of the form.
- At process 620, a termination or disablement of the hands-free operation of the user device 130 by the user 120 is accomplished. In one embodiment, this step is performed by the client application 136 of the user device 130. Further, this step may be performed automatically upon submission of the form. Or, an explicit input from the user may be provided (e.g., a vocal command or a manual entry) that disables the hands-free or at least partial hands-free mode of operation. As yet another embodiment, the functionality provided by the DANDI plug-in may always be on. In this regard, one need not enable or disable the hands-free mode of operation. Rather, a user may simply click on a DANDI icon to initiate use of the functionality of the DANDI plug-in with a form. Alternatively or additionally, a user may navigate to a web page that hosts a form, and the functionality described herein with respect to the DANDI plug-in may be automatically or semi-automatically initiated (e.g., a prompt such as "Please confirm you would like to use DANDI" may be provided to the user upon reaching the form on the web page).
- Method 600 provides the technical advantage of being able to navigate and populate the plurality of fields of an electronic form in a hands-free manner by interacting using a user's voice. The user's voice is used both to receive commands and to obtain the values of the fields of the form. In some embodiments, functionality implemented on the user device (e.g., analysis of the metadata, providing a speech synthesis API, etc.) may be offloaded to backend servers. This provides the technical advantage of less computational load on the processor(s) of the user device. Method 600 also provides several user benefits, such as the ability to populate entire forms in a hands-free manner by carrying out a conversation with the user device. In this regard, the speaker of the user device, based on instructions from the plug-in, may prompt the user for a value of a next field of the plurality of fields without an affirmative input that the current field is populated, in order to provide a conversational electronic form. Thus, like a conversation, there is a free flow of movement from one field (one conversation topic) to another field (another conversation topic). This reduces the friction typically experienced when filling out electronic forms. The processes of method 600 recognize the difference between user commands and user speech input, which makes populating the fields of the form easy for the user, even when the user makes some mistakes, by allowing navigation back to previously populated fields through voice commands.
- Method 600 also provides the benefit of error checking in at least two ways. First, the client application 136, via the DANDI plug-in 237, provides a visual depiction of each filled field as the field is filled. Second, the client application 136, via the DANDI plug-in 237, provides a visual depiction of the completed form prior to submission of the form. In some embodiments, the client application 136, via the DANDI plug-in 237, may read aloud the filled field as the field is populated. In operation, the client application 136, via the DANDI plug-in 237, may identify an error with respect to a provided field value (alternatively, as described below, the provider computing system 150 may determine one or more errors). For example, the DANDI plug-in 237 may expect certain values for each field based on the metadata analysis. If the provided information does not match the expected values (e.g., a phone number is expected and a home address value is provided), the client application 136, via the DANDI plug-in 237, may prompt the user to confirm their submission or request different information. Accordingly, during population of the fields, errors may be determined by the DANDI plug-in 237 with respect to one or more field values. The client application, via the DANDI plug-in, may compare the received speech input (or converted text) to the expected value for the field(s) of the form, and where the expected values do not match the speech input (or converted text), an error is flagged. The DANDI plug-in 237 may then prompt the user to confirm the field values or to change the field values with determined errors. In some embodiments, and prior to submission of the form, the client application 136, via the DANDI plug-in 237, may read aloud each field and each field value. In either situation, a user may receive an auditory and visual indication of the populated field and/or fields of the form. Accordingly, an ability to check for errors is provided by the DANDI plug-in 237 to the user.
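- One hedged sketch of such a comparison between a converted value and the expected field characteristics (the checkFieldValue helper and its messages are hypothetical) might be:

```javascript
// Illustrative sketch: flag an error when the converted text does not fit
// the expected data type of the field (e.g., alphabetic text supplied for
// a numeric or date field, or a value exceeding the allowed length).
function checkFieldValue(field, text) {
  if (field.type === 'number' && !/^\d+(\.\d+)?$/.test(text)) {
    return { ok: false, reason: 'expected a number' };
  }
  if (field.type === 'date' && Number.isNaN(Date.parse(text))) {
    return { ok: false, reason: 'expected a date' };
  }
  if (field.maxLength > 0 && text.length > field.maxLength) {
    return { ok: false, reason: 'too many characters' };
  }
  return { ok: true };
}
```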
- In some embodiments, and in addition to the error checking provided, form validation and error messaging may be provided. In this way, the client application 136, via the DANDI plug-in 237, may provide a message to the user to confirm the value of one or more fields of the form. For example, the field may be date-of-birth, yet the value depicts the name of the user. The DANDI plug-in 237 may compare the expected field value to the actual field value to determine that a potential error exists. For example, the expected field values are numerical in nature, yet the received converted text consists of alpha characters. As a result, the client application 136, via the DANDI plug-in 237, may provide a message, such as an audible question, to the user: "The date-of-birth field includes a name and not a date. Would you like to return to this field to change the value?" In another embodiment, the error checking may include automatic correction. With respect to the previous example, the client application, via the DANDI plug-in 237, may recognize that the value provided by the user is their name while the field is date-of-birth. Rather than populating the date-of-birth field, the client application, via the DANDI plug-in 237 and using the metadata analysis of the form, locates and populates the name field with the user's name. Then, the client application, via the DANDI plug-in, audibly prompts the user for their date of birth for the date-of-birth field. In this way, a smart form-filling aspect is provided via the DANDI plug-in 237. As a variation to this aspect, the client application, via the DANDI plug-in 237 and using information from the auto-complete circuit 361, may proactively prevent errors as they occur. Sticking with the above example and knowing that the field is date-of-birth, the circuit 361 provides information regarding the user's date of birth. As the DANDI plug-in 237 compares the user's input (their name) to the required field value (a date of birth), the client application, via the DANDI plug-in, may disregard the user's voice input in favor of the date-of-birth information from the circuit 361, because this information matches the required field value; despite the user providing an audible input of their name, the visually depicted field value is their date of birth. Thus, a proactive error correction feature may be provided.
- As mentioned above, a form validation feature may also be provided. This validation provides a holistic error examination process. In one embodiment, after the form is submitted, the form is sent to the provider computing system 150 rather than the end recipient computing system. Using stored submitted forms associated with the user, the system 150 compares the field values to previously submitted field values. The system 150 may then identify potential errors and either fix them before submitting the form or transmit a message back to the user for potential correction (e.g., a verbal prompt, a written message such as a push notification, etc.). If the system 150 determines that the form appears to be correctly filled (e.g., by matching the field values with the required information for each field to ensure a match), then the system 150 transmits the form to the end recipient. As another example, an audible prompt may be provided to the user, via the client application, to check the form prior to submission. If the user responds in the affirmative, then the form-to-be-submitted is transmitted to the system 150 for validation. This may be beneficial for long and complex forms, such as mortgage forms, where additional analysis is desired to ensure no or likely no errors. In another embodiment, a carbon copy of the form and the populated fields is provided to the system 150 by the client application during population of the form. This may enable simultaneous error-checking of the form by the system 150. These validation procedures show the potential involvement of the system 150 in attempting to mitigate errors in the filling of the form. This may be used when such functionality is not included with the DANDI plug-in 237.
- As mentioned herein, artificial intelligence, machine learning, and the like may be used by the processing circuit of the provider computing system 150. Artificial intelligence, such as the above-described convolutional neural networks, may also be used in the error-checking process of the electronic form prior to submission. As described herein, learning by the system 150 of the user's typical responses (e.g., home address, favorite pet, etc.) and voice characteristics via artificial intelligence may enable a quicker filling of forms with a lower likelihood of errors.
- FIG. 7 is a flowchart depicting a method 700 of providing refinements to speech input samples by the provider computing system 150, according to an example embodiment. In some embodiments, the speech input received from a user 120 may be distorted, garbled, attenuated, or irregular in some manner (for example, there are unexpected gaps in the speech input). In some embodiments, the irregularity in the speech input is due to an accent in the speech, or due to the speech input being in a foreign language.
- At process 702, a speech input is received. In particular, a speech input for filling out a field of a plurality of fields of a form is received. The client application 136, via the DANDI plug-in 237, processes the speech input by passing the received speech input to the speech synthesis API.
- At process 704, an irregularity in the speech input is determined. In one embodiment, the client application 136, via the DANDI plug-in 237, fails to recognize the syllables in the speech input after processing the speech input through the speech synthesis API. Due to the failure in recognizing the syllables in the received speech input, the client application 136 classifies the speech input as irregular. The client application 136 then forwards the speech input to the provider computing system 150 for refinement of the quality of the speech input. The speech input may be irregular due to attenuation of the speech input, the presence of background noise, or an accent that is hard to recognize. As an example, the provider computing system 150 may determine that the irregularity is a non-English language speech input. This may be identified by the client application via the DANDI plug-in in that the speech input is not recognized, which causes the client application to transmit the speech input to the provider computing system 150. The processing circuit of the system 150 may then determine that the speech input is a non-English language speech input (e.g., via the speech enhancement circuit 359). Then, the provider computing system 150 may translate (e.g., via the speech enhancement circuit 359) the non-English language speech input into the English language as part of the refinement. Because of the storage capacity of the system 150, the translation may be accomplished in minimal time by the system 150 as compared to the client application and DANDI plug-in 237.
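- One possible, purely illustrative approach to the non-English determination and translation retries recognition with language hints before handing the result to a translation back end. The candidate-language list is an assumption, as is the use of the SpeechRecognition and deep-translator packages in place of the speech enhancement circuit 359.

```python
import speech_recognition as sr
from deep_translator import GoogleTranslator  # assumed translation back end

CANDIDATE_LANGUAGES = ["es-ES", "fr-FR", "de-DE"]  # illustrative subset only

def classify_and_translate(audio: sr.AudioData, recognizer: sr.Recognizer) -> str | None:
    """Retry recognition with language hints; translate any hit into English."""
    for lang in CANDIDATE_LANGUAGES:
        try:
            foreign_text = recognizer.recognize_google(audio, language=lang)
        except sr.UnknownValueError:
            continue  # not this language; try the next hint
        # Stand-in for the speech enhancement circuit's translation step.
        return GoogleTranslator(source=lang.split("-")[0], target="en").translate(foreign_text)
    return None  # the irregularity is not a recognizable non-English input
```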
- At process 706, the speech input is refined. In one embodiment, the speech enhancement circuit 359 of the provider computing system 150 processes the received speech input with artificial intelligence (AI) smart algorithms to refine the speech input samples. In some embodiments, the AI smart algorithms look up the historical voice inputs for a user 120 in the provider database 365 to identify a pattern in the user's 120 speech input, and then use extrapolation to refine the current speech input samples received in irregular form. In another embodiment, the speech enhancement circuit 359 digitally enhances the speech input samples through filtering and digital processing techniques to obtain better quality samples of the user's 120 speech input and improve the reliability of the recognition of the speech input. In still another embodiment, the speech enhancement circuit 359 of the provider computing system 150 leverages stored information for the specific user 120 in the provider database 365 by analyzing patterns in the user's past speech inputs. Based on the patterns, various refinements to the speech input may be performed. For example, known pronunciations of the user's home address may be used to determine that a speech input pertains to the user's home address.
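- Both refinement paths may be sketched briefly. The following Python fragment, offered only as an illustration, pairs a simple band-pass-and-normalize step (standing in for the digital processing techniques) with a closest-match lookup of an unrecognized fragment against stored entries; the cutoff frequencies, match threshold, and syllable store are assumptions.

```python
import numpy as np
from scipy.signal import butter, filtfilt
from difflib import get_close_matches

def enhance_speech(samples: np.ndarray, fs: int = 16000) -> np.ndarray:
    """Suppress out-of-band noise and rescale an attenuated speech signal."""
    b, a = butter(4, [300, 3400], btype="band", fs=fs)  # speech band; assumed cutoffs
    filtered = filtfilt(b, a, samples)
    peak = np.max(np.abs(filtered))
    return filtered / peak if peak > 0 else filtered    # normalize against attenuation

def match_fragment(unrecognized: str, stored_entries: list[str]) -> str | None:
    """Extrapolate a distorted fragment to its closest stored counterpart."""
    hits = get_close_matches(unrecognized, stored_entries, n=1, cutoff=0.6)
    return hits[0] if hits else None

# Example: a garbled transcription fragment resolved against the user's history.
print(match_fragment("sprngfld", ["springfield", "shelbyville"]))  # -> "springfield"
```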
- At process 708, a refined speech output based on the refined speech input is provided. In particular, the speech enhancement circuit 359 of the provider computing system 150 is structured to provide the refined speech output back to the client application 136 of the user device 130. Then, the processing circuit 154 may convert the refined speech input into text (e.g., alpha, numeric, alphanumeric text). The system 150 then provides the text converted from the refined speech input to the user device. The client application 136 uses the text from the refined speech output to populate the value of a field in order to fill an electronic form in accordance with method 600 and the other disclosure contained herein. - The arrangements described herein have been described with reference to the drawings. The drawings illustrate certain details of specific arrangements that implement the systems, methods and programs described herein. However, describing the arrangements with drawings should not be construed as imposing on the disclosure any limitations that may be present in the drawings.
- It should be understood that no claim element herein is to be construed under the provisions of 35 U.S.C. § 112(f), unless the element is expressly recited using the phrase “means for.”
- As used herein, the term “circuit” may include hardware structured to execute the functions described herein. In some arrangements, each respective “circuit” may include machine-readable media for configuring the hardware to execute the functions described herein. The circuit may be embodied as one or more circuitry components including, but not limited to, processing circuitry, network interfaces, peripheral devices, input devices, output devices, sensors, etc. In some arrangements, a circuit may take the form of one or more analog circuits, electronic circuits (e.g., integrated circuits (IC), discrete circuits, system-on-a-chip (SOC) circuits, etc.), telecommunication circuits, hybrid circuits, and any other type of “circuit.” In this regard, the “circuit” may include any type of component for accomplishing or facilitating achievement of the operations described herein. For example, a circuit as described herein may include one or more transistors, logic gates (e.g., NAND, AND, NOR, OR, XOR, NOT, XNOR, etc.), resistors, multiplexers, registers, capacitors, inductors, diodes, wiring, and so on.
- The “circuit” may also include one or more processors communicatively coupled to one or more memory or memory devices. In this regard, the one or more processors may execute instructions stored in the memory or may execute instructions otherwise accessible to the one or more processors. In some arrangements, the one or more processors may be embodied in various ways. The one or more processors may be constructed in a manner sufficient to perform at least the operations described herein. In some arrangements, the one or more processors may be shared by multiple circuits (e.g., circuit A and circuit B may comprise or otherwise share the same processor which, in some example arrangements, may execute instructions stored, or otherwise accessed, via different areas of memory). Alternatively, or additionally, the one or more processors may be structured to perform or otherwise execute certain operations independent of one or more co-processors. In other example arrangements, two or more processors may be coupled via a bus to enable independent, parallel, pipelined, or multi-threaded instruction execution. Each processor may be implemented as one or more general-purpose processors, application specific integrated circuits (ASICs), field programmable gate arrays (FPGAs), digital signal processors (DSPs), or other suitable electronic data processing components structured to execute instructions provided by memory. The one or more processors may take the form of a single core processor, multi-core processor (e.g., a dual core processor, triple core processor, quad core processor, etc.), microprocessor, etc. In some arrangements, the one or more processors may be external to the apparatus, for example the one or more processors may be a remote processor (e.g., a cloud based processor). Alternatively, or additionally, the one or more processors may be internal and/or local to the apparatus. In this regard, a given circuit or components thereof may be disposed locally (e.g., as part of a local server, a local computing system, etc.) or remotely (e.g., as part of a remote server such as a cloud based server). To that end, a “circuit” as described herein may include components that are distributed across one or more locations.
- An exemplary system for implementing the overall system or portions of the arrangements might include general purpose computing devices in the form of computers, including a processing unit, a system memory, and a system bus that couples various system components including the system memory to the processing unit. Each memory device may include non-transient volatile storage media, non-volatile storage media, non-transitory storage media (e.g., one or more volatile and/or non-volatile memories), etc. In some arrangements, the non-volatile media may take the form of ROM, flash memory (e.g., flash memory such as NAND, 3D NAND, NOR, 3D NOR, etc.), EEPROM, MRAM, magnetic storage, hard discs, optical discs, etc. In other arrangements, the volatile storage media may take the form of RAM, TRAM, ZRAM, etc. Combinations of the above are also included within the scope of machine-readable media. In this regard, machine-executable instructions comprise, for example, instructions and data which cause a general purpose computer, special purpose computer, or special purpose processing machines to perform a certain function or group of functions. Each respective memory device may be operable to maintain or otherwise store information relating to the operations performed by one or more associated circuits, including processor instructions and related data (e.g., database components, object code components, script components, etc.), in accordance with the example arrangements described herein.
- It should be noted that although the diagrams herein may show a specific order and composition of method steps, it is understood that the order of these steps may differ from what is depicted. For example, two or more steps may be performed concurrently or with partial concurrence. Also, some method steps that are performed as discrete steps may be combined, steps being performed as a combined step may be separated into discrete steps, the sequence of certain processes may be reversed or otherwise varied, and the nature or number of discrete processes may be altered or varied. The order or sequence of any element or apparatus may be varied or substituted according to alternative arrangements. Accordingly, all such modifications are intended to be included within the scope of the present disclosure as defined in the appended claims. Such variations will depend on the machine-readable media and hardware systems chosen and on designer choice. It is understood that all such variations are within the scope of the disclosure. Likewise, software and web implementations of the present disclosure could be accomplished with standard programming techniques with rule based logic and other logic to accomplish the various database searching steps, correlation steps, comparison steps and decision steps.
- The foregoing description of arrangements has been presented for purposes of illustration and description. It is not intended to be exhaustive or to limit the disclosure to the precise form disclosed, and modifications and variations are possible in light of the above teachings or may be acquired from this disclosure. The arrangements were chosen and described in order to explain the principles of the disclosure and its practical application to enable one skilled in the art to utilize the various arrangements with various modifications as are suited to the particular use contemplated. Other substitutions, modifications, changes and omissions may be made in the design, operating conditions and arrangement of the arrangements without departing from the scope of the present disclosure as expressed in the appended claims.
Claims (20)
1. A method, comprising:
determining, by one or more processing circuits, a plurality of elements of a document;
receiving, by the one or more processing circuits, a first speech input from a user to enable a mode of operation;
authenticating, by the one or more processing circuits, the user by comparing the first speech input from the user with at least one voice sample of the user;
in response to authenticating the first speech input, enabling, by the one or more processing circuits, the mode of operation;
receiving, by the one or more processing circuits in the mode of operation, a second speech input for filling out a first element of the document, wherein the first element is selected based on a priority order;
determining, by the one or more processing circuits, an irregularity or distortion in the second speech input based on the first element and identifying a missing syllable or a distorted syllable in the second speech input, wherein identifying the missing syllable or the distorted syllable comprises executing an analysis of the second speech input, wherein either (1) the missing syllable is determined based on other syllables identified in the analysis, or (2) the distorted syllable is determined based on failing to recognize a syllable in the analysis;
refining, by the one or more processing circuits, the second speech input into at least one matching syllable by extrapolating the missing syllable or the distorted syllable based on stored syllables of a plurality of speech inputs, wherein the at least one matching syllable is determined at least in part on an expected element value associated with the first element;
converting, by the one or more processing circuits, the refined second speech input comprising the at least one matching syllable into text; and
providing, by the one or more processing circuits, the text to a user device to populate the first element with the text.
2. The method of claim 1, wherein the irregularity is a first irregularity and the method further comprises:
determining, by the one or more processing circuits, a second irregularity that is a non-English language speech input;
identifying, by the one or more processing circuits, a language of the non-English language speech input of the second irregularity; and
translating, by the one or more processing circuits, the non-English language speech input into English language.
3. The method of claim 1, wherein determining the irregularity in the second speech input is based on identifying the distorted syllable in the second speech input, and the method further comprises:
determining, by the one or more processing circuits, that the distorted syllable is due to at least one of an attenuation of the second speech input, a presence of background noise, or an accent in the second speech input.
4. The method of claim 3, further comprising:
transmitting, by the one or more processing circuits, the second speech input to a speech enhancement circuit to at least partially mitigate the irregularity in the second speech input.
5. The method of claim 4, further comprising:
receiving, by the one or more processing circuits, a mitigated speech output from the speech enhancement circuit as a refinement to at least partially mitigate the irregularity in the second speech input.
6. The method of claim 1, further comprising:
highlighting, by the one or more processing circuits on a display screen of the user device, a second element of the plurality of elements in response to determining that the first element is populated with the text.
7. The method of claim 1, further comprising:
determining, by the one or more processing circuits, the expected element value for the first element based on metadata;
comparing, by the one or more processing circuits, the second speech input to the expected element value; and
determining, by the one or more processing circuits, that the second speech input does not match the expected element value of the first element.
8. The method of claim 1, further comprising:
correcting, by the one or more processing circuits, an error in the first element by disregarding the received second speech input for a second value of the first element in favor of information that matches the expected element value of the first element.
9. The method of claim 1, further comprising:
filtering, by the one or more processing circuits through at least one digital processing technique, the second speech input to remove at least a portion of the irregularity.
10. The method of claim 1, wherein the refinement of the second speech input comprises:
executing, by the one or more processing circuits, at least one artificial intelligence algorithm to compare each syllable in the second speech input to the stored syllables in a database to find a closest match for the missing syllable or the distorted syllable in the second speech input; and
providing, by the one or more processing circuits, at least one user specific auto-complete suggestion based on information stored in the database associated with the user, wherein the information represents stored values corresponding to multiple elements of documents previously filled by the user.
11. A system, comprising:
one or more processing circuits configured to:
determine a plurality of elements of a document;
receive a first speech input from a user to enable a mode of operation;
authenticate the user by comparing the first speech input from the user with at least one voice sample of the user;
in response to authenticating the first speech input, enable the mode of operation;
receive, in the mode of operation, a second speech input for filling out a first element of the document, wherein the first element is selected based on a priority order;
determine an irregularity or distortion in the second speech input based on the first element and identifying a missing syllable or a distorted syllable in the second speech input, wherein identifying the missing syllable or the distorted syllable comprises executing an analysis of the second speech input, wherein either (1) the missing syllable is determined based on other syllables identified in the analysis, or (2) the distorted syllable is determined based on failing to recognize a syllable in the analysis;
refine the second speech input into at least one matching syllable by extrapolating the missing syllable or the distorted syllable based on stored syllables of a plurality of speech inputs, wherein the at least one matching syllable is determined at least in part on an expected element value associated with the first element;
convert the refined second speech input comprising the at least one matching syllable into text; and
provide the text to a user device to populate the first element with the text.
12. The system of claim 11, wherein the irregularity is a first irregularity and the one or more processing circuits are further configured to:
determine a second irregularity that is a non-English language speech input;
identify a language of the non-English language speech input; and
translate the non-English language speech input into English language.
13. The system of claim 11, wherein determining the irregularity in the second speech input is based on identifying the distorted syllable in the second speech input, and the one or more processing circuits are further configured to:
determine that the distorted syllable is due to at least one of an attenuation of the second speech input, a presence of background noise, or an accent in the second speech input.
14. The system of claim 13, wherein the one or more processing circuits are further configured to:
transmit the second speech input to a speech enhancement circuit to at least partially mitigate the irregularity in the second speech input.
15. The system of claim 14, wherein the one or more processing circuits are further configured to:
receive a mitigated speech output from the speech enhancement circuit as a refinement to at least partially mitigate the irregularity in the second speech input.
16. The system of claim 11, wherein the one or more processing circuits are further configured to:
highlight a second element of the plurality of elements in response to determining that the first element is populated with the text.
17. One or more non-transitory computer-readable storage media having instructions stored thereon that, when executed by one or more processing circuits, cause the one or more processing circuits to perform operations comprising:
determining a plurality of elements of a document;
receiving a first speech input from a user to enable a mode of operation;
authenticating the user by comparing the first speech input from the user with at least one voice sample of the user;
in response to authenticating the first speech input, enabling the mode of operation;
receiving, in the mode of operation, a second speech input for filling out a first element of the document, wherein the first element is selected based on a priority order;
determining an irregularity or distortion in the second speech input based on the first element and identifying a missing syllable or a distorted syllable in the second speech input, wherein identifying the missing syllable or the distorted syllable comprises executing an analysis of the second speech input, wherein either (1) the missing syllable is determined based on other syllables identified in the analysis, or (2) the distorted syllable is determined based on failing to recognize a syllable in the analysis;
refining the second speech input into at least one matching syllable by extrapolating the missing syllable or the distorted syllable based on stored syllables of a plurality of speech inputs, wherein the at least one matching syllable is determined at least in part on an expected element value associated with the first element;
converting the refined second speech input comprising the at least one matching syllable into text; and
providing the text to a user device to populate the first element with the text.
18. The one or more non-transitory computer-readable storage media of claim 17, wherein the instructions, when executed by the one or more processing circuits, further cause the one or more processing circuits to perform operations comprising:
highlighting, on a display screen of the user device, a second element of the plurality of elements in response to determining that the first element is populated with the text.
19. The one or more non-transitory computer-readable storage media of claim 17, wherein the instructions, when executed by the one or more processing circuits, further cause the one or more processing circuits to perform operations comprising:
determining the expected element value for the first element based on metadata;
comparing the second speech input to the expected element value; and
determining that the second speech input does not match the expected element value of the first element.
20. The one or more non-transitory computer-readable storage media of claim 17, wherein the instructions, when executed by the one or more processing circuits, further cause the one or more processing circuits to perform operations comprising:
correcting an error in the first element by disregarding the received second speech input for a second value of the first element in favor of information that matches the expected element value of the first element.
Priority Applications (1)

| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| US18/223,429 (US20230359812A1) | 2019-10-11 | 2023-07-18 | Digitally aware neural dictation interface |
Applications Claiming Priority (3)

| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| US16/600,242 (US11308265B1) | 2019-10-11 | 2019-10-11 | Digitally aware neural dictation interface |
| US16/660,656 (US11868709B1) | 2019-10-11 | 2019-10-22 | Digitally aware neural dictation interface |
| US18/223,429 (US20230359812A1) | 2019-10-11 | 2023-07-18 | Digitally aware neural dictation interface |
Related Parent Applications (1)

| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| US16/660,656 (US11868709B1, continuation) | Digitally aware neural dictation interface | 2019-10-11 | 2019-10-22 |
Publications (1)

| Publication Number | Publication Date |
|---|---|
| US20230359812A1 (en) | 2023-11-09 |
Family
ID=81187297
Family Applications (3)

| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| US16/600,242 (US11308265B1, active) | Digitally aware neural dictation interface | 2019-10-11 | 2019-10-11 |
| US16/660,656 (US11868709B1, active) | Digitally aware neural dictation interface | 2019-10-11 | 2019-10-22 |
| US18/223,429 (US20230359812A1, pending) | Digitally aware neural dictation interface | 2019-10-11 | 2023-07-18 |

Family Applications Before (2)

| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| US16/600,242 (US11308265B1, active) | Digitally aware neural dictation interface | 2019-10-11 | 2019-10-11 |
| US16/660,656 (US11868709B1, active) | Digitally aware neural dictation interface | 2019-10-11 | 2019-10-22 |
Country Status (1)

| Country | Link |
|---|---|
| US (3) | US11308265B1 (en) |
Families Citing this family (1)

| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US11682399B2 * | 2020-07-14 | 2023-06-20 | Dell Products L.P. | Dynamic redfish query URI binding from context oriented interaction |
Also Published As

| Publication number | Publication date |
|---|---|
| US11308265B1 (en) | 2022-04-19 |
| US11868709B1 (en) | 2024-01-09 |
Legal Events

| Code | Title | Free format text |
|---|---|---|
| STPP | Information on status: patent application and granting procedure in general | DOCKETED NEW CASE - READY FOR EXAMINATION |
| STPP | Information on status: patent application and granting procedure in general | NON FINAL ACTION MAILED |
| STPP | Information on status: patent application and granting procedure in general | RESPONSE TO NON-FINAL OFFICE ACTION ENTERED AND FORWARDED TO EXAMINER |
| STPP | Information on status: patent application and granting procedure in general | FINAL REJECTION MAILED |
| STPP | Information on status: patent application and granting procedure in general | RESPONSE AFTER FINAL ACTION FORWARDED TO EXAMINER |
| STPP | Information on status: patent application and granting procedure in general | ADVISORY ACTION MAILED |