US20240037673A1 - Voice enabled content tracker - Google Patents

Voice enabled content tracker

Info

Publication number
US20240037673A1
Authority
US
United States
Prior art keywords
attachment
type
user
tax
content
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
US18/453,617
Inventor
Sangeetha Uthamalingam SANTHARAM
Bridget Diane Kimball
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Intuit Inc
Original Assignee
Intuit Inc
Application filed by Intuit Inc filed Critical Intuit Inc
Priority to US18/453,617
Assigned to INTUIT, INC. Assignors: KIMBALL, BRIDGET DIANE; SANTHARAM, SANGEETHA UTHAMALINGAM
Publication of US20240037673A1

Classifications

    • G06Q 40/123: Tax preparation or submission
    • G06F 3/0484: Interaction techniques based on graphical user interfaces [GUI] for the control of specific functions or operations, e.g. selecting or manipulating an object, an image or a displayed text element, setting a parameter value or selecting a range
    • G06F 3/167: Audio in a user interface, e.g. using voice commands for navigating, audio feedback
    • G06F 40/279: Recognition of textual entities
    • G06Q 40/10: Tax strategies
    • G10L 15/18: Speech classification or search using natural language modelling
    • G10L 15/22: Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G10L 15/1822: Parsing for meaning understanding

Definitions

  • aspects of the present disclosure relate to natural language processing, and more specifically to techniques for automatically detecting, tracking, and processing certain information content (e.g., tax expense information), based on analysis of voice content from a user.
  • tax preparation applications generally rely on the user to manually store and track tax expense information over time (e.g., throughout the year) and input the tax expense information into the tax preparation application when the user is in the process of preparing a tax return.
  • many users find it challenging to keep track of expense information over time. For example, users may not be able to remember when the expense occurred, may not know whether the expense qualifies as a tax expense, and/or may not be able to determine which expenses will lead to a tax benefit.
  • Certain embodiments provide a method.
  • the method generally includes receiving natural language content corresponding to audio input from a user.
  • the method also includes determining whether the natural language content comprises a first type of information, based on evaluating the natural language content with a first machine learning model.
  • the method also includes, in response to determining the natural language content comprises the first type of information: determining a temporal association of the first type of information, based on evaluating the natural language content with a second machine learning model, and transmitting to the user a message comprising an indication of the temporal association of the first type of information.
  • the method generally includes receiving natural language content corresponding to audio input from a user.
  • the method also includes determining whether the natural language content comprises tax expense information, based on evaluating the natural language content with a first machine learning model.
  • the method also includes, in response to determining the natural language content comprises the tax expense information: determining a type of the tax expense information based on the evaluation; determining which tax filing period of a plurality of tax filing periods is associated with the type of the tax expense information, based on evaluating the natural language content with a second machine learning model; and automatically processing the type of the tax expense information on behalf of the user.
  • Automatically processing the type of the tax expense information includes associating the type of the tax expense information with at least one content item, based on the tax filing period associated with the type of the tax expense information, and storing the type of the tax expense information in a storage system in association with the at least one content item.
  • the method also includes presenting the type of the tax expense information and the associated tax filing period on a user interface of a computing device associated with the user.
  • the method further includes providing one or more elements within the user interface that allow the user to at least one of (i) verify the tax filing period associated with the type of tax expense information, (ii) modify the tax filing period associated with the type of tax expense information, or (iii) remove the type of tax expense information from the at least one content item.
  • processing systems configured to perform the aforementioned methods as well as those described herein; non-transitory, computer-readable media comprising instructions that, when executed by one or more processors of a processing system, cause the processing system to perform the aforementioned methods as well as those described herein; a computer program product embodied on a computer readable storage medium comprising code for performing the aforementioned methods as well as those further described herein; and a processing system comprising means for performing the aforementioned methods as well as those further described herein.
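The two-model flow summarized above can be sketched as follows. This is a minimal illustrative stand-in, not the patent's disclosed models: keyword matching substitutes for the first (classification) machine learning model and a filed/not-filed rule substitutes for the second (temporal association) model, and all names and keywords are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical stand-in for the first model: map phrases to expense types.
EXPENSE_KEYWORDS = {
    "charitable donation": ["donate", "donation", "charity"],
    "child care": ["daycare", "child care", "babysitter"],
    "vehicle expense": ["mileage", "vehicle", "gas"],
}

@dataclass
class TrackedExpense:
    expense_type: str
    filing_period: int  # tax year, used here as the filing-period identifier

def classify_expense(text: str) -> Optional[str]:
    """First model stand-in: detect whether the content is tax expense info."""
    lowered = text.lower()
    for expense_type, keywords in EXPENSE_KEYWORDS.items():
        if any(k in lowered for k in keywords):
            return expense_type
    return None

def resolve_filing_period(current_year: int, current_return_filed: bool) -> int:
    """Second model stand-in: roll forward if the current return is done."""
    return current_year + 1 if current_return_filed else current_year

def track(text: str, current_year: int, filed: bool) -> Optional[TrackedExpense]:
    expense_type = classify_expense(text)
    if expense_type is None:
        return None  # not tax expense information; nothing to track
    return TrackedExpense(expense_type, resolve_filing_period(current_year, filed))
```

A message comprising the resulting temporal association (the `filing_period` here) would then be transmitted to the user, per the summary above.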
  • FIG. 1 depicts an example computing environment in accordance with certain embodiments.
  • FIG. 2 further illustrates certain components of the computing environment depicted in FIG. 1 , in accordance with certain embodiments.
  • FIG. 3 illustrates an example flowchart for performing content tracking based on audio input from a user, in accordance with certain embodiments.
  • FIG. 4 illustrates an example computing system that performs content tracking based on audio input from a user, in accordance with certain embodiments.
  • aspects of the present disclosure provide apparatuses, methods, processing systems, and computer readable mediums for automatically detecting and tracking certain information content (e.g., tax expense information), based on analysis of voice content.
  • tax preparation applications generally rely on users (e.g., customers) to identify, store, and track tax expense information over time and input the information into a tax preparation application during the process of filing a tax return. This significant amount of user involvement can reduce user satisfaction with tax preparation applications, which in turn can lead to users discontinuing use of such applications.
  • an online service (including one or more computing systems) employs voice recognition software to process a user's voice/speech.
  • a user can interact with a software application (provided by the online service) using a voice-user interface (VUI).
  • the user can use voice/speech to initiate services or processes in the software application, navigate application content provided by the software application, etc.
  • a user can use voice/speech to interact with an interactive workflow (provided by the tax preparation application) in order to prepare and file a tax return, search application content, navigate application features, and the like.
  • the online service can capture the user's voice/speech as an audio stream (e.g., audio input) and process the audio stream to extract text content from the audio stream.
  • an online service can obtain natural language content corresponding to audio input of a user (e.g., the user's voice/speech).
  • the natural language content may be obtained from a computing device (or system) associated with the user.
  • the computing device may capture the user's audio input via a VUI, generate natural language content corresponding to the user's audio input using one or more natural language processing algorithms, and transmit the natural language content to the online service.
  • the online service may generate (at the online service) the natural language content corresponding to audio input of a user (e.g., the user's voice/speech).
  • the online service may receive audio input of a user from a computing device and generate natural language content corresponding to the user's audio input, using one or more natural language processing algorithms.
  • the computing device that captures the user's audio input and/or generates the natural language content may be a mobile device (e.g., smartphone, tablet, personal digital assistant (PDA), etc.) equipped with a VUI, a wearable device (e.g., smartwatch, smart glasses, etc.) equipped with a VUI, a smart home device (e.g., smart speaker, smart television, game console, etc.) equipped with a VUI, etc.
  • the online service can determine whether the natural language content includes a particular type of information (e.g., tax expense information, including a type of the tax expense information) based on evaluating the natural language content with a machine learning (ML) model.
  • the ML model may be trained to output an indication of a type of tax expense information from multiple different types of tax expense information, based on the natural language content.
  • tax expense information examples include, but are not limited to, charitable donations, child care expenses, rental home expenses (including repairs, renovations, etc.), education costs (e.g., tuition, books, printing, etc.), moving costs (e.g., truck rental, packaging, etc.), home buying costs, business related tax expenses (e.g., equipment costs, etc.), self-employed tax expenses (e.g., work from home costs, vehicle and mileage expenses, etc.), and the like.
  • the online service may determine that the statement should be classified as a charitable donation and may determine that the amount of the charitable donation is “$200.”
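Determining the amount ("$200") from the recognized statement could be done with a simple pattern match over the text; the function below is an illustrative sketch, not the disclosed implementation.

```python
import re
from typing import Optional

def extract_amount(text: str) -> Optional[str]:
    """Pull the first dollar amount (e.g., "$200" or "$1,250.50") from
    natural language content produced by speech-to-text."""
    match = re.search(r"\$\d[\d,]*(?:\.\d{2})?", text)
    return match.group(0) if match else None
```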
  • the online service can determine a temporal association of the type of information, based on evaluating the natural language content with another ML model.
  • this ML model may be trained to output an indication of a tax filing period associated with the type of information (e.g., type of tax expense information), based at least in part on the natural language content and/or metadata associated with the natural language content.
  • the online service may determine the associated tax filing period based on whether the user has completed taxes for the current year and based on when the charitable donation occurred. For instance, if the user has completed taxes for the current tax filing period, the online service may determine that the charitable donation should be associated with the next tax filing period. On the other hand, if the user has not completed taxes for the current tax filing period, the online service may determine the associated tax filing period based on information indicating when the charitable donation occurred.
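The filing-period decision described above can be sketched as a small rule, assuming annual filing periods identified by tax year. This is a simplification: real filing periods and deadlines vary by jurisdiction, and the disclosure describes a trained ML model rather than this hand-written rule.

```python
from datetime import date

def filing_period(expense_date: date, current_return_filed: bool, today: date) -> int:
    """Associate an expense with a tax filing period (identified by year).

    If the user has already completed taxes for the current filing period,
    the expense is rolled forward to the next period; otherwise the period
    is determined by when the expense occurred.
    """
    if current_return_filed:
        return today.year + 1  # next filing period
    return expense_date.year
```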
  • the online service can prompt the user to input the information indicating when the tax expense (e.g., charitable donation) occurred.
  • the online service can generate and send a link to the user allowing the user to upload the information (e.g., receipt for the charitable donation).
  • the online service may generate a query asking the user for the information. The query may be sent to the user via the user's computing device and may be a voice query, a message presented on the computing device's user interface, etc.
  • the online service may determine the information based on the natural language content (e.g., the natural language content may include an indication of when the tax expense occurred).
  • the online service can automatically process the type of information based on the temporal association. Continuing with the above charitable donation example, the online service can automatically apply the charitable donation to a tax return for the associated tax filing period.
  • the online service may also generate and transmit a message to the user (e.g., via the user's computing device) indicating that the type of information has been successfully processed (e.g., the tax expense information has been successfully applied to the user's tax return for the associated tax filing period).
  • the message may also include a link that allows the user to upload at least one content item (e.g., tax receipt and other documentation) associated with the type of information.
  • the online service may generate and transmit a message to the user (e.g., via the user's computing device) indicating that the type of information is not valid (e.g., not a valid tax expense) and that the type of information has not been processed (e.g., the type of information has not been applied to the user's tax return).
  • the online service may present the type of information along with the temporal association for the type of information on a user interface of the user's computing device.
  • the online service may present an indication of the tax expense information associated with that tax filing period.
  • the online service may present one or more elements that allow the user to confirm applying the tax expense information to the tax return, verify the temporal association, modify the temporal association, delete (or remove) the tax expense information without applying it to the tax return, etc.
  • embodiments herein can track and process tax expense information at a point of need (e.g., substantially at or near when the tax expense occurs) over time (e.g., throughout the year), as opposed to relying on the user to save and remember a substantial amount of information over time and input the information during the process of preparing a tax return.
  • embodiments significantly improve user experience with tax preparation applications, which can increase the retention of users of the online service.
  • an online tax-preparation service as an example of computer software and an online software service, etc., that automatically detects, tracks, and processes information content (e.g., tax expense information) for a user based on the user's voice input.
  • One of ordinary skill in the art will recognize that the techniques described herein may be adapted for use by a broad variety of software applications, online or web services, software features, or support services where users can use voice/speech to interact with the software.
  • While the computing system that obtains natural language content (corresponding to audio input from the user) is described as performing certain tasks (e.g., determining a type of tax expense information, determining a temporal association for the type of tax expense information, etc.), such tasks may be performed by one or more additional computing devices that are in communication with that computing system (e.g., via a wired or wireless network), or vice versa.
  • the audio input, control signals, and information for display may be communicated between the computing devices, such as described with respect to FIG. 1 .
  • FIG. 1 illustrates an example computing environment 100 in accordance with certain embodiments of the present disclosure.
  • computing environment 100 includes a client device 110 , one or more computing devices 130 , and a computing service 120 , which are interconnected via a network 140 , such as a local area network, personal area network, cellular network, or wide area network (e.g., the Internet).
  • a user may use the client device 110 to interact with the computing service 120 and/or other computing systems (or devices) (e.g., computing device(s) 130 ). Similarly, a user may use the computing device(s) 130 to interact with the computing service 120 and/or other computing systems (or devices) (e.g., client device 110 ).
  • Client device 110 is representative of a variety of computing devices, including, for example, a personal computer, a laptop, a tablet computer, a notebook, a personal digital assistant (PDA), a mobile device (e.g., smartphone), a smartwatch, smart glasses, a smart television, or any other computing device capable of communicating with the computing service 120 and/or computing device(s) 130 across network 140 .
  • the client device 110 includes a user interface 102 and a browser 104 . The client device 110 may access the browser 104 via the user interface 102 .
  • Computing device(s) 130 is representative of a variety of computing devices capable of communicating with the computing service 120 and/or client device 110 .
  • the computing device 130 is a smart home device, examples of which include, but are not limited to, smart speakers, smart locks, smart doorbells, smart thermostats, smart keychains, etc.
  • one or more of the computing device(s) 130 may include a user interface and/or a browser for interacting with the computing service 120 and/or controlling one or more features of the computing device 130 .
  • the client device 110 and/or the computing device(s) 130 can be used to access web and application services provided by the computing service 120 .
  • the computing service 120 provides a web application 122 .
  • the browser 104 of client device 110 can communicate with the computing service 120 to render web pages from the web application 122 .
  • one or more of the computing device(s) 130 that include browsers can communicate with the computing service 120 to render web pages from the web application 122 .
  • one or more of the computing device(s) 130 that do not include browsers can communicate with the computing service 120 to provide application content from the web application 122 to a user.
  • the application content may be rendered via voice, messages transmitted to the client device 110 , etc.
  • the computing device(s) 130 and client device 110 can use any number of communication methods to communicate with the computing service 120 , including remote procedure calls, application programming interface (API) calls, Simple Object Access Protocol (SOAP)-based web services, remote file access, proprietary client-server architectures, and the like.
  • Functionality attributed to the web application 122 may be applied using code executed by the computing service 120 , code executed by the client device 110 via the browser 104 , code executed by the computing device(s) 130 , or a combination of server-side code and client-side code.
  • a standalone dedicated application that executes on the client device 110 may be used instead of the browser 104 .
  • Computing service 120 can include physical computing system(s) and/or virtual server device(s) (for example, in a data center) that execute online applications, including one or more services that perform functions associated with online applications (e.g., web application 122 ).
  • the computing service 120 can be modeled as a web server(s), application server(s), and a database(s) that provide the service back-end for the web application 122 .
  • the web server(s) may communicate with the application server(s) to respond to requests from the browser 104 on the client device 110 and/or requests from the computing device 130 .
  • the web server(s) and/or application server(s) may retrieve application content from the database(s) to respond to requests from the browser 104 (and/or computing device 130 ) and/or store application content into the database(s).
  • the application server(s) may execute a number of components (also referred to as modules) to provide web-based and other content to the client device 110 and/or computing device(s) 130 .
  • the application server(s) include application content (e.g., graphical user interface (GUI) components) that the computing service 120 can present on the user interface 102 of client device 110 and/or a user interface (not shown) on a computing device 130 .
  • the application content that is presented may be based on a user's (e.g., customer, support agent, etc.) interaction with the web application 122 provided by the computing service 120 .
  • the web application 122 may be used to prepare and file tax return documents and may present application content which guides a user through preparing a tax return as well as provide the user with access to a variety of related features (e.g., customer support).
  • the web application 122 may be a financial management application that allows users to track and manage finances, monitor credit reports, set up bank accounts, build portfolios, trade shares, etc.
  • the application content can include, for example, HyperText Markup Language (HTML) components or code that generates HTML components that can be passed to client device 110 and rendered as a user interface.
  • the application content may additionally include instructions executable by client device 110 to display on the user interface 102 using language-specific or operating system-specific application content (e.g., instructions for generating/displaying JavaScript-based components or similar components on other operating system platforms, Abstract Window Toolkit or Swing API components on the Java platform, and so on).
  • instructions capable of rendering application content on client device 110 may include computer executable code generated from compiling and/or interpreting C (or variants thereof), Java, PHP, Ruby, HTML, JavaScript, Python, AJAX, VBScript, and other programming or scripting languages used to compose and present application content.
  • the computing service 120 allows users to interact with the computing service 120 (including web application 122 ) using voice/speech (e.g., audio input).
  • the computing device(s) 130 includes a VUI 108 and the client device 110 includes a VUI 106 .
  • the VUIs 106 , 108 are generally configured to capture a user's voice/speech (e.g., audio input), such as via a microphone coupled to or integrated with the client device 110 and/or computing device(s) 130 .
  • the VUIs 106 , 108 may transmit the audio input, via the network 140 , to the computing service 120 .
  • the computing service 120 may determine natural language content corresponding to the audio input. For example, the computing service 120 may identify text (e.g., words, query, etc.) of the audio input of the voice/speech of the user (e.g., by using natural language processing techniques to convert speech to text).
  • the VUIs 106 , 108 may determine natural language content corresponding to the audio input and transmit the natural language content to the computing service 120 .
  • the VUIs 106 , 108 may identify text (e.g., words, query, etc.) of the audio input of the voice/speech of the user (e.g., by using natural language processing techniques to convert speech to text).
  • the computing service 120 is configured to automatically detect, track, and process certain information content (e.g., tax expense information), based on the user's voice input.
  • the user's voice input may include a query to process a particular type of tax expense of the user.
  • the user may request the computing service 120 to “Add a charitable donation of $100 to Charity X on Date Y to the user's tax return.”
  • the user may request the computing service 120 to “Process a child care payment to Daycare A on Date Z.”
  • the user may state to the computing service 120 that “My vehicle expenses for work today totaled $50.” Note, however, that these are merely reference examples of a user using voice/speech to interact with the computing service 120 .
  • the computing service 120 may process the natural language content to automatically detect, track, and process certain information content (e.g., tax expense information) that may be included within the natural language content.
  • the computing service 120 includes a content tracker module 124 , which is configured to implement one or more techniques described herein to detect, track, and process information content, based on the user's voice input.
  • the content tracker module 124 can include software components, hardware components, or combinations thereof. Note, the content tracker module 124 is described in more detail below.
  • FIG. 1 illustrates a reference example of a computing environment in which the techniques described herein may be implemented; the techniques may also be implemented in other computing environments consistent with the functionality described herein.
  • FIG. 2 further illustrates certain components (e.g., content tracker module 124 ) of the computing environment depicted in FIG. 1 , in accordance with certain embodiments.
  • the content tracker module 124 includes a detection tool 210 , an analysis tool 220 , a content processing tool 230 , a notification tool 240 , and one or more storage systems 250 , each of which can include hardware components, software components, or combinations thereof.
  • the detection tool 210 is generally configured to determine whether natural language content 202 (corresponding to audio input of a user) includes a particular type of information, such as tax expense information.
  • the detection tool 210 receives natural language content 202 from a computing device (e.g., client device 110 and/or computing device(s) 130 ), which captures audio input from the user.
  • the detection tool 210 evaluates (or analyzes) the natural language content 202 using an ML model 204 to determine a type of tax expense information 206 (or tax expense information type) from multiple types of tax expense information (e.g., charitable donations, child care expenses, rental home expenses, business related expenses, self-employed expenses, etc.).
  • the ML model 204 may be trained on multiple different natural language variations for each type of tax expense information.
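A toy stand-in for such a model scores each expense type by token overlap between the input and that type's example phrases; the example phrases and the scoring rule below are illustrative placeholders, not training data or the trained model from the disclosure.

```python
# Illustrative natural language variations per expense type (hypothetical).
TRAINING_VARIATIONS = {
    "charitable donation": ["gave money to charity", "made a donation"],
    "child care": ["paid the daycare", "child care payment"],
    "rental home expense": ["repaired the rental house", "rental renovation cost"],
}

def best_expense_type(text: str) -> str:
    """Return the expense type whose example phrases best overlap the input."""
    tokens = set(text.lower().split())
    def score(examples):
        # Best token overlap across this type's example phrases.
        return max(len(tokens & set(e.split())) for e in examples)
    return max(TRAINING_VARIATIONS, key=lambda t: score(TRAINING_VARIATIONS[t]))
```

A trained classifier would generalize far beyond exact token overlap; this sketch only shows the shape of the mapping from phrase variations to a type label.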
  • the analysis tool 220 is generally configured to determine a tax filing period 212 for the tax expense information type 206 . As shown in FIG. 2 , the analysis tool 220 evaluates the natural language content 202 with an ML model 208 to determine the tax filing period 212 associated with the tax expense information type 206 .
  • the tax filing period 212 may be one of multiple tax filing periods.
  • the ML model 208 is configured to evaluate the natural language content 202 to determine an occurrence of the type of tax expense information 206 .
  • the natural language content 202 may include an indication of when the tax expense occurred (e.g., the natural language content 202 may indicate that the tax expense occurred on a particular date).
  • the ML model 208 is configured to evaluate the natural language content 202 and user metadata 222 to determine an occurrence of the type of tax expense information 206 .
  • the user metadata 222 may include a history of completed tax returns for different tax filing periods for the user.
  • the user metadata 222 may indicate whether the user has completed a tax return for the current tax filing period.
  • the ML model 208 may determine the associated tax filing period based on whether the user has completed a tax return for the current tax filing period.
  • the ML model 208 may determine that the tax expense information type 206 is associated with a next (or subsequent) tax filing period.
  • the ML model 208 may determine whether the tax expense information type 206 is associated with the current tax filing period or next tax filing period, based on information indicating when the tax expense occurred.
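As an illustration of the filing-period logic described in the preceding bullets, the sketch below folds the two signals (when the expense occurred, and whether the user has already completed a return for that period) into a simple rule. In practice this determination is made by ML model 208; the function name, the use of a tax year as the filing period, and the `completed_filing_years` input are assumptions for the sketch.

```python
from datetime import date

def determine_tax_filing_period(expense_date: date,
                                completed_filing_years: set[int]) -> int:
    """Hypothetical resolution of a tax filing period (modeled as a tax
    year) for an expense. If the user has already completed a return for
    the expense's year, the expense is associated with the next (or
    subsequent) filing period instead, mirroring the behavior described
    for ML model 208."""
    year = expense_date.year
    if year in completed_filing_years:
        return year + 1
    return year
```

When the occurrence date is missing from the natural language content, the system would instead prompt the user for it (via the notification tool) before this step can run.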
  • the analysis tool 220 may interact with the notification tool 240 to prompt the user for information indicating when the tax expense information type 206 occurred.
  • the notification tool 240 may generate a message 214 including at least one of a query for the information indicating when the tax expense information type 206 occurred or a link allowing the user to upload information indicating when the tax expense information type 206 occurred.
  • the notification tool 240 may transmit the message 214 to a computing device associated with the user (e.g., client device 110 ).
  • the content processing tool 230 is generally configured to automatically process the tax expense information type 206 based on the associated tax filing period 212 .
  • the content processing tool 230 may obtain (or determine) a tax return that is associated with the tax filing period 212 and may store the tax expense information type 206 (along with other associated information from the natural language content 202 , such as the amount of the tax expense information type 206 , an occurrence of the tax expense information type 206 , etc.) in a storage system 250 in association with the respective tax return corresponding to the tax filing period 212 .
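A minimal sketch of how storage system 250 might keep a tax expense information type (with its associated details) in association with the tax return for its filing period. Filing periods are modeled as tax years and entries as plain dictionaries; both are assumptions, as the disclosure does not fix a storage schema.

```python
from collections import defaultdict

class ExpenseStore:
    """Illustrative stand-in for storage system 250: expense entries are
    keyed by tax filing period, so each entry is stored in association
    with the tax return corresponding to that period."""

    def __init__(self) -> None:
        self._returns: dict[int, list[dict]] = defaultdict(list)

    def add_expense(self, filing_period: int, expense_type: str,
                    amount: float, occurred_on: str) -> None:
        """Store an expense under the tax return for filing_period."""
        self._returns[filing_period].append(
            {"type": expense_type, "amount": amount, "date": occurred_on})

    def expenses_for(self, filing_period: int) -> list[dict]:
        """Return all expenses associated with a given filing period."""
        return self._returns[filing_period]
```

Retrieval by filing period is what later allows the content processing tool to present the stored expense back to the user alongside the correct tax return.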
  • the notification tool 240 is generally configured to generate and transmit an indication of whether the tax expense information type 206 has been processed (e.g., added to a user's relevant tax return). As shown, the notification tool 240 generates one or more messages 214 . In certain embodiments, the notification tool 240 may generate a message 214 indicating that the tax expense information type 206 has been successfully processed. In these embodiments, the message 214 may include a link allowing the user to upload at least one content item (e.g., receipt, documentation, etc.) associated with the tax expense information type 206 . The notification tool 240 may retrieve at least one content item uploaded by the user and may store the at least one content item into a storage system 250 .
  • the notification tool 240 may generate a message 214 indicating that the natural language content 202 does not include valid tax expense information.
  • the content processing tool 230 is also configured to present an indication of the tax expense information type 206 (along with the associated tax filing period 212 ) to the user via the user's computing device (e.g., client device 110 ). For example, the content processing tool 230 may retrieve the tax expense information type 206 from the storage system 250 where it is stored in association with a corresponding tax return based on the associated tax filing period 212 .
  • the content processing tool 230 may provide one or more elements allowing the user to confirm adding the tax expense information type 206 to the user's tax return, remove the tax expense information type 206 from the user's tax return, verify the tax filing period 212 , modify the tax filing period 212 , etc.
  • FIG. 2 illustrates a reference example of components of a content tracker module 124 ; the content tracker module 124 may have a different set of components consistent with the functionality described herein.
  • While FIG. 2 describes the content tracker module 124 receiving natural language content 202 corresponding to a user's audio input, in other cases the content tracker module 124 may be configured to receive the user's audio input and generate the natural language content 202 corresponding to the received audio input.
  • FIG. 3 illustrates an exemplary flowchart 300 for automatically detecting, tracking, and processing certain information content based on audio input from a user in accordance with certain aspects of the present disclosure.
  • Flowchart 300 can be implemented by the computing service 120 of FIG. 1 .
  • FIG. 3 illustrates the functionality and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present disclosure.
  • each block in the flowchart may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s).
  • the functions noted in the block may occur simultaneously, substantially concurrently, or out of the order noted in the figure.
  • Each block of the flowchart and combinations of blocks in the flowchart can be implemented by special-purpose hardware-based systems that perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
  • the computing service 120 obtains natural language content (e.g., natural language content 202 ) corresponding to audio input from a user.
  • the natural language content may include a request to process a certain type of information (e.g., tax expense information type 206 ).
  • the computing service 120 may receive the natural language content from a computing device associated with the user (e.g., client device 110 , computing device(s) 130 ).
  • the natural language content may be received via at least one application programming interface (API) associated with the computing device.
  • the computing service 120 may generate the natural language content (at the computing service 120 ).
  • the computing service 120 may receive audio input from the user and use one or more natural language processing techniques to generate the natural language content.
  • the computing service 120 determines whether the natural language content includes a first type of information (e.g., tax expense information type 206 ).
  • the computing service 120 may evaluate the natural language content with a first ML model (e.g., ML model 204 ) to determine whether the natural language content includes the first type of information. If the natural language content does not include the first type of information (e.g., the natural language content may include a second type of information such as non-tax expense information), then, at block 306 , the computing service 120 generates and transmits a message indicating that the natural language content does not include the first type of information.
  • the computing service 120 determines a temporal association (e.g., tax filing period 212 ) of the first type of information.
  • the first type of information includes a first type of tax expense information (e.g., tax expense information type 206 ) of multiple types of tax expense information.
  • determining the temporal association includes determining which tax filing period (e.g., tax filing period 212 ) of multiple tax filing periods is associated with the first type of tax expense information.
  • the computing service 120 processes the first type of information, based on the temporal association.
  • the computing service 120 processes the first type of information by associating the first type of information with at least one content item (e.g., tax return), based on the temporal association, and storing the first type of information in a storage system (e.g., storage system 250 ) in association with the at least one content item.
  • the computing service 120 generates and transmits a message (e.g., message 214 ) indicating that the first type of information has been processed.
  • the message includes a link that allows the user to upload at least one content item (e.g., receipt, documentation of the tax expense, etc.) associated with the first type of information to a storage system.
  • the flowchart 300 may further include retrieving the at least one content item from the storage system and storing the at least one content item in association with the first type of information in a storage system (e.g., storage system 250 ), based on the temporal association of the first type of information.
  • the flowchart 300 may further include processing the first type of information, based on the temporal association of the first type of information and the at least one content item.
  • processing the first type of information may include verifying the temporal association (e.g., tax filing period 212 ) of the first type of information based on the at least one content item.
  • the computing service 120 presents the first type of information and the temporal association on a computing device associated with the user.
  • the first type of information along with the temporal association may be presented on a user interface (e.g., user interface 102 ) of the computing device (e.g., client device 110 ).
  • the computing service 120 may also provide one or more elements within the user interface that allow the user to verify the temporal association of the first type of information, modify the temporal association of the first type of information, confirm adding the first type of information to the user's tax return, or remove the first type of information from the user's tax return.
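The overall flow of flowchart 300 (detect the information type, resolve its temporal association, process it, and notify the user) can be sketched as follows. The four injected callables stand in for the detection, period-resolution, processing, and notification steps; their names and signatures are illustrative, not part of the disclosure.

```python
def handle_audio_content(natural_language_content, detect, resolve_period,
                         process, notify):
    """Illustrative driver for flowchart 300. Returns the detected
    expense type and its filing period, or None when the content does
    not include the first type of information."""
    expense_type = detect(natural_language_content)
    if expense_type is None:
        # Corresponds to the branch where the content lacks the first
        # type of information: notify the user and stop.
        notify("Content does not include tax expense information.")
        return None
    period = resolve_period(natural_language_content)
    process(expense_type, period)
    notify(f"Recorded {expense_type} for the {period} filing period.")
    return expense_type, period
```

Driving the sketch with simple lambdas shows both branches:

```python
messages = []
result = handle_audio_content(
    "I donated $200 to charity",
    detect=lambda t: "charitable_donation" if "donated" in t else None,
    resolve_period=lambda t: 2023,
    process=lambda e, p: None,
    notify=messages.append)
```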
  • FIG. 4 illustrates an exemplary computing system 400 that performs content tracking based on audio input from a user in accordance with aspects of the present disclosure.
  • the computing system 400 is a representative example of computing service 120 depicted in FIG. 1 .
  • the computing system 400 includes a central processing unit (CPU) 402 , one or more I/O device interfaces 404 , which may allow for the connection of various I/O devices 414 (e.g., keyboards, displays, mouse devices, pen input, etc.) to computing system 400 , network interface 406 (which may include a transmitter for transmitting data and a receiver for receiving data), memory 408 , storage 410 , and interconnect 412 .
  • Storage 410 can be located either inside or outside the computing system 400 . When the storage is located outside computing system 400 , the storage can be connected via the network 140 .
  • CPU 402 may retrieve and execute programming instructions stored in the memory 408 . Similarly, CPU 402 may retrieve and store application data residing in the memory 408 . Interconnect 412 transmits programming instructions and application data among the CPU 402 , I/O device interface 404 , network interface 406 , memory 408 , and storage 410 .
  • CPU 402 can represent a single CPU, multiple CPUs, a single CPU having multiple processing cores, and the like.
  • memory 408 may represent volatile memory, such as random access memory.
  • storage 410 may represent non-volatile memory, such as memory on a disk drive.
  • storage 410 may be a combination of fixed or removable storage devices, such as fixed disc drives, removable memory cards or optical storage, network attached storage (NAS), or a storage area-network (SAN).
  • memory 408 stores programming instructions for executing web application 122 and content tracker module 124 , described in more detail above.
  • Storage 410 stores natural language content 202 , tax expense information type 206 , and tax filing period 212 , described in more detail above.
  • Clause 1 A method comprising: receiving natural language content corresponding to audio input from a user; determining whether the natural language content comprises a first type of information, based on evaluating the natural language content with a first machine learning model; and in response to determining the natural language content comprises the first type of information: determining a temporal association of the first type of information, based on evaluating the natural language content with a second machine learning model; and transmitting to the user a message comprising an indication of the temporal association of the first type of information.
  • Clause 2 The method of Clause 1, further comprising processing the first type of information, based on the temporal association of the first type of information, wherein processing the first type of information comprises: associating the first type of information with at least one content item, based on the temporal association; and storing the first type of information in a storage system in association with the at least one content item.
  • Clause 3 The method of any one of Clauses 1 through 2, wherein the message further comprises an indication that the first type of information has been processed.
  • Clause 4 The method of any one of Clauses 1 through 3, wherein: the first type of information comprises tax expense information; and determining the temporal association of the first type of information comprises determining which tax filing period of a plurality of tax filing periods is associated with the tax expense information.
  • Clause 5 The method of any one of Clauses 1 through 4, wherein: the first type of information comprises a first type of tax expense information of a plurality of types of tax expense information; and determining the temporal association of the first type of information comprises determining which tax filing period of a plurality of tax filing periods is associated with the first type of tax expense information.
  • Clause 6 The method of Clause 5, wherein the at least one content item comprises a tax return corresponding to the tax filing period.
  • Clause 7 The method of any one of Clauses 1 through 6, wherein: the natural language content comprises a request to process the first type of information; and the method further comprises, in response to determining that the natural language content comprises a second type of information, transmitting to the user a message comprising an indication that the second type of information is not valid and that the second type of information has not been applied to at least one content item associated with the user.
  • Clause 8 The method of any one of Clauses 1 through 7, further comprising presenting the first type of information and the temporal association of the first type of information on a user interface of a computing device associated with the user.
  • Clause 9 The method of Clause 8, further comprising providing one or more elements within the user interface that allow the user to at least one of (i) verify the temporal association of the first type of information, (ii) modify the temporal association of the first type of information, or (iii) remove the first type of information from at least one content item associated with the user.
  • Clause 10 The method of any one of Clauses 1 through 9, wherein: the message further comprises a link that allows the user to upload at least one content item associated with the first type of information to a first storage system, and the method further comprises: receiving the at least one content item from the first storage system; and storing the at least one content item in association with the first type of information in a second storage system, based on the temporal association of the first type of information.
  • Clause 11 The method of Clause 10, further comprising processing the first type of information, based on the temporal association of the first type of information and the at least one content item, wherein processing the first type of information comprises verifying the temporal association of the first type of information based on the at least one content item.
  • Clause 12 The method of any one of Clauses 1 through 11, wherein the natural language content is received via an application programming interface (API) associated with a computing system.
  • Clause 13 The method of Clause 12, wherein the computing system comprises a smart home device or a mobile device.
  • Clause 14 A method for automatically tracking tax expense information based on audio input from a user comprising: receiving natural language content corresponding to audio input from a user; determining whether the natural language content comprises tax expense information, based on evaluating the natural language content with a first machine learning model; in response to determining the natural language content comprises the tax expense information: determining a type of the tax expense information based on the evaluation; determining which tax filing period of a plurality of tax filing periods is associated with the type of the tax expense information, based on evaluating the natural language content with a second machine learning model; and automatically processing the type of the tax expense information on behalf of the user, comprising: associating the type of the tax expense information with at least one content item, based on the tax filing period associated with the type of the tax expense information; and storing the type of the tax expense information in a storage system in association with the at least one content item; presenting the type of the tax expense information and the associated tax filing period on a user interface of a computing device associated with the user; and providing one or more elements within the user interface that allow the user to at least one of (i) verify the tax filing period associated with the type of tax expense information, (ii) modify the tax filing period associated with the type of tax expense information, or (iii) remove the type of tax expense information from the at least one content item.
  • Clause 15 A system comprising: a memory having executable instructions stored thereon; and a processor configured to execute the executable instructions to cause the system to: receive natural language content corresponding to audio input from a user; determine whether the natural language content comprises a first type of information, based on evaluating the natural language content with a first machine learning model; and in response to determining the natural language content comprises the first type of information: determine a temporal association of the first type of information, based on evaluating the natural language content with a second machine learning model; and transmit to the user a message comprising an indication of the temporal association of the first type of information.
  • Clause 16 The system of Clause 15, wherein the processor is further configured to execute the executable instructions to cause the system to process the first type of information, based on the temporal association of the first type of information, wherein processing the first type of information comprises: associating the first type of information with at least one content item, based on the temporal association; and storing the first type of information in a storage system in association with the at least one content item.
  • Clause 17 The system of any one of Clauses 15 through 16, wherein the message further comprises an indication that the first type of information has been processed.
  • Clause 18 The system of any one of Clauses 15 through 17, wherein: the first type of information comprises tax expense information; and determining the temporal association of the first type of information comprises determining which tax filing period of a plurality of tax filing periods is associated with the tax expense information.
  • Clause 19 The system of any one of Clauses 15 through 18, wherein the at least one content item comprises a tax return corresponding to the tax filing period.
  • Clause 20 The system of any one of Clauses 15 through 19, wherein the processor is further configured to execute the executable instructions to cause the system to: present the first type of information and the temporal association of the first type of information on a user interface of a computing device associated with the user; and provide one or more elements within the user interface that allow the user to at least one of (i) verify the temporal association of the first type of information, (ii) modify the temporal association of the first type of information, or (iii) remove the first type of information from at least one content item associated with the user.
  • Clause 21 An apparatus comprising: a memory having executable instructions stored thereon; and a processor configured to execute the executable instructions to cause the apparatus to perform the methods of any one of Clauses 1 through 14.
  • Clause 22 An apparatus comprising: means for performing the methods of any one of Clauses 1 through 14.
  • Clause 23 A computer-readable medium having instructions stored thereon which, when executed by a processor, perform the methods of any one of Clauses 1 through 14.
  • an apparatus may be implemented or a method may be practiced using any number of the aspects set forth herein.
  • the scope of the disclosure is intended to cover such an apparatus or method that is practiced using other structure, functionality, or structure and functionality in addition to, or other than, the various aspects of the disclosure set forth herein. It should be understood that any aspect of the disclosure disclosed herein may be embodied by one or more elements of a claim.
  • As used herein, "exemplary" means "serving as an example, instance, or illustration." Any aspect described herein as "exemplary" is not necessarily to be construed as preferred or advantageous over other aspects.
  • a phrase referring to “at least one of” a list of items refers to any combination of those items, including single members.
  • “at least one of: a, b, or c” is intended to cover a, b, c, a-b, a-c, b-c, and a-b-c, as well as any combination with multiples of the same element (e.g., a-a, a-a-a, a-a-b, a-a-c, a-b-b, a-c-c, b-b, b-b-b, b-b-c, c-c, and c-c-c or any other ordering of a, b, and c).
  • As used herein, "determining" encompasses a wide variety of actions. For example, "determining" may include calculating, computing, processing, deriving, investigating, looking up (e.g., looking up in a table, a database or another data structure), ascertaining and the like. Also, "determining" may include receiving (e.g., receiving information), accessing (e.g., accessing data in a memory) and the like. Also, "determining" may include resolving, selecting, choosing, establishing and the like.
  • the methods disclosed herein comprise one or more steps or actions for achieving the methods.
  • the method steps and/or actions may be interchanged with one another without departing from the scope of the claims.
  • the order and/or use of specific steps and/or actions may be modified without departing from the scope of the claims.
  • the various operations of methods described above may be performed by any suitable means capable of performing the corresponding functions.
  • the means may include various hardware and/or software component(s) and/or module(s), including, but not limited to a circuit, an application specific integrated circuit (ASIC), or processor.
  • those operations may have corresponding counterpart means-plus-function components with similar numbering.

Abstract

Certain aspects of the present disclosure provide techniques and systems for automatically detecting, tracking, and processing certain information content, based on voice input from a user. A voice enabled content tracking system receives natural language content corresponding to audio input from a user. A determination is made as to whether the natural language content includes a first type of information, based on evaluating the natural language content with a first machine learning model. In response to determining the natural language content comprises the first type of information, a temporal association of the first type of information is determined, based on evaluating the natural language content with a second machine learning model, and a message including an indication of the temporal association of the first type of information is transmitted to the user.

Description

    CROSS-REFERENCE TO RELATED APPLICATIONS
  • This Application is a continuation of and hereby claims priority under 35 U.S.C. § 120 to co-pending U.S. patent application Ser. No. 17/876,716, filed Jul. 29, 2022, the contents of which are incorporated herein by reference in their entirety.
  • INTRODUCTION
  • Aspects of the present disclosure relate to natural language processing, and more specifically to techniques for automatically detecting, tracking, and processing certain information content (e.g., tax expense information), based on analysis of voice content from a user.
  • Complex software applications and services are frequently made available to users over computer networks. For example, software applications used to prepare and file income tax returns are frequently offered as an online service. In addition to generating tax return documents, these online services typically guide a user through a collection of complex interactive workflows in order to prepare a complete, valid income tax return. Further, in an attempt to increase user satisfaction with the application, these services generally attempt to maximize the user's tax refund (e.g., by maximizing tax deductions, tax credits, etc.).
  • One challenge with current tax preparation applications is that it is difficult to consolidate tax expense information in the tax preparation application at a point of need. For example, many tax preparation applications generally rely on the user to manually store and track tax expense information over time (e.g., throughout the year) and input the tax expense information into the tax preparation application when the user is in the process of preparing a tax return. However, many users find it challenging to keep track of expense information over time. For example, users may not be able to remember when the expense occurred, may not know whether the expense qualifies as a tax expense, and/or may not be able to determine which expenses will lead to a tax benefit.
  • Accordingly, there exists a need for an improved approach for tracking information content, such as tax expense information.
  • SUMMARY
  • Certain embodiments provide a method. The method generally includes receiving natural language content corresponding to audio input from a user. The method also includes determining whether the natural language content comprises a first type of information, based on evaluating the natural language content with a first machine learning model. The method also includes, in response to determining the natural language content comprises the first type of information: determining a temporal association of the first type of information, based on evaluating the natural language content with a second machine learning model, and transmitting to the user a message comprising an indication of the temporal association of the first type of information.
  • Other embodiments provide a method for automatically tracking tax expense information based on audio input from a user. The method generally includes receiving natural language content corresponding to audio input from a user. The method also includes determining whether the natural language content comprises tax expense information, based on evaluating the natural language content with a first machine learning model. The method also includes, in response to determining the natural language content comprises the tax expense information: determining a type of the tax expense information based on the evaluation; determining which tax filing period of a plurality of tax filing periods is associated with the type of the tax expense information, based on evaluating the natural language content with a second machine learning model; and automatically processing the type of the tax expense information on behalf of the user. Automatically processing the type of the tax expense information includes associating the type of the tax expense information with at least one content item, based on the tax filing period associated with the type of the tax expense information, and storing the type of the tax expense information in a storage system in association with the at least one content item. The method also includes presenting the type of the tax expense information and the associated tax filing period on a user interface of a computing device associated with the user. The method further includes providing one or more elements within the user interface that allow the user to at least one of (i) verify the tax filing period associated with the type of tax expense information, (ii) modify the tax filing period associated with the type of tax expense information, or (iii) remove the type of tax expense information from the at least one content item.
  • Other embodiments provide processing systems configured to perform the aforementioned methods as well as those described herein; non-transitory, computer-readable media comprising instructions, when executed by one or more processors of a processing system, cause the processing system to perform the aforementioned methods as well as those described herein; a computer program product embodied on a computer readable storage medium comprising code for performing the aforementioned methods as well as those further described herein; and a processing system comprising means for performing the aforementioned methods as well as those further described herein.
  • The following description and the related drawings set forth in detail certain illustrative features of one or more embodiments.
  • DESCRIPTION OF THE DRAWINGS
  • The appended figures depict certain aspects of the one or more embodiments and are therefore not to be considered limiting of the scope of this disclosure.
  • FIG. 1 depicts an example computing environment in accordance with certain embodiments.
  • FIG. 2 further illustrates certain components of the computing environment depicted in FIG. 1 , in accordance with certain embodiments.
  • FIG. 3 illustrates an example flowchart for perform content tracking based on audio input from a user, in accordance with certain embodiments.
  • FIG. 4 illustrates an example computing system that performs content tracking based on audio input from a user, in accordance with certain embodiments.
  • To facilitate understanding, identical reference numerals have been used, where possible, to designate identical elements that are common to the drawings. It is contemplated that elements and features of one embodiment may be beneficially incorporated in other embodiments without further recitation.
  • DETAILED DESCRIPTION
  • Aspects of the present disclosure provide apparatuses, methods, processing systems, and computer readable mediums for automatically detecting and tracking certain information content (e.g., tax expense information), based on analysis of voice content.
  • As noted above, it can be significantly challenging for users to track expenses for tax filing purposes. Many tax preparation applications generally rely on users (e.g., customers) to identify, store, and track tax expense information over time and input the information into a tax preparation application during the process of filing a tax return. This significant amount of user involvement can reduce user satisfaction with tax preparation applications, which in turn can lead to users discontinuing use of such applications.
  • The present disclosure provides systems and techniques for automatically detecting, tracking, and processing certain information content, such as tax expense information, based on voice input from a user. In certain embodiments described herein, an online service (including one or more computing systems) employs voice recognition software to process a user's voice/speech. For example, a user can interact with a software application (provided by the online service) using a voice-user interface (VUI). The user can use voice/speech to initiate services or processes in the software application, navigate application content provided by the software application, etc. In the case of an online service that provides a tax preparation application, a user can use voice/speech to interact with an interactive workflow (provided by the tax preparation application) in order to prepare and file a tax return, search application content, navigate application features, and the like. To the extent the user does use their voice to interact with the application, the online service can capture the user's voice/speech as an audio stream (e.g., audio input) and process the audio stream to extract text content from the audio stream.
  • In certain embodiments, an online service can obtain natural language content corresponding to audio input of a user (e.g., the user's voice/speech). For example, the natural language content may be obtained from a computing device (or system) associated with the user. For instance, the computing device may capture the user's audio input via a VUI, generate natural language content corresponding to the user's audio input using one or more natural language processing algorithms, and transmit the natural language content to the online service.
  • In certain embodiments, the online service may generate (at the online service) the natural language content corresponding to audio input of a user (e.g., the user's voice/speech). For example, the online service may receive audio input of a user from a computing device and generate natural language content corresponding to the user's audio input, using one or more natural language processing algorithms. The computing device that captures the user's audio input and/or generates the natural language content may be a mobile device (e.g., smartphone, tablet, personal digital assistant (PDA), etc.) equipped with a VUI, a wearable device (e.g., smartwatch, smart glasses, etc.) equipped with a VUI, a smart home device (e.g., smart speaker, smart television, game console, etc.) equipped with a VUI, etc.
  • Upon obtaining the natural language content, the online service can determine whether the natural language content includes a particular type of information (e.g., tax expense information, including a type of the tax expense information) based on evaluating the natural language content with a machine learning (ML) model. For example, the ML model may be trained to output an indication of a type of tax expense information from multiple different types of tax expense information, based on the natural language content. Examples of different types of tax expense information include, but are not limited to, charitable donations, child care expenses, rental home expenses (including repairs, renovations, etc.), education costs (e.g., tuition, books, printing, etc.), moving costs (e.g., truck rental, packaging, etc.), home buying costs, business related tax expenses (e.g., equipment costs, etc.), self-employed tax expenses (e.g., work from home costs, vehicle and mileage expenses, etc.), and the like. In one reference example, assuming a user makes the statement “I made a $200 donation to Charity A,” the online service may determine that the statement should be classified as a charitable donation and may determine that the amount of the charitable donation is “$200.”
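  • As a concrete illustration of this classification step, the Python sketch below maps an utterance to an expense type and extracts a dollar amount. The keyword heuristic, category names, and amount-extraction pattern are assumptions for illustration only; in the embodiments described herein, a trained ML model performs this classification rather than a fixed keyword list.

```python
import re

# Illustrative categories; stand-ins for the disclosure's "multiple types of
# tax expense information". A trained ML model would replace this heuristic.
EXPENSE_KEYWORDS = {
    "charitable_donation": ["donation", "donated", "charity"],
    "child_care": ["daycare", "child care", "babysitter"],
    "education": ["tuition", "textbook", "course fee"],
}

def classify_expense(utterance: str):
    """Return (expense_type, amount) or (None, None) if no type matches."""
    text = utterance.lower()
    for expense_type, keywords in EXPENSE_KEYWORDS.items():
        if any(keyword in text for keyword in keywords):
            # Extract a dollar amount such as "$200" if one is present.
            match = re.search(r"\$(\d+(?:\.\d{2})?)", utterance)
            amount = float(match.group(1)) if match else None
            return expense_type, amount
    return None, None

print(classify_expense("I made a $200 donation to Charity A"))
# → ('charitable_donation', 200.0)
```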
  • Upon determining that the natural language content includes a particular type of information, the online service can determine a temporal association of the type of information, based on evaluating the natural language content with another ML model. In certain embodiments, this ML model may be trained to output an indication of a tax filing period associated with the type of information (e.g., type of tax expense information), based at least in part on the natural language content and/or metadata associated with the natural language content.
  • For example, assuming the user's statement is “I made a $200 donation to Charity A,” the online service may determine the associated tax filing period based on whether the user has completed taxes for the current year and based on when the charitable donation occurred. For instance, if the user has completed taxes for the current tax filing period, the online service may determine that the charitable donation should be associated with the next tax filing period. On the other hand, if the user has not completed taxes for the current tax filing period, the online service may determine the associated tax filing period based on information indicating when the charitable donation occurred.
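  • The decision rule described above can be sketched as a small function. This is an illustrative simplification that assumes calendar-year filing periods; the disclosed embodiments evaluate this association with an ML model rather than hard-coded logic.

```python
from datetime import date

def determine_filing_period(expense_date: date,
                            current_period: int,
                            current_period_completed: bool) -> int:
    """Pick the tax filing period (year) a detected expense applies to.

    Simplified rule from the disclosure: if the user has already completed
    taxes for the current period, the expense rolls to the next period;
    otherwise it is assigned based on when the expense occurred.
    """
    if current_period_completed:
        return current_period + 1
    return expense_date.year

# A $200 donation made in March 2023, with the 2022 return already filed:
print(determine_filing_period(date(2023, 3, 15), 2022, True))   # → 2023
# The same donation, with the 2023 return not yet filed:
print(determine_filing_period(date(2023, 3, 15), 2023, False))  # → 2023
```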
  • In some embodiments, the online service can prompt the user to input the information indicating when the tax expense (e.g., charitable donation) occurred. For example, the online service can generate and send a link to the user allowing the user to upload the information (e.g., receipt for the charitable donation). In another example, the online service may generate a query asking the user for the information. The query may be sent to the user via the user's computing device and may be a voice query, a message presented on the computing device's user interface, etc. In yet another example, the online service may determine the information based on the natural language content (e.g., the natural language content may include an indication of when the tax expense occurred).
  • In certain embodiments, the online service can automatically process the type of information based on the temporal association. Continuing with the above charitable donation example, the online service can automatically apply the charitable donation to a tax return for the associated tax filing period. The online service may also generate and transmit a message to the user (e.g., via the user's computing device) indicating that the type of information has been successfully processed (e.g., the tax expense information has been successfully applied to the user's tax return for the associated tax filing period). In certain embodiments, the message may also include a link that allows the user to upload at least one content item (e.g., tax receipt and other documentation) associated with the type of information.
  • In certain embodiments, when the online service determines that the natural language content does not include a particular type of information (e.g., tax expense information), the online service may generate and transmit a message to the user (e.g., via the user's computing device) indicating that the type of information is not valid (e.g., not a valid tax expense) and that the type of information has not been processed (e.g., the type of information has not been applied to the user's tax return).
  • In certain embodiments, the online service may present the type of information along with the temporal association for the type of information on a user interface of the user's computing device. In one example, when the user accesses the online service to initiate preparation of a tax return for a particular tax filing period, the online service may present an indication of the tax expense information associated with that tax filing period. Along with presenting this information, the online service may present one or more elements that allow the user to confirm applying the tax expense information to the tax return, verify the temporal association, modify the temporal association, delete (or remove) the tax expense information without applying it to the tax return, etc.
  • Advantageously, by automatically detecting, tracking, and processing tax expense information based on a user's voice input, embodiments herein can track and process tax expense information at a point of need (e.g., substantially at or near when the tax expense occurs) over time (e.g., throughout the year), as opposed to relying on the user to save and remember a substantial amount of information over time and input the information during the process of preparing a tax return. As such, embodiments significantly improve user experience with tax preparation applications, which can increase the retention of users of the online service.
  • Note, certain embodiments are described herein using an online tax-preparation service as an example of computer software and an online software service, etc., that automatically detects, tracks, and processes information content (e.g., tax expense information) for a user based on the user's voice input. One of ordinary skill in the art will recognize that the techniques described herein may be adapted for use by a broad variety of software applications, online or web services, software features, or support services where users can use voice/speech to interact with the software. Additionally, it should be noted that although, in certain examples described herein, the computing system that obtains natural language content (corresponding to audio input from the user) is described as performing certain tasks (e.g., determining a type of tax expense information, determining a temporal association for the type of tax expense information, etc.), such tasks may be performed by one or more additional computing devices that the computing system that obtains the natural language content is in communication with (e.g., via a wired or wireless network), or vice versa. For example, the audio input, control signals, and information for display may be communicated between the computing devices, such as described with respect to FIG. 1 .
  • FIG. 1 illustrates an example computing environment 100 in accordance with certain embodiments of the present disclosure. Generally, FIG. 1 and the following description are intended to provide a brief, general description of a suitable computing environment in which the embodiments described herein may be implemented. As shown, computing environment 100 includes a client device 110, one or more computing devices 130, and a computing service 120, which are interconnected via a network 140, such as a local area network, personal area network, cellular network, or wide area network (e.g., the Internet).
  • A user (e.g., customer) may use the client device 110 to interact with the computing service 120 and/or other computing systems (or devices) (e.g., computing device(s) 130). Similarly, a user may use the computing device(s) 130 to interact with the computing service 120 and/or other computing systems (or devices) (e.g., client device 110). Client device 110 is representative of a variety of computing devices, including, for example, a personal computer, a laptop, a tablet computer, a notebook, a personal digital assistant (PDA), a mobile device (e.g., smartphone), smartwatch, smartglasses, smart television, or any other computing device capable of communicating with the computing service 120 and/or computing device(s) 130 across network 140. The client device 110 includes a user interface 102 and a browser 104. The client device 110 may access the browser 104 via the user interface 102.
  • Computing device(s) 130 is representative of a variety of computing devices capable of communicating with the computing service 120 and/or client device 110. In one example, the computing device 130 is a smart home device, examples of which include, but are not limited to, smart speakers, smart locks, smart doorbells, smart thermostats, smart keychains, etc. Although not shown, one or more of the computing device(s) 130 may include a user interface and/or a browser for interacting with the computing service 120 and/or controlling one or more features of the computing device 130.
  • The client device 110 and/or the computing device(s) 130 can be used to access web and application services provided by the computing service 120. As shown, the computing service 120 provides a web application 122. The browser 104 of client device 110 can communicate with the computing service 120 to render web pages from the web application 122. Similarly, one or more of the computing device(s) 130 that include browsers can communicate with the computing service 120 to render web pages from the web application 122. In certain embodiments, one or more of the computing device(s) 130 that do not include browsers can communicate with the computing service 120 to provide application content from the web application 122 to a user. The application content may be rendered via voice, messages transmitted to the client device 110, etc. In general, the computing device(s) 130 and client device 110 can use any number of communication methods to communicate with the computing service 120, including remote procedure calls, application programming interface (API) calls, Simple Object Access Protocol (SOAP)-based web services, remote file access, proprietary client-server architectures, and the like. Functionality attributed to the web application 122 may be implemented using code executed by the computing service 120, code executed by the client device 110 via the browser 104, code executed by the computing device(s) 130, or a combination of server-side and client-side code. Note that, in certain embodiments, a standalone dedicated application that executes on the client device 110 may be used instead of the browser 104.
  • Computing service 120 can include a physical computing system(s) and/or virtual server device(s) (for example, in a data center) that execute online applications, including one or more services that perform functions associated with online applications (e.g., web application 122). In certain embodiments, the computing service 120 can be modeled as a web server(s), application server(s), and a database(s) that provide the service back-end for the web application 122. In such embodiments, the web server(s) may communicate with the application server(s) to respond to requests from the browser 104 on the client device 110 and/or requests from the computing device 130. The web server(s) and/or application server(s) may retrieve application content from the database(s) to respond to requests from the browser 104 (and/or computing device 130) and/or store application content into the database(s).
  • The application server(s) may execute a number of components (also referred to as modules) to provide web-based and other content to the client device 110 and/or computing device(s) 130. The application server(s) include application content (e.g., graphical user interface (GUI) components) that the computing service 120 can present on the user interface 102 of client device 110 and/or a user interface (not shown) on a computing device 130. The application content that is presented may be based on a user's (e.g., customer, support agent, etc.) interaction with the web application 122 provided by the computing service 120. For example, the web application 122 may be used to prepare and file tax return documents and may present application content which guides a user through preparing a tax return as well as provide the user with access to a variety of related features (e.g., customer support). In another example, the web application 122 may be a financial management application that allows users to track and manage finances, monitor credit reports, set up bank accounts, build portfolios, trade shares, etc.
  • The application content can include, for example, HyperText Markup Language (HTML) components or code that generates HTML components that can be passed to client device 110 and rendered as a user interface. The application content may additionally include instructions executable by client device 110 to display on the user interface 102 using language-specific or operating system-specific application content (e.g., instructions for generating/displaying JavaScript-based components or similar components on other operating system platforms, Abstract Window Toolkit or Swing API components on the Java platform, and so on). Generally, instructions capable of rendering application content on client device 110 may include computer executable code generated from compiling and/or interpreting C (or variants thereof), Java, PHP, Ruby, HTML, JavaScript, Python, AJAX, VBScript, and other programming or scripting languages used to compose and present application content.
  • In certain embodiments, the computing service 120 allows users to interact with the computing service 120 (including web application 122) using voice/speech (e.g., audio input). As shown in FIG. 1 , in certain embodiments, the computing device(s) 130 includes a VUI 108 and the client device 110 includes a VUI 106. The VUIs 106, 108 are generally configured to capture a user's voice/speech (e.g., audio input), such as via a microphone coupled to or integrated with the client device 110 and/or computing device(s) 130. In certain embodiments, the VUIs 106, 108 may transmit the audio input, via the network 140, to the computing service 120. In these embodiments, the computing service 120 may determine natural language content corresponding to the audio input. For example, the computing service 120 may identify text (e.g., words, query, etc.) of the audio input of the voice/speech of the user (e.g., by using natural language processing techniques to convert speech to text).
  • In certain other embodiments, the VUIs 106, 108 may determine natural language content corresponding to the audio input and transmit the natural language content to the computing service 120. For example, the VUIs 106, 108 may identify text (e.g., words, query, etc.) of the audio input of the voice/speech of the user (e.g., by using natural language processing techniques to convert speech to text).
  • In certain embodiments, the computing service 120 is configured to automatically detect, track, and process certain information content (e.g., tax expense information), based on the user's voice input. For example, the user's voice input may include a query to process a particular type of tax expense of the user. For instance, the user may request the computing service 120 to “Add a charitable donation of $100 to Charity X on Date Y to the user's tax return.” In another instance, the user may request the computing service 120 to “Process a child care payment to Daycare A on Date Z.” In yet another instance, the user may state to the computing service 120 that “My vehicle expenses for work today totaled $50.” Note, however, that these are merely reference examples of a user using voice/speech to interact with the computing service 120.
  • Upon obtaining natural language content corresponding to the user's audio input, the computing service 120 may process the natural language content to automatically detect, track, and process certain information content (e.g., tax expense information) that may be included within the natural language content. As shown, the computing service 120 includes a content tracker module 124, which is configured to implement one or more techniques described herein to detect, track, and process information content, based on the user's voice input. The content tracker module 124 can include software components, hardware components, or combinations thereof. Note, the content tracker module 124 is described in more detail below.
  • Note that FIG. 1 illustrates a reference example of a computing environment in which the techniques described herein may be implemented and that the techniques described herein may be implemented in other computing environments consistent with the functionality described herein.
  • FIG. 2 further illustrates certain components (e.g., content tracker module 124) of the computing environment depicted in FIG. 1 , in accordance with certain embodiments. As shown, the content tracker module 124 includes a detection tool 210, an analysis tool 220, a content processing tool 230, a notification tool 240, and one or more storage systems 250, each of which can include hardware components, software components, or combinations thereof.
  • The detection tool 210 is generally configured to determine whether natural language content 202 (corresponding to audio input of a user) includes a particular type of information, such as tax expense information. Here, for example, the detection tool 210 receives natural language content 202 from a computing device (e.g., client device 110 and/or computing device(s) 130), which captures audio input from the user. The detection tool 210 evaluates (or analyzes) the natural language content 202 using an ML model 204 to determine a type of tax expense information 206 (or tax expense information type) from multiple types of tax expense information (e.g., charitable donations, child care expenses, rental home expenses, business related expenses, self-employed expenses, etc.). The ML model 204 may be trained on multiple different natural language variations for each type of tax expense information.
  • The analysis tool 220 is generally configured to determine a tax filing period 212 for the tax expense information type 206. As shown in FIG. 2, the analysis tool 220 evaluates the natural language content 202 with an ML model 208 to determine the tax filing period 212 associated with the tax expense information type 206. The tax filing period 212 may be one of multiple tax filing periods. In certain embodiments, the ML model 208 is configured to evaluate the natural language content 202 to determine an occurrence of the type of tax expense information 206. For example, in some cases, the natural language content 202 may include an indication of when the tax expense occurred (e.g., the natural language content 202 may indicate that the tax expense occurred on a particular date).
  • In certain embodiments, the ML model 208 is configured to evaluate the natural language content 202 and user metadata 222 to determine an occurrence of the type of tax expense information 206. The user metadata 222 may include a history of completed tax returns for different tax filing periods for the user. For example, the user metadata 222 may indicate whether the user has completed a tax return for the current tax filing period. In one embodiment, the ML model 208 may determine the associated tax filing period based on whether the user has completed a tax return for the current tax filing period. For example, if the natural language content 202 (corresponding to a user's audio input) is received after the user has completed a tax return for the current tax filing period, then the ML model 208 may determine that the tax expense information type 206 is associated with a next (or subsequent) tax filing period.
  • In another example, if the natural language content 202 (corresponding to a user's audio input) is received before the user has completed a tax return for the current tax filing period, then the ML model 208 may determine whether the tax expense information type 206 is associated with the current tax filing period or the next tax filing period, based on information indicating when the tax expense occurred. In certain embodiments, the analysis tool 220 may interact with the notification tool 240 to prompt the user for information indicating when the tax expense information type 206 occurred. For example, the notification tool 240 may generate a message 214 including at least one of a query for the information indicating when the tax expense information type 206 occurred or a link allowing the user to upload information indicating when the tax expense information type 206 occurred. The notification tool 240 may transmit the message 214 to a computing device associated with the user (e.g., client device 110).
  • The content processing tool 230 is generally configured to automatically process the tax expense information type 206 based on the associated tax filing period 212. For example, the content processing tool 230 may obtain (or determine) a tax return that is associated with the tax filing period 212 and may store the tax expense information type 206 (along with other associated information from the natural language content 202, such as the amount of the tax expense information type 206, an occurrence of the tax expense information type 206, etc.) in a storage system 250 in association with the respective tax return corresponding to the tax filing period 212.
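  • A minimal sketch of this processing step follows, using an in-memory mapping keyed by filing period as a stand-in for storage system 250. The record fields are assumptions for illustration; an actual implementation would persist the expense in association with the corresponding tax return.

```python
from collections import defaultdict

# In-memory stand-in for storage system 250: detected expenses grouped
# per tax return, keyed by filing period (year).
expense_store = defaultdict(list)

def process_expense(expense_type: str, amount: float,
                    occurred_on: str, filing_period: int) -> dict:
    """Associate a detected expense with the tax return for its period."""
    record = {
        "type": expense_type,
        "amount": amount,
        "occurred_on": occurred_on,
    }
    expense_store[filing_period].append(record)
    return record

process_expense("charitable_donation", 200.0, "2023-03-15", 2023)
print(expense_store[2023])
```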
  • The notification tool 240 is generally configured to generate and transmit an indication of whether the tax expense information type 206 has been processed (e.g., added to a user's relevant tax return). As shown, the notification tool 240 generates one or more messages 214. In certain embodiments, the notification tool 240 may generate a message 214 indicating that the tax expense information type 206 has been successfully processed. In these embodiments, the message 214 may include a link allowing the user to upload at least one content item (e.g., receipt, documentation, etc.) associated with the tax expense information type 206. The notification tool 240 may retrieve at least one content item uploaded by the user and may store the at least one content item into a storage system 250.
  • In certain embodiments, when the detection tool 210 determines that the natural language content 202 does not include tax expense information, the notification tool 240 may generate a message 214 indicating that the natural language content 202 does not include valid tax expense information.
  • In certain embodiments, the content processing tool 230 is also configured to present an indication of the tax expense information type 206 (along with the associated tax filing period 212) to the user via the user's computing device (e.g., client device 110). For example, the content processing tool 230 may retrieve the tax expense information type 206 from the storage system 250 where it is stored in association with a corresponding tax return based on the associated tax filing period 212. In addition to presenting the indication of the tax expense information type 206, the content processing tool 230 may provide one or more elements allowing the user to confirm adding the tax expense information type 206 to the user's tax return, remove the tax expense information type 206 from the user's tax return, verify the tax filing period 212, modify the tax filing period 212, etc.
  • Note that FIG. 2 illustrates a reference example of components of a content tracker module 124 and that the content tracker module 124 may have a different set of components consistent with the functionality described herein. For example, while FIG. 2 describes the content tracker module 124 receiving natural language content 202 corresponding to a user's audio input, in certain embodiments, the content tracker module 124 may be configured to receive the user's audio input and generate natural language content 202 corresponding to the received audio input.
  • FIG. 3 illustrates an exemplary flowchart 300 for automatically detecting, tracking, and processing certain information content based on audio input from a user, in accordance with certain aspects of the present disclosure. Flowchart 300 can be implemented by the computing service 120 of FIG. 1 . FIG. 3 illustrates the functionality and operation of possible implementations of systems, methods, and computer program products according to various embodiments of the present disclosure. In this regard, each block in the flowchart may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). In some alternative implementations, the functions noted in the blocks may occur simultaneously, substantially concurrently, or out of the order noted in the figure. Each block of the flowchart and combinations of blocks in the flowchart can be implemented by special-purpose hardware-based systems that perform the specified functions or acts, or combinations of special-purpose hardware and computer instructions.
  • At block 302, the computing service 120 obtains natural language content (e.g., natural language content 202) corresponding to audio input from a user. The natural language content may include a request to process a certain type of information (e.g., tax expense information type 206). In one embodiment, the computing service 120 may receive the natural language content from a computing device associated with the user (e.g., client device 110, computing device(s) 130). For example, the natural language content may be received via at least one application programming interface (API) associated with the computing device. In another embodiment, the computing service 120 may generate the natural language content (at the computing service 120). For example, the computing service 120 may receive audio input from the user and use one or more natural language processing techniques to generate the natural language content.
  • At block 304, the computing service 120 determines whether the natural language content includes a first type of information (e.g., tax expense information type 206). In certain embodiments, the computing service 120 may evaluate the natural language content with a first ML model (e.g., ML model 204) to determine whether the natural language content includes the first type of information. If the natural language content does not include the first type of information (e.g., the natural language content may include a second type of information, such as non-tax expense information), then, at block 306, the computing service 120 generates and transmits a message indicating that the natural language content does not include the first type of information.
  • On the other hand, if the natural language content does include the first type of information, then, at block 308, the computing service 120 determines a temporal association (e.g., tax filing period 212) of the first type of information. In one embodiment, the first type of information includes a first type of tax expense information (e.g., tax expense information type 206) of multiple types of tax expense information. In this embodiment, determining the temporal association includes determining which tax filing period (e.g., tax filing period 212) of multiple tax filing periods is associated with the first type of tax expense information.
  • At block 310, the computing service 120 processes the first type of information, based on the temporal association. In one embodiment, the computing service 120 processes the first type of information by associating the first type of information with at least one content item (e.g., tax return), based on the temporal association, and storing the first type of information in a storage system (e.g., storage system 250) in association with the at least one content item.
  • At block 312, the computing service 120 generates and transmits a message (e.g., message 214) indicating that the first type of information has been processed. In one embodiment, the message includes a link that allows the user to upload at least one content item (e.g., receipt, documentation of the tax expense, etc.) associated with the first type of information to a storage system. In this embodiment, the flowchart 300 may further include retrieving the at least one content item from the storage system and storing the at least one content item in association with the first type of information in a storage system (e.g., storage system 250), based on the temporal association of the first type of information. Additionally, in this embodiment, the flowchart 300 may further include processing the first type of information, based on the temporal association of the first type of information and the at least one content item. For example, processing the first type of information may include verifying the temporal association (e.g., tax filing period 212) of the first type of information based on the at least one content item.
  • At block 314, the computing service 120 presents the first type of information and the temporal association on a computing device associated with the user. For example, the first type of information along with the temporal association may be presented on a user interface (e.g., user interface 102) of the computing device (e.g., client device 110). The computing service 120 may also provide one or more elements within the user interface that allow the user to verify the temporal association of the first type of information, modify the temporal association of the first type of information, confirm adding the first type of information to the user's tax return, or remove the first type of information from the user's tax return.
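  • For illustration only, the flow of blocks 306 through 314 can be sketched in a few lines of Python. Every name here (`classify_info`, `determine_filing_period`, `ExpenseRecord`, the keyword rules) is a hypothetical stand-in; the disclosure describes machine learning models, not keyword lookups:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class ExpenseRecord:
    description: str
    info_type: str      # e.g., "vehicle_expense"
    filing_period: int  # tax year the expense applies to

def classify_info(text: str) -> Optional[str]:
    # Stand-in for the first model (block 306): detect whether the
    # utterance contains tax expense information at all.
    keywords = {"mileage": "vehicle_expense", "supplies": "office_expense"}
    for word, info_type in keywords.items():
        if word in text.lower():
            return info_type
    return None

def determine_filing_period(text: str, default_year: int = 2023) -> int:
    # Stand-in for the second model (block 308): pull a year out of the
    # utterance, falling back to a default filing period.
    for token in text.split():
        if token.isdigit() and 2000 <= int(token) <= 2100:
            return int(token)
    return default_year

def track_expense(text: str, store: list) -> str:
    info_type = classify_info(text)
    if info_type is None:
        # Block 316: nothing to track; notify the user.
        return "No tax expense information was found in your request."
    period = determine_filing_period(text)
    # Blocks 308-310: associate the expense with a filing period and store it.
    store.append(ExpenseRecord(text, info_type, period))
    # Block 312: confirmation message (a real system would also include a
    # link for uploading supporting documents such as receipts).
    return f"Recorded {info_type} for tax year {period}."
```

A real implementation would replace the keyword and token rules with the two trained models and persist records to storage system 250 rather than an in-memory list.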
  • FIG. 4 illustrates an exemplary computing system 400 that performs content tracking based on audio input from a user in accordance with aspects of the present disclosure. The computing system 400 is a representative example of computing service 120 depicted in FIG. 1 . As shown, the computing system 400 includes a central processing unit (CPU) 402, one or more I/O device interfaces 404, which may allow for the connection of various I/O devices 414 (e.g., keyboards, displays, mouse devices, pen input, etc.) to computing system 400, network interface 406 (which may include a transmitter for transmitting data and a receiver for receiving data), memory 408, storage 410, and interconnect 412. Storage 410 can be located either inside or outside the computing system 400. When the storage is located outside computing system 400, the storage can be connected via the network 140.
  • CPU 402 may retrieve and execute programming instructions stored in the memory 408. Similarly, CPU 402 may retrieve and store application data residing in the memory 408. Interconnect 412 transmits programming instructions and application data among the CPU 402, I/O device interface 404, network interface 406, memory 408, and storage 410. CPU 402 can represent a single CPU, multiple CPUs, a single CPU having multiple processing cores, and the like. Additionally, memory 408 may represent volatile memory, such as random access memory. Furthermore, storage 410 may represent non-volatile memory, such as memory on a disk drive. Although shown as a single unit, storage 410 may be a combination of fixed or removable storage devices, such as fixed disc drives, removable memory cards or optical storage, network attached storage (NAS), or a storage area network (SAN). As shown, memory 408 stores programming instructions for executing web application 122 and content tracker module 124, described in more detail above. Storage 410 stores natural language content 202, tax expense information type 206, and tax filing period 212, described in more detail above.
  • Example Clauses
  • Clause 1: A method comprising: receiving natural language content corresponding to audio input from a user; determining whether the natural language content comprises a first type of information, based on evaluating the natural language content with a first machine learning model; and in response to determining the natural language content comprises the first type of information: determining a temporal association of the first type of information, based on evaluating the natural language content with a second machine learning model; and transmitting to the user a message comprising an indication of the temporal association of the first type of information.
  • Clause 2: The method of Clause 1, further comprising processing the first type of information, based on the temporal association of the first type of information, wherein processing the first type of information comprises: associating the first type of information with at least one content item, based on the temporal association; and storing the first type of information in a storage system in association with the at least one content item.
  • Clause 3: The method of any one of Clauses 1 through 2, wherein the message further comprises an indication that the first type of information has been processed.
  • Clause 4: The method of any one of Clauses 1 through 3, wherein: the first type of information comprises tax expense information; and determining the temporal association of the first type of information comprises determining which tax filing period of a plurality of tax filing periods is associated with the tax expense information.
  • Clause 5: The method of any one of Clauses 1 through 4, wherein: the first type of information comprises a first type of tax expense information of a plurality of types of tax expense information; and determining the temporal association of the first type of information comprises determining which tax filing period of a plurality of tax filing periods is associated with the first type of tax expense information.
  • Clause 6: The method of Clause 5, wherein the at least one content item comprises a tax return corresponding to the tax filing period.
  • Clause 7: The method of any one of Clauses 1 through 6, wherein: the natural language content comprises a request to process the first type of information; and the method further comprises, in response to determining that the natural language content comprises a second type of information, transmitting to the user a message comprising an indication that the second type of information is not valid and that the second type of information has not been applied to at least one content item associated with the user.
  • Clause 8: The method of any one of Clauses 1 through 7, further comprising presenting the first type of information and the temporal association of the first type of information on a user interface of a computing device associated with the user.
  • Clause 9: The method of Clause 8, further comprising providing one or more elements within the user interface that allow the user to at least one of (i) verify the temporal association of the first type of information, (ii) modify the temporal association of the first type of information, or (iii) remove the first type of information from at least one content item associated with the user.
  • Clause 10: The method of any one of Clauses 1 through 9, wherein: the message further comprises a link that allows the user to upload at least one content item associated with the first type of information to a first storage system, and the method further comprises: receiving the at least one content item from the first storage system; and storing the at least one content item in association with the first type of information in a second storage system, based on the temporal association of the first type of information.
  • Clause 11: The method of Clause 10, further comprising processing the first type of information, based on the temporal association of the first type of information and the at least one content item, wherein processing the first type of information comprises verifying the temporal association of the first type of information based on the at least one content item.
  • Clause 12: The method of any one of Clauses 1 through 11, wherein the natural language content is received via an application programming interface (API) associated with a computing system.
  • Clause 13: The method of Clause 12, wherein the computing system comprises a smart home device or a mobile device.
  • Clause 14: A method for automatically tracking tax expense information based on audio input from a user, the method comprising: receiving natural language content corresponding to audio input from a user; determining whether the natural language content comprises tax expense information, based on evaluating the natural language content with a first machine learning model; in response to determining the natural language content comprises the tax expense information: determining a type of the tax expense information based on the evaluation; determining which tax filing period of a plurality of tax filing periods is associated with the type of the tax expense information, based on evaluating the natural language content with a second machine learning model; and automatically processing the type of the tax expense information on behalf of the user, comprising: associating the type of the tax expense information with at least one content item, based on the tax filing period associated with the type of the tax expense information; and storing the type of the tax expense information in a storage system in association with the at least one content item; presenting the type of the tax expense information and the associated tax filing period on a user interface of a computing device associated with the user; and providing one or more elements within the user interface that allow the user to at least one of (i) verify the tax filing period associated with the type of tax expense information, (ii) modify the tax filing period associated with the type of tax expense information, or (iii) remove the type of tax expense information from the at least one content item.
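  • The period-determination step in Clause 14, deciding which tax filing period an expense belongs to, can be illustrated with a minimal date-to-period mapping. The calendar-year rule and April 15 deadline below are illustrative assumptions only; actual filing-period rules vary by jurisdiction and entity type:

```python
from datetime import date

def filing_period_for(expense_date: date) -> int:
    # Assumed rule: calendar-year filing periods, so an expense incurred
    # in year Y belongs to the filing period for tax year Y.
    return expense_date.year

def filing_deadline(period: int) -> date:
    # Illustrative U.S.-style deadline: April 15 of the following year.
    return date(period + 1, 4, 15)
```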
  • Clause 15: A system comprising: a memory having executable instructions stored thereon; and a processor configured to execute the executable instructions to cause the system to: receive natural language content corresponding to audio input from a user; determine whether the natural language content comprises a first type of information, based on evaluating the natural language content with a first machine learning model; and in response to determining the natural language content comprises the first type of information: determine a temporal association of the first type of information, based on evaluating the natural language content with a second machine learning model; and transmit to the user a message comprising an indication of the temporal association of the first type of information.
  • Clause 16: The system of Clause 15, wherein the processor is further configured to execute the executable instructions to cause the system to process the first type of information, based on the temporal association of the first type of information, wherein processing the first type of information comprises: associating the first type of information with at least one content item, based on the temporal association; and storing the first type of information in a storage system in association with the at least one content item.
  • Clause 17: The system of any one of Clauses 15 through 16, wherein the message further comprises an indication that the first type of information has been processed.
  • Clause 18: The system of any one of Clauses 15 through 17, wherein: the first type of information comprises tax expense information; and determining the temporal association of the first type of information comprises determining which tax filing period of a plurality of tax filing periods is associated with the tax expense information.
  • Clause 19: The system of any one of Clauses 15 through 18, wherein the at least one content item comprises a tax return corresponding to the tax filing period.
  • Clause 20: The system of any one of Clauses 15 through 19, wherein the processor is further configured to execute the executable instructions to cause the system to: present the first type of information and the temporal association of the first type of information on a user interface of a computing device associated with the user; and provide one or more elements within the user interface that allow the user to at least one of (i) verify the temporal association of the first type of information, (ii) modify the temporal association of the first type of information, or (iii) remove the first type of information from at least one content item associated with the user.
  • Clause 21: An apparatus comprising: a memory having executable instructions stored thereon; and a processor configured to execute the executable instructions to cause the apparatus to perform the methods of any one of Clauses 1 through 14.
  • Clause 22: An apparatus comprising: means for performing the methods of any one of Clauses 1 through 14.
  • Clause 23: A computer-readable medium having instructions stored thereon which, when executed by a processor, perform the methods of any one of Clauses 1 through 14.
  • Additional Considerations
  • The preceding description is provided to enable any person skilled in the art to practice the various embodiments described herein. The examples discussed herein are not limiting of the scope, applicability, or embodiments set forth in the claims. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments. For example, changes may be made in the function and arrangement of elements discussed without departing from the scope of the disclosure. Various examples may omit, substitute, or add various procedures or components as appropriate. For instance, the methods described may be performed in an order different from that described, and various steps may be added, omitted, or combined. Also, features described with respect to some examples may be combined in some other examples. For example, an apparatus may be implemented or a method may be practiced using any number of the aspects set forth herein. In addition, the scope of the disclosure is intended to cover such an apparatus or method that is practiced using other structure, functionality, or structure and functionality in addition to, or other than, the various aspects of the disclosure set forth herein. It should be understood that any aspect of the disclosure disclosed herein may be embodied by one or more elements of a claim.
  • As used herein, the word “exemplary” means “serving as an example, instance, or illustration.” Any aspect described herein as “exemplary” is not necessarily to be construed as preferred or advantageous over other aspects.
  • As used herein, a phrase referring to “at least one of” a list of items refers to any combination of those items, including single members. As an example, “at least one of: a, b, or c” is intended to cover a, b, c, a-b, a-c, b-c, and a-b-c, as well as any combination with multiples of the same element (e.g., a-a, a-a-a, a-a-b, a-a-c, a-b-b, a-c-c, b-b, b-b-b, b-b-c, c-c, and c-c-c or any other ordering of a, b, and c).
  • As used herein, the term “determining” encompasses a wide variety of actions. For example, “determining” may include calculating, computing, processing, deriving, investigating, looking up (e.g., looking up in a table, a database or another data structure), ascertaining and the like. Also, “determining” may include receiving (e.g., receiving information), accessing (e.g., accessing data in a memory) and the like. Also, “determining” may include resolving, selecting, choosing, establishing and the like.
  • The methods disclosed herein comprise one or more steps or actions for achieving the methods. The method steps and/or actions may be interchanged with one another without departing from the scope of the claims. In other words, unless a specific order of steps or actions is specified, the order and/or use of specific steps and/or actions may be modified without departing from the scope of the claims. Further, the various operations of methods described above may be performed by any suitable means capable of performing the corresponding functions. The means may include various hardware and/or software component(s) and/or module(s), including, but not limited to a circuit, an application specific integrated circuit (ASIC), or processor. Generally, where there are operations illustrated in figures, those operations may have corresponding counterpart means-plus-function components with similar numbering.
  • The following claims are not intended to be limited to the embodiments shown herein, but are to be accorded the full scope consistent with the language of the claims. Within a claim, reference to an element in the singular is not intended to mean “one and only one” unless specifically so stated, but rather “one or more.” Unless specifically stated otherwise, the term “some” refers to one or more. No claim element is to be construed under the provisions of 35 U.S.C. § 112(f) unless the element is expressly recited using the phrase “means for” or, in the case of a method claim, the element is recited using the phrase “step for.” All structural and functional equivalents to the elements of the various aspects described throughout this disclosure that are known or later come to be known to those of ordinary skill in the art are expressly incorporated herein by reference and are intended to be encompassed by the claims. Moreover, nothing disclosed herein is intended to be dedicated to the public regardless of whether such disclosure is explicitly recited in the claims.

Claims (20)

What is claimed is:
1. A method comprising:
monitoring a chat window of a first user interface of a first computing device associated with a first user during an interaction session between the first user of an application and a second user of the application, wherein the chat window comprises one or more elements for uploading at least one attachment;
detecting, based on the monitoring, an upload of the at least one attachment;
in response to detecting the upload of the at least one attachment, blocking access to the at least one attachment from a second computing device associated with the second user;
identifying and extracting content from the at least one attachment;
determining a type of the at least one attachment based on the content; and
determining whether the second user is authorized to access the type of the at least one attachment based on a set of permissions associated with the second user, a type of engagement between the first user and the second user during the interaction session, and the type of the at least one attachment, the type of the engagement between the first user and the second user comprising a selection from a plurality of types of engagement corresponding to a plurality of possible attachment types for the at least one attachment.
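The authorization test in claim 1 combines three inputs: the agent's permission set, the engagement type, and the classified attachment type. A minimal sketch follows; the field names, the prefix-matching rule, and the attachment-type strings are all hypothetical illustrations, not the claimed implementation:

```python
from dataclasses import dataclass

@dataclass
class Agent:
    # Attachment types this support agent's role may view.
    permissions: set

def is_authorized(agent: Agent, engagement_type: str, attachment_type: str) -> bool:
    # Illustrative rule: the engagement ("business" or "personal") must
    # match the classified attachment type, and the agent's own
    # permission set must also include that attachment type.
    matches_engagement = attachment_type.startswith(engagement_type)
    return matches_engagement and attachment_type in agent.permissions
```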
2. The method of claim 1, further comprising:
storing the at least one attachment in a storage system in response to detecting the upload of the at least one attachment,
wherein identifying and extracting the content from the at least one attachment comprises retrieving the at least one attachment from the storage system and performing optical character recognition on the at least one attachment retrieved from the storage system.
3. The method of claim 2, wherein:
the access to the at least one attachment is blocked for a first period of time,
and the method further comprises, upon determining that the second user is unauthorized to access the type of the at least one attachment:
continuing to block access to the at least one attachment from the second computing device for a second period of time after the first period of time has elapsed; and
removing the at least one attachment from the storage system.
4. The method of claim 3, wherein continuing to block access to the at least one attachment comprises permanently blocking access to the at least one attachment after the first period of time has elapsed.
5. The method of claim 2, wherein:
the access to the at least one attachment is blocked for a period of time,
and the method further comprises, upon determining that the second user is authorized to access the type of the at least one attachment:
allowing access to the at least one attachment from the second computing device after the period of time has elapsed; and
removing the at least one attachment from the storage system.
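Claims 3 through 5 describe a hold-and-release pattern: the attachment stays blocked during an initial review window, after which access is either allowed (if screening approved it) or blocked permanently. A minimal sketch, in which the class name, state fields, and timing values are illustrative assumptions:

```python
import time

class AttachmentHold:
    """Hold-and-release state for one uploaded attachment (illustrative)."""

    def __init__(self, review_window_s: float):
        self.uploaded_at = time.monotonic()
        self.review_window_s = review_window_s
        self.approved = None  # None until screening completes

    def screen(self, authorized: bool) -> None:
        # Record the screening outcome for this attachment.
        self.approved = authorized

    def accessible(self) -> bool:
        # Blocked throughout the review window; afterwards, accessible
        # only if screening approved it (otherwise the block persists).
        in_window = time.monotonic() - self.uploaded_at < self.review_window_s
        return (not in_window) and self.approved is True
```

A production system would also remove the stored attachment once the decision is made, as both claim 3 and claim 5 recite.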
6. The method of claim 1, wherein the type of the at least one attachment is determined based on evaluating the content using at least one of a rules engine or a machine learning model.
7. The method of claim 6, wherein evaluating the content comprises classifying the at least one attachment into one of a plurality of different types of tax documents based on the content, the plurality of different types of tax documents including a plurality of business type tax documents and a plurality of personal type tax documents, wherein the at least one of the rules engine or machine learning model is used to classify the at least one attachment into a business type tax document or a personal type tax document, the type of engagement is selected from a business type engagement or a personal type engagement, and the determining whether the second user is authorized to access the type of the at least one attachment is based on the type of engagement matching the type of the at least one attachment.
8. The method of claim 7, wherein evaluating the content further comprises classifying the at least one attachment into one of a plurality of different types of tax documents by:
generating a confidence score for each respective type of tax document based on the content; and
selecting the type of tax document having a highest confidence score as the one of the plurality of different types of tax documents.
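The selection rule in claim 8 is an argmax over per-type confidence scores. A one-line sketch, with made-up document types and score values:

```python
def classify_document(scores: dict) -> str:
    # Claim 8: score every candidate tax-document type, then select the
    # type with the highest confidence score.
    return max(scores, key=scores.get)
```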
9. The method of claim 1, wherein the upload of the at least one attachment is to a first storage system, the method further comprising obtaining, from a second storage system, the set of permissions associated with the second user, the type of engagement between the first user and the second user during the interaction session, and the type of the at least one attachment.
10. The method of claim 1, wherein the second user is a human support agent or a virtual support agent.
11. A system comprising:
a memory having executable instructions stored thereon; and
a processor configured to execute the executable instructions to cause the system to:
monitor a chat window of a first user interface of a first computing device associated with a first user during an interaction session between the first user of an application and a second user of the application, wherein the chat window comprises one or more elements for uploading at least one attachment;
detect, based on the monitoring, an upload of the at least one attachment;
in response to detecting the upload of the at least one attachment, block access to the at least one attachment from a second computing device associated with the second user;
identify and extract content from the at least one attachment;
determine a type of the at least one attachment based on the content; and
determine whether the second user is authorized to access the type of the at least one attachment based on a set of permissions associated with the second user, a type of engagement between the first user and the second user during the interaction session, and the type of the at least one attachment, the type of the engagement between the first user and the second user comprising a selection from a plurality of types of engagement corresponding to a plurality of possible attachment types for the at least one attachment.
12. The system of claim 11, wherein the processor is further configured to execute the executable instructions to cause the system to:
store the at least one attachment in a storage system in response to detecting the upload of the at least one attachment;
retrieve the at least one attachment from the storage system; and
perform optical character recognition on the at least one attachment retrieved from the storage system to identify and extract the content from the at least one attachment.
13. The system of claim 12, wherein:
the access to the at least one attachment is blocked for a first period of time; and
the processor is further configured to execute the executable instructions to cause the system to, upon determining that the second user is unauthorized to access the type of the at least one attachment:
continue to block access to the at least one attachment from the second computing device for a second period of time after the first period of time has elapsed; and
remove the at least one attachment from the storage system.
14. The system of claim 13, wherein continuing to block access to the at least one attachment comprises permanently blocking access to the at least one attachment after the first period of time has elapsed.
15. The system of claim 12, wherein:
the access to the at least one attachment is blocked for a period of time; and
the processor is further configured to execute the executable instructions to cause the system to, upon determining that the second user is authorized to access the type of the at least one attachment:
allow access to the at least one attachment from the second computing device after the period of time has elapsed; and
remove the at least one attachment from the storage system.
16. The system of claim 11, wherein the type of the at least one attachment is determined based on evaluating the content using at least one of a rules engine or a machine learning model.
17. The system of claim 16, wherein evaluating the content comprises classifying the at least one attachment into one of a plurality of different types of tax documents based on the content, the plurality of different types of tax documents including a plurality of business type tax documents and a plurality of personal type tax documents, wherein the at least one of the rules engine or machine learning model is used to classify the at least one attachment into a business type tax document or a personal type tax document, the type of engagement is selected from a business type engagement or a personal type engagement, and the determining whether the second user is authorized to access the type of the at least one attachment is based on the type of engagement matching the type of the at least one attachment.
18. The system of claim 17, wherein evaluating the content further comprises classifying the at least one attachment into one of a plurality of different types of tax documents by:
generating a confidence score for each respective type of tax document based on the content; and
selecting the type of tax document having a highest confidence score as the one of the plurality of different types of tax documents.
19. The system of claim 11, wherein the upload of the at least one attachment is to a first storage system, and the processor is further configured to execute the executable instructions to cause the system to obtain, from a second storage system, the set of permissions associated with the second user, the type of engagement between the first user and the second user during the interaction session, and the type of the at least one attachment.
20. A non-transitory computer-readable medium comprising instructions that, when executed by one or more processors of a computing system, cause the computing system to:
monitor a chat window of a first user interface of a first computing device associated with a first user during an interaction session between the first user of an application and a second user of the application, wherein the chat window comprises one or more elements for uploading at least one attachment;
detect, based on the monitoring, an upload of the at least one attachment;
in response to detecting the upload of the at least one attachment, block access to the at least one attachment from a second computing device associated with the second user;
identify and extract content from the at least one attachment;
determine a type of the at least one attachment based on the content; and
determine whether the second user is authorized to access the type of the at least one attachment based on a set of permissions associated with the second user, a type of engagement between the first user and the second user during the interaction session, and the type of the at least one attachment, the type of the engagement between the first user and the second user comprising a selection from a plurality of types of engagement corresponding to a plurality of possible attachment types for the at least one attachment.
US18/453,617 (US20240037673A1): Voice enabled content tracker. Status: Pending. Priority date 2022-07-29; filed 2023-08-22.

Priority Applications (1)

US18/453,617 (US20240037673A1), priority 2022-07-29, filed 2023-08-22: Voice enabled content tracker

Applications Claiming Priority (2)

US17/876,716 (US11776068B1), priority 2022-07-29, filed 2022-07-29: Voice enabled content tracker
US18/453,617 (US20240037673A1), priority 2022-07-29, filed 2023-08-22: Voice enabled content tracker

Related Parent Applications (1)

US17/876,716 (US11776068B1, continuation), priority 2022-07-29, filed 2022-07-29: Voice enabled content tracker

Publications (1)

US20240037673A1, published 2024-02-01

Family (ID=87474297)

US17/876,716: Voice enabled content tracker, Active (US11776068B1)
US18/453,617: Voice enabled content tracker, Pending (US20240037673A1)

Also Published As

Publication number Publication date
CA3194381A1 (en) 2024-01-29
AU2023202068A1 (en) 2024-02-15
US11776068B1 (en) 2023-10-03
EP4312146A1 (en) 2024-01-31

Similar Documents

Publication Title
US11249774B2 (en) Realtime bandwidth-based communication for assistant systems
US11443305B2 (en) Context augmentation for processing data from multiple sources
US9300672B2 (en) Managing user access to query results
US11593608B2 (en) Systems and methods for predicting and providing automated online chat assistance
US10728294B2 (en) Systems and methods for providing dynamic and interactive content in a chat session
US8521655B2 (en) Engine, system and method for providing cloud-based business intelligence
US10748157B1 (en) Method and system for determining levels of search sophistication for users of a customer self-help system to personalize a content search user experience provided to the users and to increase a likelihood of user satisfaction with the search experience
AU2016346497A1 (en) Method and system for performing a probabilistic topic analysis of search queries for a customer support system
US11423314B2 (en) Method and system for facilitating user support using multimodal information
US11809419B2 (en) System to convert natural-language financial questions into database queries
US11115410B1 (en) Secure authentication for assistant systems
US10922633B2 (en) Utilizing econometric and machine learning models to maximize total returns for an entity
US20240037673A1 (en) Voice enabled content tracker
US11907308B2 (en) System and method for controlling access to secure data records in a web browsing session
US10270869B2 (en) Contextual web session and IVR integration
EP4312416A1 (en) Chat attachment screening
US11902223B2 (en) Intelligent assistant content generation
US11715108B2 (en) Methods and systems for enhancing purchase experience via audio web-recording
US20220012724A1 (en) Currency conversion in media content
CA3106963A1 (en) System and method for controlling access to secure data records in a web browsing session
CN117271787A (en) Knowledge base construction, intelligent customer service realization and knowledge question-answering method, equipment and medium
CN114022890A (en) Response method and device for image

Legal Events

Date Code Title Description
AS Assignment
Owner name: INTUIT, INC., CALIFORNIA
Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNORS:SANTHARAM, SANGEETHA UTHAMALINGAM;KIMBALL, BRIDGET DIANE;REEL/FRAME:064665/0682
Effective date: 20220728

STPP Information on status: patent application and granting procedure in general
Free format text: DOCKETED NEW CASE - READY FOR EXAMINATION