CN106383866B

CN106383866B - Location-based conversational understanding

Info

Publication number: CN106383866B
Application number: CN201610801496.1A
Authority: CN
Inventors: L. P. Heck; M. Chinthakunta; D. Mitby; L. Stifelman
Original assignee: Microsoft Technology Licensing LLC
Current assignee: Microsoft Technology Licensing LLC
Priority date: 2011-03-31
Filing date: 2012-03-29
Publication date: 2020-05-05
Anticipated expiration: 2032-03-29
Also published as: CN102750271A; KR101922744B1; CN102750311B; EP2691949A2; KR20140014200A; EP2691875A2; JP6105552B2; EP2691870A2; WO2012135157A3; CN102737104B; CN102737104A; CN102737101B; CN102737096B; KR101963915B1; CN102737099B; WO2012135210A3; JP2017123187A; EP2691875A4; WO2012135791A3; CN102737101A

Abstract

Location-based conversational understanding may be provided. When a query is received from a user, an environmental context associated with the query may be generated. The query may be interpreted according to the environmental context. The interpreted query may be executed, and at least one result associated with the query may be provided to the user.

Description

Location-based conversational understanding

The present application is a divisional of the Chinese invention patent application of the same title, filed on March 29, 2012, with application number 201210087420.9.

Technical Field

The present application relates to environmental context and, in particular, to location-based conversational understanding.

Background

Location-based conversational understanding may provide a mechanism for leveraging environmental context to improve query execution and results. Conventional speech recognition programs lack techniques for using information gathered from one user (e.g., speech utterances, geographic data, the acoustic environment of certain locations, typical queries made from a particular location) to improve the quality and accuracy of new queries from new and/or existing users. In some cases, speech-to-text conversion must be performed without the benefit of similar, potentially relevant queries to aid understanding.

Speech-to-text conversion (i.e., speech recognition) may include converting spoken phrases into text phrases that can be processed by a computing system. Modern statistical speech recognition algorithms may use acoustic modeling and/or language modeling. Hidden Markov Models (HMMs) are widely used in many conventional systems. An HMM is a statistical model that outputs a sequence of symbols or quantities. HMMs can be used for speech recognition because a speech signal can be viewed as a piecewise stationary, or short-time stationary, signal: over a short interval (e.g., 10 milliseconds), speech can be approximated as a stationary process. Speech can therefore be treated as a Markov model for many stochastic purposes.
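To make the HMM description concrete, the following is a minimal sketch of the standard forward algorithm, which computes the probability that an HMM produced a given observation sequence. The two-state "silence"/"speech" model and its probabilities are invented for illustration; practical recognizers use far larger models over acoustic feature vectors rather than the symbolic "low"/"high" observations assumed here.

```python
from typing import Dict, List, Sequence


def forward_probability(
    observations: Sequence[str],
    states: List[str],
    start_p: Dict[str, float],
    trans_p: Dict[str, Dict[str, float]],
    emit_p: Dict[str, Dict[str, float]],
) -> float:
    """Return P(observations | HMM) using the forward algorithm."""
    if not observations:
        return 1.0  # the empty sequence is produced with certainty
    # alpha[s]: probability of the observations seen so far, ending in state s.
    alpha = {s: start_p[s] * emit_p[s][observations[0]] for s in states}
    for obs in observations[1:]:
        alpha = {
            s: sum(alpha[prev] * trans_p[prev][s] for prev in states) * emit_p[s][obs]
            for s in states
        }
    return sum(alpha.values())


# Toy two-state model; all probabilities are hypothetical.
STATES = ["silence", "speech"]
START = {"silence": 0.6, "speech": 0.4}
TRANS = {"silence": {"silence": 0.7, "speech": 0.3},
         "speech": {"silence": 0.4, "speech": 0.6}}
EMIT = {"silence": {"low": 0.9, "high": 0.1},
        "speech": {"low": 0.2, "high": 0.8}}

if __name__ == "__main__":
    print(forward_probability(["low", "high"], STATES, START, TRANS, EMIT))
```

For the sequence ["low", "high"] the result is approximately 0.209. Because every row of the start, transition, and emission tables sums to 1, the probabilities of all sequences of a given length also sum to 1.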

Disclosure of Invention

This summary is provided to introduce a selection of concepts in a simplified form that are further described below in the detailed description. This summary is not intended to identify key features or essential features of the claimed subject matter. This summary is also not intended to be used to limit the scope of the claimed subject matter.

Both the foregoing general description and the following detailed description provide examples and are explanatory only. Accordingly, the foregoing general description and the following detailed description should not be considered to be restrictive. Further, other features or variations may be provided in addition to those set forth herein. For example, embodiments may be directed to various feature combinations and sub-combinations described in the detailed description.

Drawings

The accompanying drawings, which are incorporated in and constitute a part of this disclosure, illustrate embodiments of the invention. In the drawings:

FIG. 1 is a block diagram of an operating environment;

FIG. 2 is a flow diagram of a method for providing location-based conversational understanding; and

FIG. 3 is a block diagram of a system including a computing device.

Detailed Description

The following detailed description refers to the accompanying drawings. Wherever possible, the same reference numbers will be used throughout the drawings and the following description to refer to the same or like elements. While embodiments of the invention may be described, modifications, adaptations, and other implementations are possible. For example, substitutions, additions, or modifications may be made to the elements illustrated in the drawings, and the methods described herein may be modified by substituting, reordering, or adding stages to the disclosed methods. The following detailed description, therefore, is not to be taken in a limiting sense. Rather, the proper scope of the invention is defined by the appended claims.

Location-based conversational understanding may be provided. For example, a speech-to-text system may correlate information from multiple users in order to improve the accuracy of the conversion and the results of queries included in the converted sentences. According to embodiments of the present invention, a personal assistant program may receive speech-based queries from users at multiple locations. Each query may be analyzed for acoustic and/or environmental characteristics, and those characteristics may be stored and associated with the location from which the query was received. For example, analysis of a query received from a user at a subway station may detect acoustic echoes off the tile walls and/or background sounds of people or subway trains. These characteristics may then be learned and used to filter future queries from that location, allowing those queries to be converted more accurately. According to embodiments of the present invention, a location may be defined, for example, by the user's Global Positioning System (GPS) location, an area code associated with the user, a zip code associated with the user, and/or the user's proximity to a landmark (e.g., a train station, a stadium, a museum, an office building, etc.).
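The location definitions above can be combined into a single lookup key for stored characteristics. The sketch below assumes a hypothetical landmark table and a 200 m proximity radius, and a hypothetical precedence policy: prefer a nearby landmark, then the zip code, then the area code. Distance is computed with the haversine great-circle formula.

```python
import math
from typing import Optional, Sequence, Tuple

EARTH_RADIUS_M = 6_371_000.0


def haversine_m(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in metres between two (lat, lon) points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def location_key(
    gps: Optional[Tuple[float, float]],
    landmarks: Sequence[Tuple[str, float, float]],  # (name, lat, lon), hypothetical table
    zip_code: Optional[str] = None,
    area_code: Optional[str] = None,
    radius_m: float = 200.0,
) -> Optional[str]:
    """Return the most specific location identifier available (hypothetical policy)."""
    if gps is not None and landmarks:
        # Find the closest landmark and use it only if the user is within the radius.
        name, lat, lon = min(landmarks, key=lambda lm: haversine_m(gps[0], gps[1], lm[1], lm[2]))
        if haversine_m(gps[0], gps[1], lat, lon) <= radius_m:
            return f"landmark:{name}"
    if zip_code:
        return f"zip:{zip_code}"
    if area_code:
        return f"area:{area_code}"
    return None
```

A user standing at a known station would be keyed to that landmark, while a user elsewhere in the city would fall back to a coarser zip-code key.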

Processing the query may include rewriting the query according to an acoustic model. For example, the acoustic model may include background sounds known to be present at a particular location. Applying an acoustic model may allow queries to be converted more accurately by ignoring irrelevant sounds. The acoustic model may also allow changes to how any results associated with the query are presented. For example, in certain noisy environments, the results may be displayed on a screen rather than played as audio. An environmental context may also be associated with an understanding model to facilitate speech-to-text conversion. For example, the understanding model may include a Hidden Markov Model (HMM). The environmental context may also be associated with a semantic model to assist in executing the query. For example, the semantic model may include an ontology. Ontologies are discussed in related application S/N __/___, ___, _____, 2011, entitled "personalization of queries, sessions, and searches," which is incorporated herein by reference in its entirety.
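The rule that noisy locations may favor on-screen results over audio can be expressed very simply. This sketch assumes the environmental context keeps a background level in dBFS computed from normalized audio samples in [-1, 1]; the -30 dBFS threshold and the function names are hypothetical choices, not values from the description.

```python
import math
from typing import Sequence


def rms_dbfs(samples: Sequence[float]) -> float:
    """Root-mean-square level of samples in [-1, 1], expressed in dBFS."""
    if not samples:
        return float("-inf")
    rms = math.sqrt(sum(x * x for x in samples) / len(samples))
    return 20 * math.log10(rms) if rms > 0 else float("-inf")


def choose_output_modality(background_dbfs: float, threshold_dbfs: float = -30.0) -> str:
    """Return 'screen' in loud environments and 'audio' otherwise (hypothetical rule)."""
    return "screen" if background_dbfs >= threshold_dbfs else "audio"
```

A platform recording of a passing train might measure well above the threshold and route results to the display, while a quiet office would keep spoken output.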

Moreover, the subject matter of a query may be used to improve the results of future queries. For example, if a user at a subway station asks "When is the next one?", the personal assistant program may determine over the course of several queries that the user wants to know when the next train will arrive. This may be done by asking the first user to clarify the query and storing the clarification for future use. In another example, if one user asks "When is the next one?" and another user asks "When is the next train?", the program may correlate the queries and assume that both users are requesting the same information.
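The clarification-and-correlation behavior can be sketched as a per-location store. Here a clarified query is saved with its topic, and a new query at the same location is mapped to a stored topic when its word overlap with a stored query is high enough. The description does not say how queries are correlated; Jaccard similarity over word sets and the 0.5 threshold are stand-ins chosen for illustration, and `TopicStore` is a hypothetical name.

```python
import re
from collections import defaultdict
from typing import Dict, List, Optional, Set, Tuple


def tokens(text: str) -> Set[str]:
    """Lower-cased word set of a query."""
    return set(re.findall(r"[a-z']+", text.lower()))


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Size of the intersection divided by size of the union."""
    return len(a & b) / len(a | b) if a | b else 0.0


class TopicStore:
    """Hypothetical per-location memory of clarified query topics."""

    def __init__(self, threshold: float = 0.5) -> None:
        self.threshold = threshold
        self._entries: Dict[str, List[Tuple[Set[str], str]]] = defaultdict(list)

    def record_clarification(self, location: str, query: str, topic: str) -> None:
        # Keep the clarified meaning so later users at this location benefit from it.
        self._entries[location].append((tokens(query), topic))

    def infer_topic(self, location: str, query: str) -> Optional[str]:
        # Return the topic of the most similar stored query, if it is similar enough.
        q = tokens(query)
        best_topic, best_score = None, 0.0
        for stored, topic in self._entries.get(location, []):
            score = jaccard(q, stored)
            if score > best_score:
                best_topic, best_score = topic, score
        return best_topic if best_score >= self.threshold else None
```

With "When is the next one?" clarified as a train-schedule question at a subway station, "When is the next train?" shares four of six distinct words (similarity of about 0.67) and would be assigned the same topic at that location only.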

FIG. 1 is a block diagram of an operating environment 100 for providing location-based conversational understanding. Operating environment 100 may include a Spoken Dialog System (SDS) 110 that includes a personal assistant program 112, a speech-to-text converter 114, and a context database 116. Personal assistant program 112 may receive queries over network 120 from a first plurality of users 130(A)-(C) located at a first location 140 and a second plurality of users 150(A)-(C) located at a second location 160. Context database 116 may be operative to store context data associated with queries received from users, such as first plurality of users 130(A)-(C) and/or second plurality of users 150(A)-(C). The context data may include acoustic and/or environmental characteristics as well as query context information, such as the query subject matter, the time/date of the query, user details, and/or the location from which the query was made. According to embodiments of the invention, network 120 may include, for example, a private data network (e.g., Ethernet), a cellular data network, and/or a public network such as the Internet.
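One possible shape for the context data held in context database 116 is sketched below. The field names, and the in-memory dictionary standing in for the database, are assumptions for illustration rather than a schema given in the description.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import Dict, List, Optional


@dataclass
class QueryRecord:
    """Context captured for one query (hypothetical fields)."""
    topic: str
    timestamp: datetime
    user_id: str


@dataclass
class EnvironmentalContext:
    """Context data associated with one location."""
    location: str
    acoustic_disturbances: List[str] = field(default_factory=list)  # e.g. "crowd", "train"
    queries: List[QueryRecord] = field(default_factory=list)


class ContextDatabase:
    """In-memory stand-in for context database 116."""

    def __init__(self) -> None:
        self._by_location: Dict[str, EnvironmentalContext] = {}

    def get(self, location: str) -> Optional[EnvironmentalContext]:
        return self._by_location.get(location)

    def put(self, context: EnvironmentalContext) -> None:
        self._by_location[context.location] = context
```

Keying contexts by location lets queries from users 130(A)-(C) at first location 140 share one record, separate from the record for second location 160.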

A personal assistant agent may be associated with a Spoken Dialog System (SDS). Such systems allow people to interact with computers using their voices. The primary component driving an SDS may be a dialog manager, which manages dialog-based conversations with the user. The dialog manager may determine the user's intent by combining multiple input sources, such as the output of speech recognition and natural language understanding components, context from previous dialog turns, user context, and/or results returned from a knowledge base (e.g., a search engine). After determining the intent, the dialog manager may take an action, such as displaying a final result to the user and/or continuing the dialog with the user to satisfy their intent. The spoken dialog system may include a plurality of conversational understanding models, such as an acoustic model associated with a location and/or a spoken language understanding model for processing speech-based input.
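The dialog manager's decision can be sketched as combining intent scores from several input sources and then either acting or continuing the dialog. The weighted-sum combination, the source names, and the 0.6 confidence threshold are all assumptions made for illustration; the description does not specify how the sources are combined.

```python
from typing import Dict, Tuple


def decide(
    source_scores: Dict[str, Dict[str, float]],
    weights: Dict[str, float],
    threshold: float = 0.6,
) -> Tuple[str, str]:
    """Combine per-source intent scores; return ('act', intent) or ('clarify', intent)."""
    combined: Dict[str, float] = {}
    for source, scores in source_scores.items():
        w = weights.get(source, 0.0)
        for intent, score in scores.items():
            combined[intent] = combined.get(intent, 0.0) + w * score
    if not combined:
        return ("clarify", "")  # nothing to go on: keep the dialog going
    total_weight = sum(weights.get(s, 0.0) for s in source_scores) or 1.0
    intent, score = max(combined.items(), key=lambda kv: kv[1])
    confidence = score / total_weight
    # Confident enough: show a result. Otherwise: continue the dialog to clarify.
    return ("act", intent) if confidence >= threshold else ("clarify", intent)
```

When the language-understanding output and the location context both favor a train-schedule intent, the combined confidence clears the threshold and the manager acts; an evenly split score instead leads to a clarifying turn.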

Figure 2 is a flow chart illustrating the general stages involved in a method 200 consistent with an embodiment of the invention for providing location-based conversational understanding. The method 200 may be implemented using a computing device 300, which will be described in more detail below with reference to FIG. 3. The manner in which the stages of method 200 are implemented will be described in greater detail below. Method 200 may begin at starting block 205 and continue to stage 210 where computing device 300 may receive a speech-based query from a user at a location. For example, user 130(A) may send a query to SDS 110 via a device such as a cellular phone.

From stage 210, method 200 may advance to stage 215 where computing device 300 may determine whether an environmental context associated with the location exists in a memory store. For example, the SDS 110 may identify a location (e.g., the first location 140) from which the query was received and determine whether an environmental context associated with the location exists in the context database 116.

If no environmental context is associated with the location, method 200 may advance to stage 220 where computing device 300 may identify at least one acoustic disturbance in the voice-based query. For example, SDS 110 may analyze the audio of the query and identify background noise, such as that associated with a crowd of people around user 130(A) and/or passing trains.

Method 200 may then advance to stage 225 where computing device 300 may identify at least one topic associated with the voice-based query. For example, if the query is "When does the next one arrive?" and the user is at a train station, SDS 110 may identify train schedules as the topic of the query.

Method 200 may then advance to stage 230 where computing device 300 may create a new environmental context associated with the location for storage in the memory store. For example, SDS 110 may store the identified acoustic disturbance and query topic in context database 116 in association with the user's location.

If a context associated with the location exists, method 200 may advance to stage 235 where computing device 300 may load an environmental context associated with the location. For example, the SDS 110 may load the environmental context from the context database 116 as described above.

After creating the context at stage 230 or loading the context at stage 235, method 200 may advance to stage 240 where computing device 300 may convert the speech-based query to a text-based query according to the environmental context. For example, SDS 110 may convert the voice-based query to a text-based query by applying a filter that removes at least one acoustic disturbance associated with the environmental context.
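Stage 240 leaves the filter unspecified. As a deliberately simple stand-in, the sketch below applies a frame-based noise gate: it assumes the environmental context stores a noise-floor energy learned at the location, and it silences frames whose energy does not exceed that floor by a margin before recognition. Real systems would more likely use spectral subtraction or acoustic-model adaptation; the frame size, margin, and function names are hypothetical.

```python
from typing import List, Sequence


def frame_energy(frame: Sequence[float]) -> float:
    """Mean squared amplitude of one frame."""
    return sum(x * x for x in frame) / len(frame) if frame else 0.0


def noise_gate(
    samples: Sequence[float],
    noise_floor: float,     # energy of the location's known background noise
    frame_size: int = 160,  # 10 ms at an assumed 16 kHz sample rate
    margin: float = 4.0,    # keep frames at least 4x more energetic than the floor
) -> List[float]:
    """Zero out frames that look like the location's known background noise."""
    out: List[float] = []
    for start in range(0, len(samples), frame_size):
        frame = list(samples[start:start + frame_size])
        if frame_energy(frame) > margin * noise_floor:
            out.extend(frame)              # likely speech: pass through unchanged
        else:
            out.extend([0.0] * len(frame))  # likely background: suppress
    return out
```

The output has the same length as the input, so downstream recognition timing is unaffected; only frames resembling the stored background are silenced.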

Method 200 may then advance to stage 245 where computing device 300 may execute the text-based query according to the environmental context. For example, SDS 110 may execute the query (e.g., "When does the next one arrive?") within a search domain, such as train schedules, associated with the at least one topic.

Method 200 may then advance to stage 250 where computing device 300 may provide at least one result of the executed text-based query to the user. For example, SDS 110 may transmit the results for display to a device (e.g., a cellular phone) associated with user 130(A). Method 200 may then end at stage 255.
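The branching in method 200 (stages 215 through 250) can be summarized in code. In this sketch the disturbance detector, topic identifier, speech-to-text converter, and query executor are hypothetical callables supplied by the caller, and a plain dictionary stands in for context database 116.

```python
from typing import Callable, Dict, List

Context = Dict[str, object]


def handle_query(
    audio: List[float],
    location: str,
    context_db: Dict[str, Context],
    detect_disturbances: Callable[[List[float]], List[str]],
    identify_topic: Callable[[List[float]], str],
    speech_to_text: Callable[[List[float], Context], str],
    execute: Callable[[str, Context], List[str]],
) -> List[str]:
    """Sketch of method 200: load or create a location context, then convert and execute."""
    context = context_db.get(location)                   # stage 215: does a context exist?
    if context is None:
        context = {
            "disturbances": detect_disturbances(audio),  # stage 220
            "topic": identify_topic(audio),              # stage 225
        }
        context_db[location] = context                   # stage 230: store new context
    # Otherwise the stored context is used as loaded (stage 235).
    text = speech_to_text(audio, context)                # stage 240
    return execute(text, context)                        # stages 245 and 250
```

On a second query from the same location, the stored context is reused and the detection and topic steps are skipped.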

Embodiments consistent with the invention may comprise a system for providing location-based conversational understanding. The system may include a memory storage and a processing unit coupled to the memory storage. The processing unit may be operative to receive a query from a user, generate an environmental context associated with the query, interpret the query according to the environmental context, execute the interpreted query, and provide at least one result of the query to the user. The query may comprise, for example, a spoken query that the processing unit is operative to convert into computer-readable text. According to embodiments of the invention, the speech-to-text conversion may use a Hidden Markov Model algorithm that includes statistical weights for the most likely words associated with the understanding model and/or for semantic concepts associated with the semantic model. The processing unit may be operative to increase the statistical weight of at least one expected word, e.g., based on at least one previous query received from the location, and to store the statistical weight as part of the environmental context.
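The weight adjustment can be illustrated with a unigram word prior. The sketch below is a simplification and not an HMM decoder: it multiplies the weight of words seen in earlier queries from a location by a boost factor, renormalizes so the weights still sum to 1, and uses the result to rescore a recognizer's candidate transcriptions. The boost factor, the additive log-score formula, and the candidate list are assumptions.

```python
import math
from typing import Dict, Iterable, List, Tuple


def boost_terms(
    base: Dict[str, float], expected_terms: Iterable[str], factor: float = 3.0
) -> Dict[str, float]:
    """Raise the weight of words expected at a location, then renormalize to sum to 1."""
    expected = set(expected_terms)
    boosted = {w: p * (factor if w in expected else 1.0) for w, p in base.items()}
    total = sum(boosted.values())
    return {w: p / total for w, p in boosted.items()}


def rescore(
    candidates: List[Tuple[str, float]],  # (transcription, acoustic log-score)
    weights: Dict[str, float],
    floor: float = 1e-6,                  # weight for words missing from the prior
) -> str:
    """Pick the candidate maximizing acoustic log-score plus word-prior log-probability."""
    def lm_logprob(text: str) -> float:
        return sum(math.log(weights.get(w, floor)) for w in text.lower().split())
    return max(candidates, key=lambda c: c[1] + lm_logprob(c[0]))[0]
```

At a train station, boosting "train" can make "when is the next train" outrank an acoustically slightly better "when is the next rain"; the boosted weights would then be stored with the location's environmental context.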

The environmental context may include an acoustic model associated with the location from which the query was received. The processing unit may be operative to rewrite the query, according to the acoustic model, based on at least one background sound detected in the speech-based query. For example, background sounds (e.g., train whistles) may be known to be present in speech queries received from a given location (e.g., a train station). Background sounds may be detected and measured for pitch, amplitude, and other acoustic characteristics. The query may be rewritten to ignore such sounds, and the measurements may be stored for application to future queries from that location. The processing unit may also be operative to receive a second speech-based query from a second user and rewrite that query to ignore the same background sound according to the updated acoustic model. The processing unit may further be operative to aggregate environmental contexts associated with a plurality of queries from a plurality of users and store the aggregated environmental context in association with the location.
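Aggregating environmental contexts from many users could be as simple as keeping a running average of the measured background-sound characteristics per location. The sketch uses the incremental mean update mean_n = mean_(n-1) + (x_n - mean_(n-1)) / n, so past measurements need not be retained. Tracking pitch and amplitude follows the example above; the class, field, and function names are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class BackgroundProfile:
    """Running average of one location's background-sound measurements."""
    count: int = 0
    mean_pitch_hz: float = 0.0
    mean_amplitude: float = 0.0

    def update(self, pitch_hz: float, amplitude: float) -> None:
        # Incremental mean: fold in one new measurement without storing history.
        self.count += 1
        self.mean_pitch_hz += (pitch_hz - self.mean_pitch_hz) / self.count
        self.mean_amplitude += (amplitude - self.mean_amplitude) / self.count


profiles: Dict[str, BackgroundProfile] = {}


def aggregate(location: str, pitch_hz: float, amplitude: float) -> BackgroundProfile:
    """Fold one user's measurement into the shared profile for the location."""
    profile = profiles.setdefault(location, BackgroundProfile())
    profile.update(pitch_hz, amplitude)
    return profile
```

A second user's query from the same station then benefits from a profile already shaped by earlier users' measurements.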

Another embodiment consistent with the invention may comprise a system for providing location-based conversational understanding. The system may include a memory storage and a processing unit coupled to the memory storage. The processing unit may be operative to receive a voice-based query from a user at a location, load an environmental context associated with the location, convert the voice-based query to text according to the environmental context, execute the converted query according to the environmental context, and provide at least one result associated with the executed query to the user. The environmental context may include, for example, the time of at least one previous query, the date of at least one previous query, the topic of at least one previous query, a semantic model including an ontology, an understanding model, and an acoustic model of the location. The processing unit may be operative to rewrite the query based on a known acoustic disturbance associated with the location. The processing unit may also be operative to store a plurality of environmental contexts, associated with a plurality of locations, aggregated from a plurality of queries received from a plurality of users. The processing unit may also be operative to receive a correction to the converted text from the user and update the environmental context according to the correction. The processing unit may further be operative to receive a second voice-based query from the user at a second location, load a second environmental context associated with the second location, convert the second voice-based query to text according to the second environmental context, execute the converted query according to the second environmental context, and provide at least one second result associated with the executed query to the user.
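Updating the environmental context from a user's correction might be done by aligning the converted text with the corrected text and counting the words the user supplied, so those words can later be favored at that location. The sketch uses Python's `difflib.SequenceMatcher` to align the two word sequences; the per-location `Counter` is a hypothetical stand-in for whatever the environmental context actually stores.

```python
import difflib
from collections import Counter, defaultdict
from typing import DefaultDict, List

# Hypothetical per-location tally of words users have supplied in corrections.
term_counts: DefaultDict[str, Counter] = defaultdict(Counter)


def corrected_words(converted: str, corrected: str) -> List[str]:
    """Words in the user's correction that replaced or were missing from the conversion."""
    a, b = converted.lower().split(), corrected.lower().split()
    matcher = difflib.SequenceMatcher(a=a, b=b)
    words: List[str] = []
    for tag, _i1, _i2, j1, j2 in matcher.get_opcodes():
        if tag in ("replace", "insert"):
            words.extend(b[j1:j2])
    return words


def apply_correction(location: str, converted: str, corrected: str) -> None:
    """Record corrected words against the location's environmental context."""
    term_counts[location].update(corrected_words(converted, corrected))
```

If a user at a station corrects "when is the next rain" to "when is the next train", the count for "train" at that station increases, which could feed the kind of term boosting described for the statistical weights.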

Yet another embodiment consistent with the invention may comprise a system for providing a context-aware environment. The system may include a memory storage and a processing unit coupled to the memory storage. The processing unit may be operative to receive a voice-based query from a user at a location and determine whether an environmental context associated with the location exists in the memory storage. In response to determining that no environmental context exists, the processing unit may be operative to identify at least one acoustic disturbance in the voice-based query, identify at least one topic associated with the voice-based query, and create a new environmental context associated with the location for storage in the memory storage. In response to determining that an environmental context exists, the processing unit may be operative to load the environmental context. The processing unit may then be operative to convert the voice-based query to a text-based query according to the environmental context, which includes applying a filter to remove at least one acoustic disturbance associated with the environmental context. The processing unit may then execute the text-based query according to the environmental context, where the at least one acoustic disturbance is associated with an acoustic model and the at least one identified topic is associated with a semantic model that is associated with the environmental context. Finally, the processing unit may provide at least one result of the executed text-based query to the user.

Fig. 3 is a block diagram of a system including a computing device 300. In accordance with an embodiment of the present invention, the above-described memory storage and processing unit may be implemented in a computing device, such as computing device 300 of FIG. 3. The memory storage and processing unit may be implemented using any suitable combination of hardware, software, or firmware. For example, the memory storage and processing unit may be implemented with computing device 300 or any of the other computing devices 318, in combination with computing device 300. The above-described systems, devices, and processors are examples, and other systems, devices, and processors may include the above-described memory storage and processing unit, according to embodiments of the invention. Further, computing device 300 may include an operating environment for system 100 as described above. The system 100 may operate in other environments and is not limited to the computing device 300.

Referring to FIG. 3, a system according to an embodiment of the invention may include a computing device, such as computing device 300. In a basic configuration, computing device 300 may include at least one processing unit 302 and system memory 304. Depending on the configuration and type of computing device, system memory 304 may include, but is not limited to, volatile (e.g., Random Access Memory (RAM)), non-volatile (e.g., read-only memory (ROM)), flash memory, or any combination. System memory 304 may include an operating system 305, one or more programming modules 306, and may include a personal assistant program 112. For example, operating system 305 may be suitable for controlling the operation of computing device 300. Furthermore, embodiments of the invention may be practiced in conjunction with a graphics library, other operating systems, or any other application program and is not limited to any particular application or system. This basic configuration is illustrated in fig. 3 by those components within dashed line 308.

Computing device 300 may have additional features or functionality. For example, computing device 300 may also include additional data storage devices (removable and/or non-removable) such as, for example, magnetic disks, optical disks, or tape. Such additional storage is illustrated in FIG. 3 by removable storage 309 and non-removable storage 310. Computer storage media may include volatile and nonvolatile, removable and non-removable media implemented in any method or technology for storage of information, such as computer readable instructions, data structures, program modules, or other data. System memory 304, removable storage 309, and non-removable storage 310 are all examples of computer storage media (i.e., memory storage). Computer storage media may include, but is not limited to, RAM, ROM, electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technology, CD-ROM, Digital Versatile Disks (DVD) or other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other medium which can be used to store information and which can be accessed by computing device 300. Any such computer storage media may be part of device 300. Computing device 300 may also have input device(s) 312 such as a keyboard, mouse, pen, voice input device, touch input device, etc. Output device(s) 314 such as a display, speakers, printer, etc. may also be included. The above devices are examples, and other devices may be used.

Computing device 300 may also contain communication connections 316 that may allow device 300 to communicate with other computing devices 318, such as over a network in a distributed computing environment (e.g., an intranet or the internet). Communication connection(s) 316 is one example of communication media. Communication media may typically be embodied by computer readable instructions, data structures, program modules, or other data in a modulated data signal, such as a carrier wave or other transport mechanism, and includes any information delivery media. The term "modulated data signal" may describe a signal that has one or more of its characteristics set or changed in such a manner as to encode information in the signal. By way of example, and not limitation, communication media includes wired media such as a wired network or direct-wired connection, and wireless media such as acoustic, Radio Frequency (RF), infrared and other wireless media. The term "computer-readable media" as used herein may include both storage media and communication media.

As mentioned above, a number of program modules and data files may be stored in system memory 304, including operating system 305. When executed on processing unit 302, programming modules 306 (e.g., personal assistant program 112) may perform processes including, for example, one or more of the stages of method 200 as described above. The above process is one example, and processing unit 302 may perform other processes. Other programming modules that may be used in accordance with embodiments of the present invention may include email and contacts applications, word processing applications, spreadsheet applications, database applications, slide presentation applications, drawing or computer-aided application programs, and the like.

Generally, program modules may include routines, programs, components, data structures, and other types of structures that may perform particular tasks or that may implement particular abstract data types in accordance with embodiments of the invention. Moreover, embodiments of the invention may be practiced with other computer system configurations, including hand-held devices, multiprocessor systems, microprocessor-based or programmable consumer electronics, minicomputers, mainframe computers, and the like. Embodiments of the invention may also be practiced in distributed computing environments where tasks are performed by remote processing devices that are linked through a communications network. In a distributed computing environment, program modules may be located in both local and remote memory storage devices.

Furthermore, embodiments of the invention may be practiced in a circuit comprising discrete electronic elements, packaged or integrated electronic chips containing logic gates, a circuit utilizing a microprocessor, or on a single chip containing electronic elements or microprocessors. Embodiments of the invention may also be practiced using other technologies capable of performing logical operations such as, for example, AND, OR, and NOT, including but not limited to mechanical, optical, fluidic, and quantum technologies. In addition, embodiments of the invention may be practiced within a general purpose computer or in any other circuits or systems.

For example, embodiments of the invention may be implemented as a computer process (method), a computing system, or as an article of manufacture such as a computer program product or computer readable media. The computer program product may be a computer storage media readable by a computer system and encoding a computer program of instructions for executing a computer process. The computer program product may also be a propagated signal on a carrier readable by a computing system and encoding a computer program of instructions for executing a computer process. Accordingly, the present invention may be embodied in hardware and/or in software (including firmware, resident software, micro-code, etc.). In other words, embodiments of the invention may take the form of a computer program product on a computer-usable or computer-readable storage medium having computer-usable or computer-readable program code embodied in the medium for use by or in connection with an instruction execution system. A computer-usable or computer-readable medium may be any medium that can contain, store, communicate, propagate, or transport the program for use by or in connection with the instruction execution system, apparatus, or device.

The computer-usable or computer-readable medium may be, for example but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, device, or propagation medium. More specific examples (a non-exhaustive list) of the computer-readable medium would include the following: an electrical connection having one or more wires, a portable computer diskette, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, and a portable compact disc read-only memory (CD-ROM). Note that the computer-usable or computer-readable medium could even be paper or another suitable medium upon which the program is printed, as the program can be electronically captured, via, for instance, optical scanning of the paper or other medium, then compiled, interpreted, or otherwise processed in a suitable manner, if necessary, and then stored in a computer memory.

Embodiments of the present invention are described above with reference to block diagrams and/or operational illustrations of methods, systems, and computer program products according to embodiments of the invention, for example. The functions/acts noted in the blocks may occur out of the order noted in any flowchart. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality/acts involved.

While specific embodiments of the invention have been described, other embodiments are possible. In addition, although embodiments of the present invention have been described as being associated with data stored in memory and other storage media, data may also be stored on or read from other types of computer-readable media, such as secondary storage devices (like hard disks, floppy disks, or a CD-ROM), a carrier wave from the Internet, or other forms of RAM or ROM. In addition, steps of the disclosed methods may be modified in any manner, including by reordering steps and/or inserting or deleting steps, without departing from the invention.

All rights including copyrights in the code included herein are vested in and the property of the applicant. The applicant retains and reserves all rights in the code included herein and grants permission to reproduce the material only in connection with reproduction of the granted patent and for no other purpose.

Although the description includes examples, the scope of the invention is indicated by the appended claims. Furthermore, although the specification has been described in language specific to structural features and/or methodological acts, the claims are not limited to the features or acts described above. Rather, the specific features and acts described above are disclosed as examples of embodiments of the invention.

Claims

1. A system for providing location-based conversational understanding, comprising:

at least one processor; and

a memory coupled to the at least one processor, the memory comprising computer-executable instructions that, when executed by the at least one processor, perform a method for providing location-based conversational understanding, the method comprising:

receiving, by a computing device, a voice-based query from a user located at a location;

determining whether an environmental context is associated with the location;

when it is determined that no environmental context is associated with the location:

identifying at least one acoustic disturbance in the speech-based query;

identifying a subject matter of the voice-based query;

creating an environmental context based at least on the identified topic of the voice-based query and the acoustic interference in the voice-based query;

converting the speech-based query into a text-based query using the environmental context to suppress acoustic interference.

2. The system of claim 1, wherein the environmental context is associated with at least one of:

an understanding model that facilitates speech to text conversion; and

a semantic model that facilitates query execution.

3. The system of claim 1, wherein the location is determined using at least one of:

Global Positioning System (GPS) coordinates;

an area code associated with the user;

a zip code associated with the user; and

proximity to landmarks.

4. The system of claim 1, wherein identifying at least one acoustic disturbance comprises: analyzing audio of the query and identifying background noise in the audio.

5. The system of claim 1, wherein identifying the topic of the speech-based query comprises:

requesting clarification of the speech-based query from the user; and

correlating a plurality of queries, wherein the plurality of queries are identified as requesting similar information.
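The correlation step in claim 5 could be approximated, as an assumption, by grouping queries whose word sets overlap strongly. The sketch below uses Jaccard similarity and a greedy single pass; the function names and the 0.5 threshold are illustrative.

```python
def jaccard(a, b):
    """Jaccard similarity of the word sets of two queries."""
    sa, sb = set(a.lower().split()), set(b.lower().split())
    union = sa | sb
    return len(sa & sb) / len(union) if union else 0.0


def correlate(queries, threshold=0.5):
    """Greedy grouping: each query joins the first group whose first query it resembles."""
    groups = []
    for query in queries:
        for group in groups:
            if jaccard(query, group[0]) >= threshold:
                group.append(query)
                break
        else:
            groups.append([query])
    return groups


print(correlate(["when does flight 42 leave", "when does flight 42 board", "menu for cafe"]))
# [['when does flight 42 leave', 'when does flight 42 board'], ['menu for cafe']]
```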

6. The system of claim 1, wherein creating the environmental context comprises:

associating the identified acoustic interference and the identified topic with the location; and

storing information about the identified acoustic interference, the identified topic, and the location in a context database.
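A context database as in claim 6 could be as simple as one relational table keyed by location. The sketch uses Python's built-in `sqlite3` module with an in-memory database; the table name, columns, and helper functions are assumptions.

```python
import sqlite3


def open_context_db(path=":memory:"):
    """Open (or create) a context database with a single context table."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS context (location TEXT, interference TEXT, topic TEXT)"
    )
    return conn


def store_context(conn, location, interference, topic):
    """Associate an identified interference and topic with a location."""
    conn.execute("INSERT INTO context VALUES (?, ?, ?)", (location, interference, topic))
    conn.commit()


def contexts_for(conn, location):
    """All stored (interference, topic) pairs for a location, oldest first."""
    cur = conn.execute(
        "SELECT interference, topic FROM context WHERE location = ? ORDER BY rowid",
        (location,),
    )
    return cur.fetchall()


conn = open_context_db()
store_context(conn, "stadium", "crowd noise", "sports")
print(contexts_for(conn, "stadium"))  # [('crowd noise', 'sports')]
```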

7. The system of claim 1, wherein converting the speech-based query comprises: applying a filter for removing acoustic interference associated with the environmental context.
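Claim 7 leaves the filter unspecified. As one illustrative possibility, if a location's context recorded low-frequency rumble (for example, from vehicle engines), a first-order RC high-pass filter could attenuate it. The `rumble_cutoff_hz` context key and the 8 kHz sample rate are hypothetical.

```python
import math


def high_pass(samples, cutoff_hz, sample_rate_hz):
    """First-order RC high-pass filter: attenuates hum and rumble below cutoff_hz."""
    if not samples:
        return []
    rc = 1.0 / (2.0 * math.pi * cutoff_hz)
    dt = 1.0 / sample_rate_hz
    alpha = rc / (rc + dt)
    out = [samples[0]]
    for i in range(1, len(samples)):
        out.append(alpha * (out[-1] + samples[i] - samples[i - 1]))
    return out


def apply_context_filter(samples, context, sample_rate_hz=8000):
    """Filter only if the (hypothetical) context dict names a rumble cutoff."""
    cutoff = context.get("rumble_cutoff_hz")
    return high_pass(samples, cutoff, sample_rate_hz) if cutoff else list(samples)
```

A constant offset decays toward zero at the output, while rapidly alternating samples pass with little attenuation.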

8. The system of claim 1, wherein converting the speech-based query comprises using a hidden Markov model algorithm that includes statistical weights for at least one of words and semantic concepts.
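Claim 8 does not specify how the statistical weights enter the hidden Markov model. The sketch below assumes they act as multiplicative boosts on emission probabilities (added in the log domain) during Viterbi decoding over word states. The acoustic labels, probabilities, and the `context_weight` mapping are invented for illustration, and only word weights are shown; semantic-concept weights could be applied the same way.

```python
import math


def viterbi(observations, states, start_p, trans_p, emit_p, context_weight=None):
    """Most likely state sequence; context_weight adds a log-domain bonus per state."""
    bonus = context_weight or {}

    def emit(state, obs):
        return math.log(emit_p[state].get(obs, 1e-12)) + bonus.get(state, 0.0)

    best = {s: (math.log(start_p[s]) + emit(s, observations[0]), [s]) for s in states}
    for obs in observations[1:]:
        step = {}
        for s in states:
            score, path = max(
                (best[p][0] + math.log(trans_p[p][s]), best[p][1]) for p in states
            )
            step[s] = (score + emit(s, obs), path + [s])
        best = step
    return max(best.values())[1]


STATES = ["which", "gate", "great"]
START = {"which": 0.6, "gate": 0.2, "great": 0.2}
TRANS = {
    "which": {"which": 0.1, "gate": 0.45, "great": 0.45},
    "gate": {"which": 0.34, "gate": 0.33, "great": 0.33},
    "great": {"which": 0.34, "gate": 0.33, "great": 0.33},
}
EMIT = {
    "which": {"w-ih-ch": 0.9, "g-ey-t": 0.01},
    "gate": {"w-ih-ch": 0.01, "g-ey-t": 0.45},
    "great": {"w-ih-ch": 0.01, "g-ey-t": 0.5},
}
OBS = ["w-ih-ch", "g-ey-t"]
print(viterbi(OBS, STATES, START, TRANS, EMIT))  # ['which', 'great']
print(viterbi(OBS, STATES, START, TRANS, EMIT, {"gate": math.log(2.0)}))  # ['which', 'gate']
```

Doubling the weight of "gate" (for example, at an airport) is enough to flip an acoustically ambiguous decision.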

9. The system of claim 1, the method further comprising:

executing the text-based query according to the environmental context within a search domain associated with the identified topic; and

providing results of the executed text-based query to the user.

10. A method for providing location-based conversational understanding, the method comprising:

receiving, by a computing device, a first speech-based query from a user located at a location;

determining whether an environmental context is associated with the location;

identifying at least a first acoustic interference in the first speech-based query;

identifying a topic of the first speech-based query;

creating a first environmental context based at least on the identified topic of the first speech-based query and the first acoustic interference in the first speech-based query; and

converting the first speech-based query into a text-based query using the first environmental context.

11. The method of claim 10, wherein converting the first speech-based query comprises using a hidden Markov model algorithm comprising statistical weights for at least one of:

words associated with an understanding model; and

semantic concepts associated with a semantic model.

12. The method of claim 11, further comprising: increasing a statistical weight of one or more predicted terms as a function of one or more previous queries received at the location.
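Claim 12's weight increase could, under one set of assumptions, be derived from how often each word appeared in earlier queries at the same location. The `LocationTermWeights` class below is hypothetical; its log-domain bonus is the kind of per-word weight that a decoder adding log bonuses to word scores could consume.

```python
import math
from collections import Counter, defaultdict


class LocationTermWeights:
    """Hypothetical per-location store of word counts from previous queries."""

    def __init__(self, boost=1.0):
        self.boost = boost
        self.counts = defaultdict(Counter)

    def record_query(self, location, text):
        """Count the words of a query received at a location."""
        self.counts[location].update(text.lower().split())

    def weights(self, location):
        """Log-domain bonus per word that grows with its relative frequency."""
        counts = self.counts.get(location, Counter())
        total = sum(counts.values())
        if total == 0:
            return {}
        return {word: math.log1p(self.boost * c / total) for word, c in counts.items()}


terms = LocationTermWeights()
terms.record_query("airport", "which gate")
terms.record_query("airport", "gate closing time")
w = terms.weights("airport")
print(w["gate"] > w["which"])  # True
```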

13. The method of claim 10, wherein the first environmental context includes an acoustic model associated with the location, and wherein the first speech-based query is adjusted according to the first acoustic interference using the acoustic model.

14. The method of claim 13, wherein adjusting the first speech-based query comprises:

identifying at least one background sound from the first acoustic interference;

adjusting the first speech-based query to ignore the at least one background sound; and

storing the at least one background sound.

15. The method of claim 14, further comprising:

receiving a second speech-based query associated with the location;

applying the acoustic model associated with the location to the second speech-based query; and

adjusting the second speech-based query to ignore the stored at least one background sound.

16. The method of claim 15, further comprising:

identifying a second acoustic interference in the second speech-based query;

updating the acoustic model associated with the location based on the second acoustic interference; and

adjusting the second speech-based query to ignore one or more background sounds in the second acoustic interference.
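Claims 13 to 16 describe a per-location acoustic model that remembers background sounds and is updated with each new query. One way to picture this, assumed here rather than taken from the claims, is a stored noise magnitude spectrum that is blended with newly observed background spectra (an exponential moving average) and removed from each frame by magnitude spectral subtraction. Spectra are plain lists with one value per frequency bin; computing them from audio is outside this sketch, and `LocationAcousticModel` is a hypothetical name.

```python
class LocationAcousticModel:
    """Hypothetical per-location acoustic model holding a background noise profile."""

    def __init__(self, smoothing=0.5):
        self.smoothing = smoothing  # weight kept for the old profile on each update
        self.noise_profile = None   # stored background magnitude per frequency bin

    def update(self, background_spectrum):
        """Store, or blend in, a newly observed background spectrum."""
        if self.noise_profile is None:
            self.noise_profile = list(background_spectrum)
        else:
            a = self.smoothing
            self.noise_profile = [
                a * old + (1 - a) * new
                for old, new in zip(self.noise_profile, background_spectrum)
            ]

    def suppress(self, spectrum, floor=0.0):
        """Subtract the stored background from one frame, clamping at floor."""
        if self.noise_profile is None:
            return list(spectrum)
        return [max(s - n, floor) for s, n in zip(spectrum, self.noise_profile)]


model = LocationAcousticModel()
model.update([1.0, 2.0, 0.0])           # background from the first query
print(model.suppress([3.0, 2.5, 1.0]))  # [2.0, 0.5, 1.0]
model.update([3.0, 2.0, 2.0])           # new interference from a second query
print(model.suppress([3.0, 2.5, 1.0]))  # [1.0, 0.5, 0.0]
```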

17. The method of claim 10, further comprising:

receiving a second speech-based query associated with the location;

creating a second environmental context based on the second speech-based query;

aggregating the first environmental context and the second environmental context into an aggregated environmental context, wherein the aggregated environmental context is associated with the location; and

storing the aggregated environmental context.
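The aggregation in claims 17 to 19 might be represented, as an assumption, by merging per-query contexts recorded at one location: topic counts are summed and interference labels are united. The most frequent topic can then steer later queries from that location, in the spirit of claim 19. `LocationContext` and its fields are hypothetical.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class LocationContext:
    topics: Counter = field(default_factory=Counter)
    interference: set = field(default_factory=set)


def aggregate(first, second):
    """Merge two environmental contexts recorded at the same location."""
    return LocationContext(
        topics=first.topics + second.topics,
        interference=first.interference | second.interference,
    )


def likely_topic(context, default="general"):
    """Most frequent topic seen at the location, used to pick a search domain."""
    common = context.topics.most_common(1)
    return common[0][0] if common else default


store = {}
first = LocationContext(Counter({"travel": 1}), {"announcement"})
second = LocationContext(Counter({"travel": 1, "dining": 1}), {"crowd"})
store["airport"] = aggregate(first, second)
print(likely_topic(store["airport"]))  # travel
```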

18. The method of claim 17, wherein the aggregated environmental context comprises a topic of the first speech-based query and a topic of the second speech-based query.

19. The method of claim 18, wherein the topic of the first speech-based query is used to improve a result of the second speech-based query.

20. A computer-readable storage device storing computer-executable instructions that, when executed, cause a computing system to perform a method for providing location-based conversational understanding, the method comprising:

determining whether an environmental context is associated with a location of a user from whom a speech-based query is received;

identifying at least one acoustic interference in the speech-based query;

identifying a topic of the speech-based query;

creating an environmental context based at least on the identified topic of the speech-based query and the at least one acoustic interference in the speech-based query; and

converting the speech-based query into a text-based query using the environmental context.