CN103069480B - Speech and noise models for speech recognition - Google Patents


Info

Publication number: CN103069480B
Application number: CN201180026390.4A
Authority: CN (China)
Prior art keywords: user, audio, sound signal, model, noise
Legal status: Active (the legal status is an assumption and is not a legal conclusion)
Application number: CN201180026390.4A
Other languages: Chinese (zh)
Other versions: CN103069480A
Inventors: M. I. Lloyd (M·I·洛伊德), T. Kristjansson (T·克里斯特詹森)
Current assignee: Google LLC
Original assignee: Google LLC
Application filed by Google LLC
Publication of CN103069480A (application)
Application granted
Publication of CN103069480B (grant)


Classifications

    • G — PHYSICS
    • G10 — MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L — SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 — Speech recognition
    • G10L15/20 — Speech recognition techniques specially adapted for robustness in adverse environments, e.g. in noise, of stress-induced speech
    • G10L21/00 — Processing of the speech or voice signal to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/02 — Speech enhancement, e.g. noise reduction or echo cancellation
    • G10L21/0208 — Noise filtering

Abstract

An audio signal generated by a device based on audio input from a user may be received. The audio signal may include at least a user audio portion that corresponds to one or more user utterances recorded by the device. A user speech model associated with the user may be accessed, and a determination may be made that background audio in the audio signal is below a defined threshold. In response to determining that the background audio in the audio signal is below the defined threshold, the accessed user speech model may be adapted based on the audio signal to generate an adapted user speech model that models speech characteristics of the user. Noise compensation may be performed on the received audio signal using the adapted user speech model to generate a filtered audio signal with reduced background audio compared to the received audio signal.

Description

Speech and noise models for speech recognition

Cross-reference to related application

This application claims priority to U.S. Application Serial No. 12/814,665, titled "SPEECH AND NOISE MODELS FOR SPEECH RECOGNITION" and filed on June 14, 2010, the disclosure of which is incorporated herein by reference.
Technical field

This specification relates to speech recognition.
Background

Speech recognition may be used for voice search queries. In general, a search query includes one or more query terms that a user submits to a search engine when the user requests that the search engine perform a search. Among other ways, a user may enter the query terms of a search query by typing on a keyboard or, in the case of a voice query, by speaking the query terms into the microphone of, for example, a mobile device.

When a voice query is submitted through, for example, a mobile device, the microphone of the mobile device may record ambient noise or sounds in addition to the user's spoken utterances, otherwise referred to as "environmental audio" or "background audio." For example, the environmental audio may include background chatter or talk of other people located around the user, or noise generated by nature (e.g., a dog barking) or by human activity (e.g., office, airport, or roadway noise, or construction activity). The environmental audio may partially obscure the user's voice, making it difficult for an automated speech recognition ("ASR") engine to accurately recognize the spoken utterances.
Summary

In one aspect, a system includes one or more processing devices and one or more storage devices storing instructions that, when executed by the one or more processing devices, cause the one or more processing devices to: receive an audio signal generated by a device based on audio input from a user, the audio signal including at least a user audio portion that corresponds to one or more user utterances recorded by the device; access a user speech model associated with the user; determine that background audio in the audio signal is below a defined threshold; in response to determining that the background audio in the audio signal is below the defined threshold, adapt the accessed user speech model based on the audio signal to generate an adapted user speech model that models speech characteristics of the user; and perform noise compensation on the received audio signal using the adapted user speech model to generate a filtered audio signal with reduced background audio compared to the received audio signal.
Implementations may include one or more of the following features. For example, the audio signal may include an environmental audio portion that corresponds only to background audio around the user, and to determine that the background audio in the audio signal is below the defined threshold, the instructions may include instructions that, when executed, cause the one or more processing devices to determine an amount of energy in the environmental audio portion and determine that the amount of energy in the environmental audio portion is below a threshold energy. To determine that the background audio in the audio signal is below the defined threshold, the instructions may include instructions that, when executed, cause the one or more processing devices to determine a signal-to-noise ratio of the audio signal and determine that the signal-to-noise ratio is below a threshold signal-to-noise ratio. The audio signal may include an environmental audio portion that corresponds only to background audio around the user, and to determine the signal-to-noise ratio of the audio signal, the instructions may include instructions that, when executed, cause the one or more processing devices to determine an amount of energy in the user audio portion of the audio signal; determine an amount of energy in the environmental audio portion of the audio signal; and determine the signal-to-noise ratio as the ratio between the amounts of energy in the user audio portion and the environmental audio portion.
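The patent does not specify an implementation for these energy and ratio computations. A minimal sketch, assuming raw PCM samples, a mean-squared-amplitude energy measure, and a decibel-scale ratio; the threshold values and the convention that a quiet background corresponds to low ambient energy or a high signal-to-noise ratio are assumptions for illustration:

```python
import math

def segment_energy(samples):
    """Mean squared amplitude of a list of PCM samples."""
    return sum(s * s for s in samples) / len(samples)

def snr_db(user_samples, ambient_samples):
    """Ratio (in dB) between the energy of the user-audio portion and
    the energy of the environmental-audio-only portion."""
    return 10.0 * math.log10(
        segment_energy(user_samples) / segment_energy(ambient_samples))

def background_is_low(user_samples, ambient_samples,
                      energy_threshold=1000.0, snr_threshold_db=20.0):
    """Treat background audio as 'below the defined threshold' when the
    ambient-only portion is quiet in absolute terms, or when the user's
    utterance strongly dominates it."""
    return (segment_energy(ambient_samples) < energy_threshold
            or snr_db(user_samples, ambient_samples) > snr_threshold_db)
```

Both tests reflect the two alternatives in the text: an absolute energy check on the environmental portion, and a relative check between the user and environmental portions.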
The accessed user speech model may include an alternate user speech model that has not yet been adapted to the speech characteristics of the user. The instructions may include instructions that, when executed by the one or more processing devices, cause the one or more processing devices to select the alternate user speech model and associate the alternate user speech model with the user. To select the alternate user speech model, the instructions may cause the one or more processing devices to determine a gender of the user and select the alternate user speech model from among multiple alternate user speech models based on the gender of the user. To select the alternate user speech model, the instructions may cause the one or more processing devices to determine a location of the user when the one or more utterances were recorded and select the alternate user speech model from among the multiple alternate user speech models based on that location. To select the alternate user speech model, the instructions may cause the one or more processing devices to determine a language or accent of the user and select the alternate user speech model from among the multiple alternate user speech models based on the language or accent. To select the alternate user speech model, the instructions may cause the one or more processing devices to receive an initial audio signal that includes at least an initial user audio portion corresponding to one or more user utterances recorded by the device; determine similarity measures between the multiple alternate user speech models and an expected user speech model for the user determined based on the initial audio signal; and select the alternate user speech model from among the multiple alternate user speech models based on the similarity measures.
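The attribute-based selection described above (gender, location, language or accent) can be sketched as a lookup over a registry of not-yet-adapted models. The registry contents, key structure, and model names here are hypothetical placeholders, not part of the patent:

```python
# Hypothetical registry of alternate user speech models keyed by
# (gender, language) attributes known or inferred about the user.
ALTERNATE_MODELS = {
    ("female", "en-US"): "model_f_en_us",
    ("male", "en-US"): "model_m_en_us",
    ("male", "is-IS"): "model_m_is_is",
}

def select_alternate_speech_model(gender, language,
                                  fallback="model_generic"):
    """Pick the alternate (not-yet-adapted) user speech model whose
    attributes best match what is known about the user, falling back
    to a generic model when no attribute match exists."""
    return ALTERNATE_MODELS.get((gender, language), fallback)
```

A similarity-measure-based selection, as in the last alternative above, would instead score each registry entry against features extracted from the initial audio signal and keep the best-scoring model.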
The instructions may include instructions that, when executed, cause the one or more processing devices to access a noise model associated with the user; and to perform noise compensation, the instructions may further include instructions that cause the one or more processing devices to perform noise compensation on the received audio signal using the adapted user speech model and the accessed noise model. To perform noise compensation, the instructions may further include instructions that cause the one or more processing devices to adapt the accessed noise model based on the received audio signal to generate an adapted noise model that models characteristics of the background audio around the user, and to perform noise compensation on the received audio signal using the adapted user speech model and the adapted noise model. The instructions may include instructions that, when executed, cause the one or more processing devices to receive a second audio signal that includes at least a second user audio portion corresponding to one or more user utterances recorded by the device; determine that background audio in the second audio signal is above the defined threshold; and, in response to determining that the background audio in the second audio signal is above the defined threshold, adapt the noise model associated with the user based on the second audio signal to generate an adapted noise model that models characteristics of the background audio around the user. The accessed noise model may include an alternate noise model that has not yet been adapted to the characteristics of the background audio around the user.
The instructions may include instructions that, when executed by the one or more processing devices, cause the one or more processing devices to select the alternate noise model and associate the alternate noise model with the user. To select the alternate noise model, the instructions may cause the one or more processing devices to receive an initial audio signal that includes at least an initial user audio portion corresponding to one or more user utterances recorded by the device; determine a location of the user when the one or more utterances corresponding to the initial user audio portion were recorded; and select the alternate noise model from among multiple alternate noise models based on that location.

To select the alternate noise model, the instructions may cause the one or more processing devices to receive an initial audio signal that includes at least an initial user audio portion corresponding to one or more user utterances recorded by the device; determine similarity measures between the multiple alternate noise models and an expected noise model for the user determined based on the initial audio signal; and select the alternate noise model from among the multiple alternate noise models based on the similarity measures. Each of the multiple alternate noise models may model characteristics of background audio in a particular location. Each of the multiple alternate noise models may model characteristics of background audio in a particular type of environmental condition.
To access the noise model, the instructions may include instructions that, when executed by the one or more processing devices, cause the one or more processing devices to determine a location of the user when the one or more utterances were recorded and select the noise model from among multiple noise models based on the location of the user.
The audio signal may correspond to a voice search query, and the instructions may include instructions that, when executed by the one or more processing devices, cause the one or more processing devices to perform speech recognition on the filtered audio signal to generate one or more candidate transcriptions of the one or more user utterances; perform a search query using the one or more candidate transcriptions to generate search results; and send the search results to the device.
In another aspect, a system includes a client device and an automated speech recognition system. The client device is configured to send to the automated speech recognition system an audio signal that includes at least a user audio portion corresponding to one or more user utterances recorded by the device. The automated speech recognition system is configured to receive the audio signal from the client device; access a user speech model associated with the user; determine that background audio in the audio signal is below a defined threshold; in response to determining that the background audio in the audio signal is below the defined threshold, adapt the accessed user speech model based on the audio signal to generate an adapted user speech model that models speech characteristics of the user; and perform noise compensation on the received audio signal using the adapted user speech model to generate a filtered audio signal with reduced background audio compared to the received audio signal.
Implementations may include the following features. For example, the automated speech recognition system may be configured to perform speech recognition on the filtered audio signal to generate one or more candidate transcriptions of the one or more user utterances. The system may include a search engine system configured to perform a search query using the one or more candidate transcriptions to generate search results and to send the search results to the client device.
In another aspect, a method includes receiving an audio signal generated by a device based on audio input from a user, the audio signal including at least a user audio portion that corresponds to one or more user utterances recorded by the device; accessing a user speech model associated with the user; determining that background audio in the audio signal is below a defined threshold; in response to determining that the background audio in the audio signal is below the defined threshold, adapting the accessed user speech model based on the audio signal to generate an adapted user speech model that models speech characteristics of the user; and performing noise compensation on the received audio signal using the adapted user speech model to generate a filtered audio signal with reduced background audio compared to the received audio signal.
Implementations of the described techniques may include hardware, a method or process, or computer software on a computer-accessible medium.
The details of one or more implementations are set forth in the accompanying drawings and the description below. Other potential features, aspects, and advantages will become apparent from the description, the drawings, and the claims.
Brief description of the drawings

Fig. 1 is a schematic diagram of an example system that supports voice search queries.

Fig. 2 is a flow chart illustrating an example of a process.

Fig. 3 is a flow chart illustrating another example of a process.

Fig. 4 is a swim-lane diagram illustrating an example of a process.

Detailed description
Fig. 1 shows a schematic diagram of an example of a system 100 that supports voice search queries. The system 100 includes a search engine 106 and an automated speech recognition (ASR) engine 108, which are connected to a set of mobile devices 102a-102c and a mobile device 104 through one or more networks 110. In some implementations, the one or more networks 110 may be a wireless cellular network, a wireless local area network (WLAN) or Wi-Fi network, a third-generation (3G) mobile telecommunications network, a private network such as an intranet, a public network such as the Internet, or any appropriate combination thereof.
In general, a user of a device (such as the mobile device 104) can speak a search query into the microphone of the mobile device 104. An application running on the mobile device 104 records the user's spoken search query as an audio signal and sends the audio signal to the ASR engine 108 as part of a voice search query. After receiving the audio signal corresponding to the voice search query, the ASR engine 108 can translate or transcribe the user utterances in the audio signal into one or more textual candidate transcriptions and can supply these candidate transcriptions to the search engine 106 as query terms, thereby supporting a voice search capability for the mobile device 104. A query term may include one or more whole or partial words, characters, or strings of characters.
The search engine 106 can use the search query terms to provide search results (e.g., Uniform Resource Identifiers (URIs) of web pages, images, documents, multimedia files, etc.) to the mobile device 104. For example, a search result may include a URI that references a resource that the search engine determines to be responsive to the search query. Additionally or alternatively, a search result may include other information, such as a title, a preview image, a user rating, a map or directions, a description of the corresponding resource, or a snippet of text that has been automatically or manually extracted from, or is otherwise associated with, the corresponding resource. In some examples, the search engine 106 may include a web search engine used to find references on the Internet, a phone-book-type search engine used to find businesses or individuals, or another specialized search engine (e.g., for entertainment listings such as restaurant and movie theater information, or for medical and pharmaceutical information).
As an example of the operation of the system 100, an audio signal 138 is included in a voice search query sent from the mobile device 104 to the ASR engine 108 over the network 110. The audio signal 138 includes the utterance 140 "Gym New York." The ASR engine 108 receives the voice search query that includes the audio signal 138 and processes the audio signal 138 to generate one or more textual candidate transcriptions that match the utterance detected in the audio signal 138, or a ranked set of candidate transcriptions 146. For example, the utterance in the audio signal 138 may yield "Gym New York" and "Jim Newark" as candidate transcriptions 146.
The one or more candidate transcriptions 146 generated by the speech recognition system 118 are passed from the ASR engine 108 to the search engine 106 as search query terms. The search engine 106 supplies the search query terms 146 to a search algorithm to generate one or more search results. The search engine 106 then provides a set of search results 152 (e.g., URIs of web pages, images, documents, multimedia files, etc.) to the mobile device 104.
The mobile device 104 displays the search results 152 in a display area. As shown in screenshot 158, the utterance "Gym New York" 140 produces three search results 160: "Jim Newark" 160a, "New York Fitness" 160b, and "Manhattan Body Building" 160c. The first search result 160a corresponds to the candidate transcription "Jim Newark" and may, for example, provide a telephone number to the user or, when selected, cause the mobile device 104 to automatically dial Jim Newark. The latter two search results 160b and 160c correspond to the candidate transcription "Gym New York" and include web page URIs. The candidate transcriptions and/or search results may be ranked based on confidence measures produced by the ASR engine 108, where a confidence measure indicates a level of confidence that a given candidate transcription accurately corresponds to the utterance in the audio signal.
To translate or transcribe the user utterances in an audio signal into one or more textual candidate transcriptions, the ASR engine 108 includes a noise compensation system 116, a speech recognition system 118, and a database 111 that stores noise models 112 and user speech models 114. The speech recognition system 118 performs speech recognition on audio signals to identify the user utterances in an audio signal and translate those utterances into one or more textual candidate transcriptions. In some implementations, the speech recognition system 118 can generate multiple candidate transcriptions for a given utterance. For example, the speech recognition system 118 may transcribe an utterance into multiple terms and may assign a confidence level to each transcription of the utterance.
In some implementations, a particular variant of the speech recognition system 118 can be selected for processing an audio signal based on additional contextual information related to the audio signal, and the selected variant can be used to transcribe the utterances in the audio signal. For example, in some implementations, along with the audio signal that includes the user utterances, a voice search query may include region or language information used to select a variant of the speech recognition system 118. In a particular example, the region in which the mobile device 104 is registered, or the language setting of the mobile device 104, can be provided to the ASR engine 108 and used by the ASR engine 108 to determine the likely language or accent of the user of the mobile device 104. A variant of the speech recognition system 118 can then be selected and used based on the expected language or accent of the user of the mobile device 104.
The ASR engine 108 can apply the noise compensation system 116 to an audio signal received, for example, from the mobile device 104 before performing speech recognition. The noise compensation system 116 can remove or reduce the background or environmental audio in the audio signal to produce a filtered audio signal. Because the microphone of the mobile device 104 may capture environmental audio in addition to the user's utterances, the audio signal may include a mixture of user utterances and environmental audio. The audio signal may therefore include one or more environmental audio portions that contain only environmental audio, as well as a user audio portion that contains the user's utterances (and potentially environmental audio). In general, environmental audio may include any ambient sounds (natural or otherwise) occurring around the user, and generally excludes the voice, utterances, or sounds of the user of the mobile device. The speech recognition system 118 can then perform speech recognition on the filtered audio signal produced by the noise compensation system 116 to transcribe the user utterances. In some instances, performing speech recognition on the filtered audio signal can produce more accurate transcriptions than performing speech recognition directly on the received audio signal.
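The patent does not name a specific filtering algorithm for the noise compensation system. A common model-free baseline that illustrates the idea of removing an environmental-audio estimate from the signal is spectral subtraction; this sketch operates on per-frame magnitude spectra and is an assumption for illustration, not the patent's method:

```python
def spectral_subtract(frames, noise_estimate, floor=0.02):
    """Subtract an estimated noise magnitude spectrum (taken from the
    environmental-audio-only portion of the signal) from each frame's
    magnitude spectrum, clamping to a small spectral floor so no bin
    goes negative."""
    return [
        [max(m - n, floor * m) for m, n in zip(frame, noise_estimate)]
        for frame in frames
    ]

# Noise estimate averaged over frames that contain only environmental
# audio, e.g. audio captured just before the user starts speaking.
noise = [2.0, 1.0, 0.5]
frames = [[10.0, 4.0, 0.4], [3.0, 1.0, 0.6]]
cleaned = spectral_subtract(frames, noise)
```

A model-based compensator, as described here, would instead use the user speech model and noise model to separate the two sources, but the subtract-an-estimate structure is the same.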
For a given audio signal, the noise compensation system 116 uses one of the noise models 112 and one of the user speech models 114 stored in the database 111 to remove or reduce the background or environmental audio in the audio signal. The noise models 112 include alternate noise models 120a and adapted noise models 120b. Similarly, the user speech models 114 include alternate user speech models 126a and adapted user speech models 126b. In general, an adapted noise model 120b and an adapted user speech model 126b are specific to a particular user and have been adapted to that user based on the audio signals received from that user with previous voice search queries. When there is no adapted noise model or adapted user speech model for the particular user submitting the current voice search query, an alternate noise model 120a or an alternate user speech model 126a is used, respectively.
In some instances, the performance of the noise compensation system 116 can be improved by using an adapted user speech model that has been trained or otherwise adapted to the particular speech characteristics of the specific user submitting the voice search query. However, to adapt a speech model to a specific user, samples of that user's voice may be needed, and in an environment such as the system 100 those samples may not be readily available at first. Therefore, in one implementation, if there is no adapted user speech model for a user when the user initially sends a voice search query (or for some other reason), the ASR engine 108 selects an alternate user speech model from among one or more alternate user speech models 126a. The selected alternate user speech model may be one determined to be a reasonable approximation of the speech characteristics of the user, and it is used to perform noise compensation on the initial audio signal. As the user submits subsequent voice search queries, some or all of the audio signals sent with those subsequent queries are used to train or adapt the selected alternate user speech model into an adapted user speech model that is specific to the user (that is, one that models the speech characteristics of the user), which is then used for noise compensation on those subsequent audio signals.
For example, in one implementation, when a subsequent audio signal is received, the ASR engine 108 determines whether the environmental or background audio is below a particular threshold. If it is below the threshold, the audio signal is used to adapt the alternate user speech model, or to further adapt the adapted user speech model, to the specific user. If the background audio is above the threshold, the audio signal is not used to adapt the user speech model (but may be used to adapt the noise model, as described below).
A user speech model (whether an alternate user speech model 126a or an adapted user speech model 126b) may be implemented, for example, as a hidden Markov model (HMM) or a Gaussian mixture model (GMM), and may be trained or otherwise adapted using an expectation-maximization algorithm.
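As a concrete instance of adapting such a model toward a user's data, the maximum-a-posteriori (MAP) update commonly used with GMMs shifts a component mean toward new feature samples while a relevance factor keeps it anchored to the prior (here, the alternate model). This one-dimensional, single-component sketch is a simplification chosen for illustration; the patent itself only names HMMs, GMMs, and expectation-maximization:

```python
def map_adapt_mean(prior_mean, samples, relevance=16.0):
    """MAP-style update of a Gaussian component mean toward new feature
    samples. `relevance` controls how strongly the prior mean (from the
    alternate, not-yet-adapted model) resists the new user data: with
    few samples the prior dominates, with many the data dominates."""
    n = len(samples)
    sample_mean = sum(samples) / n
    alpha = n / (n + relevance)
    return alpha * sample_mean + (1.0 - alpha) * prior_mean
```

This matches the incremental picture in the text: each new low-noise query contributes more samples, and the model drifts further from the generic alternate model toward the specific user's voice.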
In some implementations, the user can be positively identified. For example, some implementations may prompt the user for identification before accepting a search query. Other implementations may implicitly identify the user using other available information, such as the user's typing patterns or movement patterns (e.g., as captured by an accelerometer forming part of the device). When the user can be specifically identified, the adapted user speech model can be indexed by a user identifier corresponding to the identified user.
In other implementations, the user may not be specifically identified. In that case, the device used to enter the voice search query (such as the mobile device 104) can serve as a proxy for a particular user, and the adapted user speech model can be indexed by a device identifier corresponding to the device used to submit the voice search queries. In environments where there is typically a single or primary user of the device, such as when a mobile phone is used as the input device, developing an adapted user speech model on a per-device basis can provide an acceptable speech model within the performance constraints imposed by the noise compensation system 116 in particular, or the ASR engine 108 more generally.
In the same way that an adapted user speech model can improve the performance of the noise compensation system 116, its performance can also be improved by using a noise model that has been trained or otherwise adapted to the environmental audio typically around the user. As with speech samples, in an environment such as the system 100, samples of the environmental audio typically around the user may not be readily available at first. Therefore, in one implementation, if there is no adapted noise model for a user when the user initially sends a voice search query (or for some other reason), the ASR engine 108 selects an alternate noise model from among one or more alternate noise models 120a. The selected alternate noise model may be one determined, based on known or determined information, to be a reasonable approximation of the expected environmental audio around the user, and it is used to perform noise compensation on the initial audio signal. As the user submits subsequent voice search queries, some or all of the audio signals sent with those queries are used to adapt the selected alternate noise model into an adapted noise model that is specific to the user (that is, one that models the characteristics of the environmental sounds typically around the user when submitting search queries), which is then used for noise compensation on those subsequent audio signals.
For example, in one implementation, when a subsequent audio signal is received, the ASR engine 108 determines whether the environmental or background audio is below a particular threshold. If it is not below the threshold, the audio signal is used to adapt the alternate noise model, or to further adapt the adapted noise model, to the specific user. In some implementations, the received audio signal may be used to adapt the alternate noise model or the adapted noise model regardless of whether the background audio is above the particular threshold.
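Taken together with the speech-model rule above, each incoming query's audio is routed to exactly one kind of adaptation. A small sketch of that routing, assuming the background level is summarized as an SNR in dB and a 20 dB threshold (both values are assumptions, not from the patent):

```python
class ModelAdapter:
    """Routes each incoming query's audio to speech-model or
    noise-model adaptation based on the background-audio level."""

    def __init__(self, snr_threshold_db=20.0):
        self.snr_threshold_db = snr_threshold_db
        self.speech_updates = 0
        self.noise_updates = 0

    def observe(self, snr_db):
        if snr_db >= self.snr_threshold_db:
            self.speech_updates += 1   # quiet: safe to learn the voice
            return "speech"
        self.noise_updates += 1        # noisy: learn the background
        return "noise"
```

Under the variant mentioned at the end of the paragraph, noisy and quiet signals alike could feed the noise model; the routing above reflects the stricter either/or reading.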
In some implementations, to help ensure that a sample of the environmental audio without user utterances is obtained and can be used to adapt the noise model, the voice search application on the mobile device 104 can begin recording before the user speaks the search query and/or can continue recording after the user finishes speaking the search query. For example, the voice search application can capture two seconds of audio before or after the user speaks the search query to ensure that a sample of the environmental audio is obtained.
In some implementations, a single alternate noise model can be selected and adapted into a single adapted noise model for a user across the various environments in which the user uses the voice search application. In other implementations, however, adapted noise models can be developed for the various locations in which the user tends to be when using the voice search application. For example, different noise models can be developed for different locations and stored as alternate noise models 120a. When a voice search query is submitted, the mobile device 104 can send the location of the user to the ASR engine 108, or the location of the user can be determined by other means at the time the voice search query is submitted. When an initial audio signal is received for a given location, an alternate noise model for that location can be selected, and when other voice search queries are received from that location, the associated audio signals can be used to adapt that particular noise model. This can occur for each of the various locations in which the user is located when performing voice search queries, thereby producing multiple adapted noise models for the user, each of which is specific to a certain location. After a defined period of non-use (e.g., the user has not performed a voice search at that location for a certain time), a location-specific noise model can be deleted.
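The location-keyed lifecycle described above (select an alternate model for a new location, adapt it on later queries from that location, delete it after a defined period of non-use) can be sketched as a small per-user store. The class name, expiry default, injected clock, and placeholder model names are assumptions for illustration:

```python
import time

class LocationNoiseModels:
    """Per-location noise models for one user, with deletion after a
    defined period of non-use. Model values here are placeholders."""

    def __init__(self, expiry_seconds=90 * 24 * 3600, clock=time.time):
        self.expiry_seconds = expiry_seconds
        self.clock = clock
        self._models = {}  # location id -> (model, last-used timestamp)

    def model_for(self, location, default="alternate_noise_model"):
        """Return the model for a location, falling back to an alternate
        model for a location seen for the first time; touch last-used."""
        model, _ = self._models.get(location, (default, None))
        self._models[location] = (model, self.clock())
        return model

    def adapt(self, location, adapted_model):
        """Replace a location's model with a newly adapted version."""
        self._models[location] = (adapted_model, self.clock())

    def purge_stale(self):
        """Delete location-specific models unused past the expiry."""
        now = self.clock()
        stale = [loc for loc, (_, t) in self._models.items()
                 if now - t > self.expiry_seconds]
        for loc in stale:
            del self._models[loc]
```

The injected clock makes the expiry behavior testable; in production the default `time.time` would be used, and the location key could be at any of the granularities discussed below.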
The user's location when a voice search query is submitted, the location associated with a given noise model, and the location associated with a given speech model may each be defined at various levels of granularity, the most specific being latitude and longitude navigational coordinates or a region tightly bounded by navigational coordinates (for example, a quarter mile or less). Alternatively, a location may be provided using a region identifier, such as a state name or identifier, a city name, a colloquial name (for example, "Central Park"), a country name, or an identifier of any defined region (for example, "cell/region ABC 123"). In some implementations, a location may indicate a type of place rather than a specific geographic location, for example: at a beach, in a large city, at an amusement park, in a moving vehicle, on a boat, inside a building, outdoors, in the countryside, in an underground location (for example, a subway or parking garage), on a street, inside a tall building (skyscraper), or in a forest. The level of granularity may be the same or different among the user's location when the query is submitted, the location associated with a given noise model, and the location associated with a given speech model.
A noise model (whether alternative 120a or adapted 120b) may be implemented, for example, as a hidden Markov model (HMM) or a Gaussian mixture model (GMM). A user speech model may be trained using the expectation-maximization algorithm or otherwise adapted.
As described above, in some implementations the user can be specifically identified, while in other implementations the device may serve as a proxy for the user. Accordingly, similar to the indexing of speech models, an adapted noise model may be indexed by a user identifier corresponding to the identified user when the user can be specifically identified, or by a device identifier corresponding to the device used to submit the voice search query when the user cannot be specifically identified.
Fig. 2 shows a flowchart of an example of a process 200 that may be performed when an initial voice search query is received from a user or device, and Fig. 3 shows a flowchart of an example of a process 300 that may be performed when a subsequent voice search query is received from a user or device. Process 200 and process 300 are described below as being performed by components of the system 100, but other components of the system 100, or another system, may perform process 200 or process 300.
Referring to Fig. 2, the ASR 108 receives an initial voice search query from a device, such as the mobile device 104 (202). The voice search query may be initial because it is the first voice search query received for a particular user or device; because it is the first received from the particular location from which it was submitted; or because an adapted user speech model or adapted noise model (or both) does not exist for the user or device for some other reason (for example, because the model was deleted after a period of non-use).
The voice search query includes an audio signal that includes a user audio signal and an environmental audio signal. The user audio signal includes one or more utterances spoken by the user into a microphone of the mobile device 104, as well as potentially environmental audio. The environmental audio signal includes only environmental audio. As described below, the voice search query may also include contextual information.
When employed, the ASR 108 accesses contextual information about the voice search query (204). The contextual information may, for example, provide an indication of the conditions surrounding the audio signal in the voice search query. The contextual information may include time information, date information, data referencing a speed or an amount of motion measured by the particular mobile device during recording, other device sensor data, device state data (for example, Bluetooth headset, speakerphone, or conventional input method), a user identifier if the user chooses to provide one, or information identifying the type or model of the mobile device.
The contextual information may also include the location from which the voice search query was submitted. The location may be determined, for example, from the user's calendar; derived from a user preference (for example, stored in a user account of the ASR engine 108 or the search engine 106) or a default location; based on past locations (for example, the most recent location calculated by a Global Positioning System (GPS) module of the device used to submit the query, such as the mobile device 104); provided explicitly by the user when submitting the voice query; determined from the utterance; calculated based on cell-tower triangulation; provided by a GPS module in the mobile device 104 (for example, the voice search application may access a GPS device to determine the location and send it with the voice search query); or estimated using dead reckoning. If sent by the device, the location information may include accuracy information indicating the level of precision of the location information.
The ASR 108 may use such contextual information to aid speech recognition, for example by using the contextual information to select a particular variant of the speech recognition system, or to select an appropriate alternative user speech model or alternative noise model. The ASR 108 may pass such contextual information to the search engine 106 to improve search results. Some or all of the contextual information may be received together with the voice search query.
If an adapted user speech model does not exist for the user, the ASR 108 selects an initial, or alternative, user speech model and associates it with the user or device (for example, depending on whether the user can be specifically identified) (206). For example, as described above, the ASR 108 may select among several available alternative user speech models.
The selected alternative user speech model may be a speech model determined, based on known or determinable information, to be a reasonable approximation of the user's speech characteristics, even though it has not yet been adapted with any sample of the user's speech. For example, in one implementation there may be two alternative user speech models: one for male speech and one for female speech. The user's gender may be determined, and the appropriate alternative user speech model may be selected based on the user's likely gender. The user's gender may be determined, for example, by analyzing the audio signal received with the initial voice search query, or based on information voluntarily submitted by the user and included in the user's profile.
Additionally or alternatively, the adapted user speech models for other users (such as the users of mobile devices 102a-102c) may be used as alternative user speech models. When the initial voice search query is received, similarity measures representing the similarity between an expected model for the user submitting the initial query and the adapted user speech models stored in the database 111 (corresponding to the other users) may be determined based on the initial audio signal included with the initial query. For example, if the models are based on the constrained maximum likelihood linear regression technique, the similarity measure may be the L2 norm of the difference between the models (the sum of squared differences over the coefficients). Where a GMM technique is used, the similarity measure may be the Kullback-Leibler divergence between the two probability density functions or, if the models are GMMs and the expected model from a single utterance is a point in the space, it may be the probability density of the GMM at that point. In other implementations using GMMs, the similarity measure may be, for example, the distance between the means of the GMMs, or the distance between the means normalized by some norm of the covariance matrices.
The adapted user speech model closest to the user's expected model (as indicated by the similarity measure) may be selected as the alternative user speech model for the user submitting the initial voice search query. For example, when the user of device 104 submits an initial voice search query, the ASR 108 may determine a similarity measure representing the similarity between the expected user speech model for the user of device 104 and the adapted user speech model of the user of device 102a. Similarly, the ASR 108 may determine a similarity measure representing the similarity between the expected user speech model for the user of device 104 and the adapted user speech model of the user of device 102b. If the similarity measures indicate that the expected model for the user of device 104 is more similar to the model for the user of device 102a than to the model for the user of device 102b, the model for the user of device 102a may be used as the alternative user speech model for the user of device 104.
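For the constrained-MLLR case mentioned above, where the similarity measure is the L2 norm of the coefficient difference, nearest-model selection reduces to a few lines. The sketch below assumes each model is represented as a flat coefficient vector keyed by a user identifier; that representation is an assumption made for illustration.

```python
import numpy as np

def select_alternative_model(expected_coeffs, adapted_models):
    """Pick the adapted user speech model closest to the expected model.

    adapted_models: dict mapping user_id -> coefficient vector.
    Similarity measure: L2 norm of the coefficient difference
    (sum of squared differences per coefficient).
    """
    expected = np.asarray(expected_coeffs, dtype=float)
    best_user, best_dist = None, float("inf")
    for user_id, coeffs in adapted_models.items():
        dist = float(np.sum((expected - np.asarray(coeffs, dtype=float)) ** 2))
        if dist < best_dist:
            best_user, best_dist = user_id, dist
    return best_user, best_dist
```

The returned model would then serve as the alternative user speech model for the new user, as in the device 102a/102b example above.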
As a particular example of an implementation employing GMMs, a voice search query may include an utterance containing both speech and an environmental signal. The query may be segmented into segments of, for example, 25 ms, where each segment is either speech or pure environment. For each segment a feature vector x_t is computed, with the vectors corresponding to speech denoted x_s. For each potential alternative model M_i in the database, the likelihood of each vector is computed:

p(x_t, i) = p(x_t | i) p(i) = Σ_j π_j N(x_t; μ_{i,j}, Σ_{i,j}) p(i)

This is the likelihood computation for a GMM, and p(i) is the prior of the alternative model. Assuming independence of the observations, the probability of the set of speech vectors x_s can be expressed as:

p(x_s, i) = Π_s Σ_j π_j N(x_s; μ_{i,j}, Σ_{i,j}) p(i)

where x_s is the set of speech vectors.

The conditional probability of class i given the observations x_s is:

p(i | x_s) = p(x_s, i) / p(x_s)

where

p(x_s) = Σ_i p(x_s, i)

This conditional probability can be used as the similarity measure between the current utterance and a given alternative speech model M_i. The alternative model with the highest conditional probability may be selected:

model index = argmax_i p(i | x_s)
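The selection rule above — per-frame GMM likelihoods, a product over the speech frames, and an argmax over p(i | x_s) — can be implemented directly. The sketch below works in the log domain to avoid underflow from the product over frames; the diagonal-covariance form and the uniform prior p(i) are illustrative assumptions.

```python
import numpy as np

def log_gmm_likelihood(frames, weights, means, variances):
    """log p(x_s | model) for a diagonal-covariance GMM; frames has shape (T, D)."""
    frames = np.atleast_2d(frames)
    log_comp = []
    for w, mu, var in zip(weights, means, variances):
        # log of pi_j * N(x_t; mu_ij, diag(var_ij)) for every frame t
        ll = -0.5 * (np.sum(np.log(2 * np.pi * var))
                     + np.sum((frames - mu) ** 2 / var, axis=1))
        log_comp.append(np.log(w) + ll)
    log_comp = np.stack(log_comp)                              # shape (J, T)
    m = log_comp.max(axis=0)
    per_frame = m + np.log(np.exp(log_comp - m).sum(axis=0))   # log-sum-exp over j
    return float(per_frame.sum())                              # product over frames

def select_model(speech_frames, models, priors=None):
    """Return the index i maximizing p(i | x_s); same argmax as p(x_s, i)."""
    n = len(models)
    priors = [1.0 / n] * n if priors is None else priors
    log_joint = [log_gmm_likelihood(speech_frames, *mdl) + np.log(p)
                 for mdl, p in zip(models, priors)]
    return int(np.argmax(log_joint))
```

Because p(x_s) is the same denominator for every model, maximizing the joint p(x_s, i) gives the same selection as maximizing the posterior p(i | x_s).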
Contextual information (such as the user's accent or expected language) may be used alone or in combination with the other techniques described above to select the alternative user speech model. For example, multiple alternative user speech models may be stored for different languages and/or accents. When a voice search query is submitted, the user's location may be used by the ASR 108 to determine an expected language or accent, and the alternative user speech model corresponding to the expected language and/or accent may be selected. Similarly, language and/or location information for the user may be stored, for example, in the user's profile and used to select the alternative user speech model corresponding to the user's language and/or accent.
If an adapted user speech model already exists (for example, because the voice search query is initial for a particular location but not for the user or device), action 206 may be skipped, or may be replaced by further adaptation of the adapted user speech model. For example, the audio signal received with the initial voice search query may be evaluated to determine whether the background audio is below a particular threshold, and, if so, the audio signal may be used to further train or otherwise adapt the adapted user speech model.
The ASR 108 selects an initial, or alternative, noise model and associates it with the user or device (for example, depending on whether the user can be specifically identified) (208). The selected alternative noise model may be a noise model determined, based on known or determinable information, to be a reasonable approximation of the environmental audio expected around the user. For example, alternative noise models may be developed for various standard categories of environmental conditions (for example, in a car, at an airport, at home, or in a bar/restaurant). Data from other users in the system may be used to develop the alternative noise models. For example, if some duration of low-noise data (for example, ten minutes) has been collected from users, that data may be used to generate an alternative model. When the initial audio signal is received, similarity measures representing the similarity between an expected noise model and the standard alternative noise models may be determined based on the initial audio signal, and one of the standard alternative noise models may be selected based on the similarity measures (for example, using techniques similar to those described above for selecting an alternative user model). The expected noise model may be determined, for example, based on the environmental audio signal. A set of alternative noise models (for example, 100 of them) that exceed a particular dissimilarity threshold (determined, for example, based on the KL distance) may be retained as the standard alternative models, and the alternative model to be used may be selected from that set using the similarity measures as described. This may minimize computation when selecting an alternative noise model.
Additionally or alternatively, different noise models may be developed for different locations and stored as alternative noise models 120a. For example, noise models for location A 132a and location B 132b may be developed and stored as alternative noise models 120a. The noise model for a particular location may be developed based on previous voice search queries initiated by other users at that location. The noise model for location B 132b may be developed, for example, based on the audio signal 130b received by the ASR 108 as part of a voice search query from the user of device 102b while at location B 132b, and the audio signal 130c received by the ASR 108 as part of a voice search query from the user of device 102c while at location B 132b. The noise model for location A 132a may be developed, for example, based on the audio signal 130a received by the ASR 108 as part of a voice search query from the user of device 102a while at location A.
When the initial audio signal is received, the alternative noise model may be selected based on the user's location. For example, when the user of mobile device 104 submits an initial voice search from location B 132b, the ASR 108 may select the alternative noise model for location B. In some implementations, the voice search application on the mobile device 104 may access a GPS on the device to determine the user's location and send the location information to the ASR 108 together with the voice search query. The location information may then be used by the ASR 108 to determine the appropriate alternative noise model based on the location. In other implementations, when the initial audio signal is received, similarity measures representing the similarity between an expected noise model and the location-specific alternative noise models stored in the database 111 may be determined based on the initial audio signal, and one of the location-specific alternative noise models may be selected based on the similarity measures.
Using the initial (or adapted) user speech model and the initial noise model, the noise compensation system 116 of the ASR 108 performs noise compensation on the audio signal received with the voice search query to remove or reduce the background audio in the audio signal, thereby producing a filtered audio signal (210). For example, an algorithm such as the Algonquin algorithm, described in "ALGONQUIN: Iterating Laplace's Method to Remove Multiple Types of Acoustic Distortion for Robust Speech Recognition," Eurospeech 2001 - Scandinavia, may be used to perform noise compensation using the initial user speech model and the initial noise model.
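Algonquin itself is an iterative variational method and is beyond a short sketch; as a much simpler illustration of reducing background audio with a noise estimate, the following applies magnitude spectral subtraction to one analysis frame, assuming the noise model can supply an average noise magnitude spectrum. This is a stand-in technique for illustration, not the method of the cited paper.

```python
import numpy as np

def spectral_subtract(frame, noise_mag, floor=0.01):
    """Suppress background audio in one frame by magnitude spectral subtraction.

    frame: time-domain samples for one analysis frame.
    noise_mag: average noise magnitude spectrum (e.g., from a noise model),
               one value per rfft bin.
    """
    spec = np.fft.rfft(frame)
    mag = np.abs(spec)
    phase = np.angle(spec)
    # Subtract the noise estimate, flooring to avoid negative magnitudes.
    clean_mag = np.maximum(mag - noise_mag, floor * mag)
    return np.fft.irfft(clean_mag * np.exp(1j * phase), n=len(frame))
```

Applied frame by frame (typically with overlapping windows, omitted here), this produces a filtered audio signal in which the modeled background audio is reduced.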
The speech recognition system performs speech recognition on the filtered audio signal to transcribe the utterance in the audio signal into one or more candidate transcriptions (212). A search query may be performed using one or more of the candidate transcriptions. In some implementations, the ASR 108 may use the contextual information to select a particular variant of the speech recognition system for performing the speech recognition. For example, the user's accent and/or expected or known language may be used to select an appropriate speech recognition system. The user's location when the voice search query is submitted may be used to determine the user's expected language, or the user's language may be included in the user's profile.
Referring to Fig. 3, the ASR 108 receives a subsequent voice search query from a device, such as the mobile device 104 (302). The voice search query may be subsequent because it is received after a previous voice search query for a particular user or device, or because an alternative or adapted user speech model or noise model exists for the user or device.
The subsequent voice search query includes an audio signal that includes a user audio signal and an environmental audio signal. The user audio signal includes one or more utterances spoken by the user into a microphone of the mobile device 104, as well as potentially environmental audio. The environmental audio signal includes only environmental audio. As described below, the voice search query may also include contextual information.
When employed, the ASR 108 accesses contextual information about the voice search query (304). The ASR 108 may use such contextual information to aid speech recognition, for example by using the contextual information to select a particular variant of the speech recognition system. Additionally or alternatively, the contextual information may be used to aid the selection and/or adaptation of the alternative or adapted user speech model and/or the alternative or adapted noise model. The ASR 108 may pass such contextual information to the search engine 106 to improve search results. Some or all of the contextual information may be received together with the voice search query.
The ASR 108 determines whether the environmental audio in the audio signal received with the voice search query is below a defined threshold (306). For example, a voice activity detector may be used to identify the user audio signal and the environmental audio signal within the received audio signal. The ASR 108 may then determine the energy in the environmental audio signal and compare the determined energy to a threshold energy. If the energy is below the threshold energy, the environmental audio is considered below the defined threshold. In another example, the ASR 108 may determine the energy in the user audio signal, determine the energy in the environmental audio signal, and then determine the ratio of the energy in the user audio signal to the energy in the environmental audio signal. This ratio may represent the signal-to-noise ratio (SNR) of the audio signal. The SNR of the audio signal may then be compared to a threshold SNR, and when the SNR of the audio signal is above the threshold SNR, the environmental audio is considered below the defined threshold.
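The second check described above — the energy of the user audio signal over the energy of the environmental audio signal, compared against a threshold SNR — can be sketched as follows. The 20 dB threshold is an illustrative assumption; the disclosure does not specify a value.

```python
import numpy as np

def environmental_audio_below_threshold(user_audio, env_audio,
                                        threshold_snr_db=20.0):
    """Return True when the environmental audio is considered below the
    defined threshold, i.e., when the SNR of the audio signal is above the
    threshold SNR."""
    user_energy = float(np.sum(np.square(user_audio)))
    env_energy = float(np.sum(np.square(env_audio)))
    if env_energy == 0.0:
        return True  # no measurable environmental audio
    snr_db = 10.0 * np.log10(user_energy / env_energy)
    return snr_db > threshold_snr_db
```

In process 300, this predicate would route the audio signal to speech-model adaptation (308) when it returns True and to noise-model adaptation (312) when it returns False.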
If the environmental audio in the audio signal received with the voice search query is not below the defined threshold, the audio signal is used to adapt the alternative (or adapted) noise model to generate an adapted noise model (312). In some implementations, the particular noise model to be adapted is selected based on the user's location. For example, when different noise models are used for the different locations from which the user frequently submits voice search queries, the ASR 108 may use the location of the user or device to select the alternative or adapted noise model for that location.
The noise model may be adapted on the entire audio signal, or the environmental audio signal may be extracted and used to adapt the noise model, depending on the particular implementation of the noise model and the speech enhancement or speech separation algorithm. Techniques such as hidden Markov models or Gaussian mixture models may be used to implement the noise model, and techniques such as expectation-maximization may be used to adapt the noise model.
If the environmental audio in the audio signal received with the voice search query is below the defined threshold, the audio signal is used to adapt the previously selected alternative user speech model (if the alternative model has not previously been adapted into an adapted user speech model) or the adapted user speech model (308). The user speech model may be adapted on the entire audio signal, or the user audio signal may be extracted and used to adapt the user speech model, depending on the particular implementation of the user speech model. As with the noise model, techniques such as hidden Markov models or Gaussian mixture models may be used to implement the user speech model, and techniques such as expectation-maximization or maximum a posteriori (MAP) adaptation may be used to adapt the user speech model.
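One common form of the MAP adaptation mentioned above updates each Gaussian's mean toward the statistics of the new audio, weighted by how much data the component accounts for. The sketch below adapts only the means of a diagonal-covariance GMM speech model; the relevance factor of 16 is a conventional illustrative choice, and the function interface is an assumption made for this sketch.

```python
import numpy as np

def map_adapt_means(frames, weights, means, variances, relevance=16.0):
    """MAP-adapt the component means of a diagonal-covariance GMM.

    frames: (T, D) feature vectors from the new (low-noise) audio signal.
    Returns updated means; weights and variances are left unchanged here.
    """
    frames = np.atleast_2d(frames)
    weights = np.asarray(weights, float)
    means = np.asarray(means, float)
    variances = np.asarray(variances, float)

    # E-step: responsibilities resp[t, j] = p(component j | frame t).
    log_resp = (np.log(weights)
                - 0.5 * np.sum(np.log(2 * np.pi * variances), axis=1)
                - 0.5 * np.sum((frames[:, None, :] - means) ** 2 / variances,
                               axis=2))
    log_resp -= log_resp.max(axis=1, keepdims=True)
    resp = np.exp(log_resp)
    resp /= resp.sum(axis=1, keepdims=True)

    # Sufficient statistics and MAP interpolation toward the old means.
    n_j = resp.sum(axis=0)                       # soft frame count per component
    first = resp.T @ frames                      # (J, D) first-order statistics
    alpha = (n_j / (n_j + relevance))[:, None]   # data-dependent mixing weight
    new_means = np.where(n_j[:, None] > 0,
                         alpha * (first / np.maximum(n_j, 1e-10)[:, None])
                         + (1 - alpha) * means,
                         means)
    return new_means
```

Components that see little of the new audio (small n_j) stay close to their prior means, which is what makes MAP adaptation robust to short utterances compared with plain EM re-estimation.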
In some implementations, the ASR 108 also trains or otherwise adapts the alternative or adapted noise model based on audio signals in which the background audio is below the threshold (310). While in some implementations the user speech model is trained or adapted only on audio signals in which the background audio is below the defined threshold, in some instances the noise model may be trained or adapted both on such audio signals and on audio signals in which the background audio is above the threshold, depending on the particular technique used to implement the noise model. For example, some noise models may include parameters that reflect environments in which the background audio is below the threshold, and such models may therefore benefit from adaptation on audio signals in which the background audio is below the threshold.
Using the alternative or adapted user speech model (depending on whether the alternative speech model has been adapted) and the alternative or adapted noise model (depending on whether the alternative noise model has been adapted), the noise compensation system 116 of the ASR 108 performs noise compensation on the audio signal received with the voice search query in the same manner as described above to remove or reduce the background audio in the audio signal, thereby producing a filtered audio signal (314). The speech recognition system performs speech recognition on the filtered audio signal in the same manner as described above to transcribe the speech in the audio signal into one or more candidate transcriptions (316).
Although process 300 shows the noise model and/or user speech model being adapted before they are used for noise compensation, the adaptation may occur after the noise compensation is performed, with the noise compensation based on the noise and/or user speech models as they existed before the further adaptation. This may be the case, for example, when the adaptation is computationally intensive. In that case, the expected response time for voice search queries may be achieved by using the current noise and user speech models for noise compensation and updating them afterwards based on the new audio signal.
Fig. 4 shows a swim-lane diagram of an example of a process 400 performed by the mobile device 104, the ASR 108, and the search engine 106 for handling a voice search query. The mobile device 104 sends a voice search query to the ASR 108 (402). As described above, the voice search query includes an audio signal that includes an environmental audio signal and a user audio signal; the environmental audio signal includes environmental audio without user utterances, and the user audio signal includes user utterances (and potentially environmental audio). The voice search query may also include contextual information, such as the contextual information described above.
The ASR 108 receives the voice search query (402) and selects both a noise model and a user speech model (404). The ASR 108 may, for example, select a stored adapted user speech model based on a user identifier or device identifier included with the voice search query or otherwise accessible to the ASR 108. Similarly, the ASR 108 may, for example, select a stored adapted noise model based on a user identifier or device identifier included with the voice search query or otherwise accessible to the ASR 108. In implementations that use different noise models for particular locations, the ASR 108 may select the stored adapted noise model from among multiple location-specific adapted noise models based on the user or device identifier and a location identifier corresponding to the user's location when the voice search query was submitted. The ASR 108 may ascertain the location information from contextual information sent with the voice search query or otherwise available to the ASR 108.
In the event that an adapted user speech model does not exist for the user or device, the ASR 108 selects an alternative user speech model, for example using the techniques described above (404). Similarly, if an adapted noise model does not exist for the user or device, or at least not for the particular location of the user when the voice search query was submitted, the ASR 108 selects an alternative noise model, for example using the techniques described above.
The ASR 108 then uses the audio signal received with the voice search query to adapt the selected user speech model (406) and/or the selected noise model (408) to generate an adapted user speech model or adapted noise model, depending on the background audio in the audio signal. As described above, when the background audio is below the defined threshold, the audio signal is used to adapt the selected user speech model and, in some implementations, the selected noise model. When the background audio is above the defined threshold, then, at least in some implementations, the audio signal is used to adapt only the selected noise model.
The ASR 108 performs noise compensation on the audio signal using the adapted user speech model and the adapted noise model (410) to generate a filtered audio signal in which background audio has been reduced or removed relative to the received audio signal.
The ASR 108 performs speech recognition on the filtered audio signal to transcribe the one or more utterances in the audio signal into one or more candidate textual transcriptions (412). The ASR 108 forwards the generated transcription(s) to the search engine 106 (414). If the ASR 108 generates multiple transcriptions, the transcriptions may optionally be ranked in order of confidence. The ASR 108 may optionally provide context data, such as geographic location, to the search engine 106, and the search engine 106 may use the context data to filter or rank the search results.
The search engine 106 performs a search operation using the transcription(s) (416). The search engine 106 may locate one or more URIs related to the transcription(s).
The search engine 106 provides the search query results to the mobile device 104 (418). For example, the search engine 106 may forward HTML code that generates a visual listing of the located URIs.
A number of implementations have been described. Nevertheless, it will be understood that various modifications may be made without departing from the spirit and scope of the disclosure. For example, although the techniques above are described with respect to performing speech recognition on the audio signal in a voice search query, the techniques may be used in other systems, such as a computerized speech dictation system or a dialog system implemented on a mobile or other device. In addition, the various forms of the flows shown above may be used, with steps reordered, added, or removed. Accordingly, other implementations are within the scope of the following claims.
Embodiments and all of the functional operations described in this specification may be implemented in digital electronic circuitry, or in computer software, firmware, or hardware, including the structures disclosed in this specification and their structural equivalents, or in combinations of one or more of them. Embodiments may be implemented as one or more computer program products, i.e., one or more modules of computer program instructions encoded on a computer-readable medium for execution by, or to control the operation of, data processing apparatus. The computer-readable medium may be a machine-readable storage device, a machine-readable storage substrate, a memory device, a composition of matter effecting a machine-readable propagated signal, or a combination of one or more of them. The term "data processing apparatus" encompasses all apparatus, devices, and machines for processing data, including by way of example a programmable processor, a computer, or multiple processors or computers. The apparatus may include, in addition to hardware, code that creates an execution environment for the computer program in question, for example code that constitutes processor firmware, a protocol stack, a database management system, an operating system, or a combination of one or more of them. A propagated signal is an artificially generated signal, for example a machine-generated electrical, optical, or electromagnetic signal, that is generated to encode information for transmission to suitable receiver apparatus.
A computer program (also known as a program, software, software application, script, or code) may be written in any form of programming language, including compiled or interpreted languages, and may be deployed in any form, including as a stand-alone program or as a module, component, subroutine, or other unit suitable for use in a computing environment. A computer program does not necessarily correspond to a file in a file system. A program may be stored in a portion of a file that holds other programs or data (for example, one or more scripts stored in a markup language document), in a single file dedicated to the program in question, or in multiple coordinated files (for example, files that store one or more modules, subroutines, or portions of code). A computer program may be deployed to be executed on one computer or on multiple computers that are located at one site or distributed across multiple sites and interconnected by a communication network.
The processes and logic flows described in this specification may be performed by one or more programmable processors executing one or more computer programs to perform functions by operating on input data and generating output. The processes and logic flows may also be performed by, and apparatus may also be implemented as, special-purpose logic circuitry, for example an FPGA (field programmable gate array) or an ASIC (application-specific integrated circuit).
Processors suitable for the execution of a computer program include, by way of example, both general and special purpose microprocessors, and any one or more processors of any kind of digital computer. Generally, a processor will receive instructions and data from a read-only memory or a random access memory or both. The essential elements of a computer are a processor for executing instructions and one or more memory devices for storing instructions and data. Generally, a computer will also include, or be operatively coupled to receive data from or transfer data to, or both, one or more mass storage devices for storing data (for example, magnetic, magneto-optical, or optical disks). However, a computer need not have such devices. Moreover, a computer may be embedded in another device, for example a tablet computer, a mobile telephone, a personal digital assistant (PDA), a mobile audio player, or a Global Positioning System (GPS) receiver, to name just a few. Computer-readable media suitable for storing computer program instructions and data include all forms of non-volatile memory, media, and memory devices, including by way of example semiconductor memory devices (for example, EPROM, EEPROM, and flash memory devices); magnetic disks (for example, internal hard disks or removable disks); magneto-optical disks; and CD-ROM and DVD-ROM disks. The processor and the memory may be supplemented by, or incorporated in, special-purpose logic circuitry.
To provide for interaction with a user, embodiments can be implemented on a computer having a display device (e.g., a CRT (cathode-ray tube) or LCD (liquid-crystal display) monitor) for displaying information to the user, and a keyboard and a pointing device (e.g., a mouse or a trackball) by which the user can provide input to the computer. Other kinds of devices can be used to provide for interaction with a user as well; for example, feedback provided to the user can be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback), and input from the user can be received in any form, including acoustic, speech, or tactile input.
Embodiments can be implemented in a computing system that includes a back-end component (e.g., a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a client computer having a graphical user interface or a Web browser through which a user can interact with an implementation), or any combination of one or more such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include a local area network ("LAN") and a wide area network ("WAN"), e.g., the Internet.
The computing system can include clients and servers. A client and server are generally remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other.
While this specification contains many specifics, these should not be construed as limitations on the scope of the disclosure or of what may be claimed, but rather as descriptions of features specific to particular implementations. Certain features that are described in this specification in the context of separate embodiments can also be implemented in combination in a single embodiment. Conversely, various features that are described in the context of a single embodiment can also be implemented in multiple embodiments separately or in any suitable subcombination. Moreover, although features may be described above as acting in certain combinations and even initially claimed as such, one or more features from a claimed combination can in some cases be excised from the combination, and the claimed combination may be directed to a subcombination or a variation of a subcombination.
Similarly, while operations are depicted in the drawings in a particular order, this should not be understood as requiring that such operations be performed in the particular order shown or in sequential order, or that all illustrated operations be performed, to achieve desirable results. In certain circumstances, multitasking and parallel processing may be advantageous. Moreover, the separation of various system components in the embodiments described above should not be understood as requiring such separation in all embodiments, and it should be understood that the described program components and systems can generally be integrated together in a single software product or packaged into multiple software products.
In each instance where an HTML file is mentioned, other file types or formats may be substituted. For example, an HTML file may be replaced by an XML, JSON, plain-text, or other type of file. Moreover, where a table or hash table is mentioned, other data structures (such as spreadsheets, relational databases, or structured files) may be used.
Thus, particular embodiments have been described. Other embodiments are within the scope of the following claims. For example, the actions recited in the claims can be performed in a different order and still achieve desirable results.

Claims (24)

1. A system for speech recognition, comprising:
means for receiving an audio signal generated by a device based on audio input from a user, the audio signal including at least a user audio portion corresponding to one or more user utterances recorded by the device;
means for accessing a user speech model associated with the user;
means for determining that background audio in the audio signal is below a defined threshold;
means for, in response to determining that the background audio in the audio signal is below the defined threshold, adapting the accessed user speech model based on the audio signal to generate an adapted user speech model that models the speech characteristics of the user; and
means for performing noise compensation on the received audio signal using the adapted user speech model to generate a filtered audio signal having reduced background audio compared with the received audio signal.
2. The system of claim 1, wherein the audio signal includes an environmental audio portion corresponding only to background audio around the user, and wherein, to determine that the background audio in the audio signal is below the defined threshold, the system comprises:
means for determining an amount of energy in the environmental audio portion; and
means for determining that the amount of energy in the environmental audio portion is below a threshold energy.
3. The system of claim 2, wherein, to determine that the background audio in the audio signal is below the defined threshold, the system comprises:
means for determining a signal-to-noise ratio of the audio signal; and
means for determining that the signal-to-noise ratio is below a threshold signal-to-noise ratio.
4. The system of claim 3, wherein the audio signal includes an environmental audio portion corresponding only to background audio around the user, and wherein, to determine the signal-to-noise ratio of the audio signal, the system comprises:
means for determining an amount of energy in the user audio portion of the audio signal;
means for determining an amount of energy in the environmental audio portion of the audio signal; and
means for determining the signal-to-noise ratio by determining a ratio between the amounts of energy in the user audio portion and the environmental audio portion.
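Claims 2-4 frame the quiet/noisy decision as an energy comparison between the user audio portion and the environment-only portion. A minimal sketch of that test (the helper names, decibel units, and the 20 dB threshold are illustrative assumptions, not the patented implementation):

```python
import math

def energy(samples):
    """Amount of energy in an audio segment: sum of squared samples."""
    return sum(s * s for s in samples)

def snr_db(user_part, env_part):
    """Signal-to-noise ratio in decibels: ratio of the energy in the
    user audio portion to the energy in the environment audio portion."""
    return 10.0 * math.log10(energy(user_part) / energy(env_part))

def background_below_threshold(user_part, env_part, snr_threshold_db=20.0):
    """A high SNR means the background audio is below the defined
    threshold, so the utterance may be used to adapt the speech model."""
    return snr_db(user_part, env_part) > snr_threshold_db
```

For example, a user portion of constant amplitude 10 against an environment portion of amplitude 0.1 over equal lengths gives an SNR of 40 dB, well above the assumed threshold.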
5. The system of claim 1, wherein the accessed user speech model includes an alternative user speech model that has not been adapted to model the speech characteristics of the user.
6. The system of claim 5, wherein the system comprises:
means for selecting the alternative user speech model; and
means for associating the alternative speech model with the user.
7. The system of claim 6, wherein, to select the alternative user speech model, the system comprises:
means for determining the gender of the user; and
means for selecting the alternative user speech model from among multiple alternative user speech models based on the gender of the user.
8. The system of claim 6, wherein, to select the alternative user speech model, the system comprises:
means for determining the location of the user when the one or more utterances were recorded; and
means for selecting the alternative user speech model from among multiple alternative user speech models based on the location of the user when the one or more utterances were recorded.
9. The system of claim 6, wherein, to select the alternative user speech model, the system comprises:
means for determining a language or accent of the user; and
means for selecting the alternative user speech model from among multiple alternative user speech models based on the language or accent.
10. The system of claim 6, wherein, to select the alternative user speech model, the system comprises:
means for receiving an initial audio signal including at least an initial user audio portion corresponding to one or more user utterances recorded by the device;
means for determining a similarity measure between each of multiple alternative user speech models and an expected user speech model for the user determined based on the initial audio signal; and
means for selecting the alternative user speech model from among the multiple alternative user speech models based on the similarity measures.
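Claim 10 (and claim 17 for noise models) selects among stored alternative models by a similarity measure against a model estimated from an initial audio signal. One hypothetical realization, with each model reduced to a mean feature vector and similarity taken as negative Euclidean distance (both assumptions; the claims do not fix the representation or the measure):

```python
import math

def similarity(model_mean, estimated_mean):
    """Similarity measure between two models represented by mean
    feature vectors: negative Euclidean distance (larger = more similar)."""
    return -math.sqrt(sum((m - e) ** 2
                          for m, e in zip(model_mean, estimated_mean)))

def select_alternative_model(alternatives, estimated_mean):
    """Return the name of the alternative model most similar to the
    model estimated from the user's initial audio signal."""
    return max(alternatives,
               key=lambda name: similarity(alternatives[name], estimated_mean))
```

For instance, with `{"model_a": [1.0, 0.0], "model_b": [0.0, 1.0]}` and an estimated mean of `[0.9, 0.2]`, the selection would be `"model_a"`.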
11. The system of claim 1, wherein the system comprises:
means for accessing a noise model associated with the user; and
wherein, to perform noise compensation, the system further comprises means for performing noise compensation on the received audio signal using the adapted user speech model and the accessed noise model.
12. The system of claim 11, wherein, to perform noise compensation, the system further comprises:
means for adapting the accessed noise model based on the received audio signal to generate an adapted noise model that models the characteristics of the background audio around the user; and
means for performing noise compensation on the received audio signal using the adapted user speech model and the adapted noise model.
13. The system of claim 11, wherein the system comprises:
means for receiving a second audio signal including at least a second user audio portion corresponding to one or more user utterances recorded by the device;
means for determining that background audio in the second audio signal is above the defined threshold; and
means for, in response to determining that the background audio in the second audio signal is above the defined threshold, adapting the noise model associated with the user based on the second audio signal to generate an adapted noise model that models the characteristics of the background audio around the user.
14. The system of claim 11, wherein the accessed noise model includes an alternative noise model that has not yet been adapted to model the characteristics of the background audio around the user.
15. The system of claim 14, wherein the system comprises:
means for selecting the alternative noise model; and
means for associating the alternative noise model with the user.
16. The system of claim 15, wherein, to select the alternative noise model, the system comprises:
means for receiving an initial audio signal including at least an initial user audio portion corresponding to one or more user utterances recorded by the device;
means for determining the location of the user when the one or more utterances corresponding to the initial user audio portion were recorded; and
means for selecting the alternative noise model from among multiple alternative noise models based on the location of the user when the one or more utterances corresponding to the initial user audio portion were recorded.
17. The system of claim 15, wherein, to select the alternative noise model, the system comprises:
means for receiving an initial audio signal including at least an initial user audio portion corresponding to one or more user utterances recorded by the device;
means for determining a similarity measure between each of multiple alternative noise models and an expected noise model for the user determined based on the initial audio signal; and
means for selecting the alternative noise model from among the multiple alternative noise models based on the similarity measures.
18. The system of claim 17, wherein each alternative noise model of the multiple alternative noise models models the characteristics of background audio in a particular location.
19. The system of claim 17, wherein each alternative noise model of the multiple alternative noise models models the characteristics of background audio in a particular type of environmental condition.
20. The system of claim 11, wherein, to access the noise model, the system comprises:
means for determining the location of the user when the one or more utterances were recorded; and
means for selecting the noise model from among multiple noise models based on the location of the user.
21. The system of claim 1, wherein the audio signal corresponds to a voice search query, and the system comprises:
means for performing speech recognition on the filtered audio signal to generate one or more candidate transcriptions of the one or more user utterances;
means for performing a search query using the one or more candidate transcriptions to generate search results; and
means for sending the search results to the device.
22. A system for speech recognition, comprising:
means for sending, to an automated speech recognition system, an audio signal including at least a user audio portion corresponding to one or more recorded user utterances;
means for receiving the audio signal;
means for accessing a user speech model associated with the user;
means for determining that background audio in the audio signal is below a defined threshold;
means for, in response to determining that the background audio in the audio signal is below the defined threshold, adapting the accessed user speech model based on the audio signal to generate an adapted user speech model that models the speech characteristics of the user; and
means for performing noise compensation on the received audio signal using the adapted user speech model to generate a filtered audio signal having reduced background audio compared with the received audio signal.
23. The system of claim 22, further comprising:
means for performing speech recognition on the filtered audio signal to generate one or more candidate transcriptions of the one or more user utterances;
means for performing a search query using the one or more candidate transcriptions to generate search results; and
means for sending the search results.
24. A method for speech recognition, comprising:
receiving an audio signal generated by a device based on audio input from a user, the audio signal including at least a user audio portion corresponding to one or more user utterances recorded by the device;
accessing a user speech model associated with the user;
determining that background audio in the audio signal is below a defined threshold;
in response to determining that the background audio in the audio signal is below the defined threshold, adapting the accessed user speech model based on the audio signal to generate an adapted user speech model that models the speech characteristics of the user; and
performing noise compensation on the received audio signal using the adapted user speech model to generate a filtered audio signal having reduced background audio compared with the received audio signal.
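Taken together, method claim 24 and the noisy-signal path of claim 13 amount to a simple dispatch: a quiet recording adapts the user speech model, a noisy one adapts the noise model, and the current models drive noise compensation. A hedged sketch under assumed scalar "models" (running averages) and a naive subtractive compensation step, neither of which is specified by the claims:

```python
class UserModels:
    """Hypothetical per-user store holding a speech model and a noise
    model, each reduced here to a running average of a scalar feature."""

    def __init__(self):
        self.speech_model = None
        self.noise_model = None

    @staticmethod
    def _adapt(model, value, rate=0.5):
        # First observation initializes the model; later ones blend in.
        return value if model is None else (1.0 - rate) * model + rate * value

    def process(self, feature, snr_db, threshold_db=20.0):
        """Claim 24 path (quiet: adapt the speech model) vs. claim 13
        path (noisy: adapt the noise model), then filter the signal."""
        if snr_db > threshold_db:   # background audio below defined threshold
            self.speech_model = self._adapt(self.speech_model, feature)
        else:                       # background audio above defined threshold
            self.noise_model = self._adapt(self.noise_model, feature)
        noise = self.noise_model if self.noise_model is not None else 0.0
        return feature - noise      # filtered signal with reduced background
```

In this sketch the first quiet utterance simply initializes the speech model and passes through unfiltered, while later noisy utterances both refine the noise model and are compensated by it.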
CN201180026390.4A 2010-06-14 2011-06-13 Speech and noise models for speech recognition Active CN103069480B (en)

Applications Claiming Priority (3)

Application Number Priority Date Filing Date Title
US12/814,665 2010-06-14
US12/814,665 US8234111B2 (en) 2010-06-14 2010-06-14 Speech and noise models for speech recognition
PCT/US2011/040225 WO2011159628A1 (en) 2010-06-14 2011-06-13 Speech and noise models for speech recognition

Publications (2)

Publication Number Publication Date
CN103069480A CN103069480A (en) 2013-04-24
CN103069480B true CN103069480B (en) 2014-12-24

Family

ID=44303537

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201180026390.4A Active CN103069480B (en) 2010-06-14 2011-06-13 Speech and noise models for speech recognition

Country Status (5)

Country Link
US (3) US8234111B2 (en)
EP (1) EP2580751B1 (en)
CN (1) CN103069480B (en)
AU (1) AU2011267982B2 (en)
WO (1) WO2011159628A1 (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105719645A (en) * 2014-12-17 2016-06-29 现代自动车株式会社 Speech recognition apparatus, vehicle including the same, and method of controlling the same

Families Citing this family (337)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2001013255A2 (en) * 1999-08-13 2001-02-22 Pixo, Inc. Displaying and traversing links in character array
US8645137B2 (en) 2000-03-16 2014-02-04 Apple Inc. Fast, language-independent method for user authentication by voice
ITFI20010199A1 (en) 2001-10-22 2003-04-22 Riccardo Vieri SYSTEM AND METHOD TO TRANSFORM TEXTUAL COMMUNICATIONS INTO VOICE AND SEND THEM WITH AN INTERNET CONNECTION TO ANY TELEPHONE SYSTEM
US7669134B1 (en) 2003-05-02 2010-02-23 Apple Inc. Method and apparatus for displaying information during an instant messaging session
US8677377B2 (en) 2005-09-08 2014-03-18 Apple Inc. Method and apparatus for building an intelligent automated assistant
US7633076B2 (en) 2005-09-30 2009-12-15 Apple Inc. Automated response to and sensing of user activity in portable devices
US9318108B2 (en) 2010-01-18 2016-04-19 Apple Inc. Intelligent automated assistant
US20080129520A1 (en) * 2006-12-01 2008-06-05 Apple Computer, Inc. Electronic device with enhanced audio feedback
US7912828B2 (en) * 2007-02-23 2011-03-22 Apple Inc. Pattern searching methods and apparatuses
US8977255B2 (en) * 2007-04-03 2015-03-10 Apple Inc. Method and system for operating a multi-function portable electronic device using voice-activation
ITFI20070177A1 (en) 2007-07-26 2009-01-27 Riccardo Vieri SYSTEM FOR THE CREATION AND SETTING OF AN ADVERTISING CAMPAIGN DERIVING FROM THE INSERTION OF ADVERTISING MESSAGES WITHIN AN EXCHANGE OF MESSAGES AND METHOD FOR ITS FUNCTIONING.
US9053089B2 (en) * 2007-10-02 2015-06-09 Apple Inc. Part-of-speech tagging using latent analogy
US8595642B1 (en) 2007-10-04 2013-11-26 Great Northern Research, LLC Multiple shell multi faceted graphical user interface
US8165886B1 (en) 2007-10-04 2012-04-24 Great Northern Research LLC Speech interface system and method for control and interaction with applications on a computing system
US8364694B2 (en) 2007-10-26 2013-01-29 Apple Inc. Search assistant for digital media assets
US8620662B2 (en) 2007-11-20 2013-12-31 Apple Inc. Context-aware unit selection
US10002189B2 (en) 2007-12-20 2018-06-19 Apple Inc. Method and apparatus for searching using an active ontology
US9330720B2 (en) * 2008-01-03 2016-05-03 Apple Inc. Methods and apparatus for altering audio output signals
US8327272B2 (en) 2008-01-06 2012-12-04 Apple Inc. Portable multifunction device, method, and graphical user interface for viewing and managing electronic calendars
US8065143B2 (en) 2008-02-22 2011-11-22 Apple Inc. Providing text input using speech data and non-speech data
US8289283B2 (en) * 2008-03-04 2012-10-16 Apple Inc. Language input interface on a device
US8996376B2 (en) 2008-04-05 2015-03-31 Apple Inc. Intelligent text-to-speech conversion
US10496753B2 (en) 2010-01-18 2019-12-03 Apple Inc. Automatically adapting user interfaces for hands-free interaction
US8464150B2 (en) 2008-06-07 2013-06-11 Apple Inc. Automatic language identification for dynamic text processing
US20100030549A1 (en) 2008-07-31 2010-02-04 Lee Michael M Mobile device having human language translation capability with positional feedback
US8768702B2 (en) 2008-09-05 2014-07-01 Apple Inc. Multi-tiered voice feedback in an electronic device
US8898568B2 (en) * 2008-09-09 2014-11-25 Apple Inc. Audio user interface
US20100082328A1 (en) * 2008-09-29 2010-04-01 Apple Inc. Systems and methods for speech preprocessing in text to speech synthesis
US8352268B2 (en) * 2008-09-29 2013-01-08 Apple Inc. Systems and methods for selective rate of speech and speech preferences for text to speech synthesis
US8352272B2 (en) * 2008-09-29 2013-01-08 Apple Inc. Systems and methods for text to speech synthesis
US8396714B2 (en) * 2008-09-29 2013-03-12 Apple Inc. Systems and methods for concatenation of words in text to speech synthesis
US8712776B2 (en) * 2008-09-29 2014-04-29 Apple Inc. Systems and methods for selective text to speech synthesis
US8583418B2 (en) 2008-09-29 2013-11-12 Apple Inc. Systems and methods of detecting language and natural language strings for text to speech synthesis
US8355919B2 (en) * 2008-09-29 2013-01-15 Apple Inc. Systems and methods for text normalization for text to speech synthesis
US8676904B2 (en) 2008-10-02 2014-03-18 Apple Inc. Electronic devices with voice command and contextual data processing capabilities
US9959870B2 (en) 2008-12-11 2018-05-01 Apple Inc. Speech recognition involving a mobile device
US8862252B2 (en) 2009-01-30 2014-10-14 Apple Inc. Audio user interface for displayless electronic device
US8380507B2 (en) 2009-03-09 2013-02-19 Apple Inc. Systems and methods for determining the language to use for speech generated by a text to speech engine
US10241644B2 (en) 2011-06-03 2019-03-26 Apple Inc. Actionable reminder entries
US10255566B2 (en) 2011-06-03 2019-04-09 Apple Inc. Generating and processing task items that represent tasks to perform
US9858925B2 (en) 2009-06-05 2018-01-02 Apple Inc. Using context information to facilitate processing of commands in a virtual assistant
US10241752B2 (en) 2011-09-30 2019-03-26 Apple Inc. Interface for a virtual digital assistant
US10540976B2 (en) 2009-06-05 2020-01-21 Apple Inc. Contextual voice commands
US9431006B2 (en) 2009-07-02 2016-08-30 Apple Inc. Methods and apparatuses for automatic speech recognition
US20110010179A1 (en) * 2009-07-13 2011-01-13 Naik Devang K Voice synthesis and processing
US20110066438A1 (en) * 2009-09-15 2011-03-17 Apple Inc. Contextual voiceover
US8682649B2 (en) * 2009-11-12 2014-03-25 Apple Inc. Sentiment prediction from textual data
US8600743B2 (en) * 2010-01-06 2013-12-03 Apple Inc. Noise profile determination for voice-related feature
US20110167350A1 (en) * 2010-01-06 2011-07-07 Apple Inc. Assist Features For Content Display Device
US8381107B2 (en) 2010-01-13 2013-02-19 Apple Inc. Adaptive audio feedback system and method
US8311838B2 (en) * 2010-01-13 2012-11-13 Apple Inc. Devices and methods for identifying a prompt corresponding to a voice input in a sequence of prompts
US10705794B2 (en) 2010-01-18 2020-07-07 Apple Inc. Automatically adapting user interfaces for hands-free interaction
US10679605B2 (en) 2010-01-18 2020-06-09 Apple Inc. Hands-free list-reading by intelligent automated assistant
US10276170B2 (en) 2010-01-18 2019-04-30 Apple Inc. Intelligent automated assistant
US10553209B2 (en) 2010-01-18 2020-02-04 Apple Inc. Systems and methods for hands-free notification summaries
US9058732B2 (en) * 2010-02-25 2015-06-16 Qualcomm Incorporated Method and apparatus for enhanced indoor position location with assisted user profiles
US8682667B2 (en) 2010-02-25 2014-03-25 Apple Inc. User profiling for selecting user specific voice input processing information
US8639516B2 (en) * 2010-06-04 2014-01-28 Apple Inc. User-specific noise suppression for voice quality improvements
US8713021B2 (en) 2010-07-07 2014-04-29 Apple Inc. Unsupervised document clustering using latent semantic density analysis
US9104670B2 (en) 2010-07-21 2015-08-11 Apple Inc. Customized search or acquisition of digital media assets
US8521526B1 (en) 2010-07-28 2013-08-27 Google Inc. Disambiguation of a spoken query term
WO2012020394A2 (en) * 2010-08-11 2012-02-16 Bone Tone Communications Ltd. Background sound removal for privacy and personalization use
US8719006B2 (en) 2010-08-27 2014-05-06 Apple Inc. Combined statistical and rule-based part-of-speech tagging for text-to-speech synthesis
US8719014B2 (en) 2010-09-27 2014-05-06 Apple Inc. Electronic device with text error correction based on voice recognition data
KR20120054845A (en) * 2010-11-22 2012-05-31 삼성전자주식회사 Speech recognition method for robot
US10515147B2 (en) 2010-12-22 2019-12-24 Apple Inc. Using statistical language models for contextual lookup
US10762293B2 (en) 2010-12-22 2020-09-01 Apple Inc. Using parts-of-speech tagging and named entity recognition for spelling correction
US9538286B2 (en) * 2011-02-10 2017-01-03 Dolby International Ab Spatial adaptation in multi-microphone sound capture
US8781836B2 (en) 2011-02-22 2014-07-15 Apple Inc. Hearing assistance system for providing consistent human speech
US9262612B2 (en) 2011-03-21 2016-02-16 Apple Inc. Device access using voice authentication
AU2012236649A1 (en) * 2011-03-28 2013-10-31 Ambientz Methods and systems for searching utilizing acoustical context
US10672399B2 (en) 2011-06-03 2020-06-02 Apple Inc. Switching between text data and audio data based on a mapping
US10057736B2 (en) 2011-06-03 2018-08-21 Apple Inc. Active transport based notifications
US8812294B2 (en) 2011-06-21 2014-08-19 Apple Inc. Translating phrases from one language into another using an order-based set of declarative rules
GB2493413B (en) * 2011-07-25 2013-12-25 Ibm Maintaining and supplying speech models
TWI442384B (en) * 2011-07-26 2014-06-21 Ind Tech Res Inst Microphone-array-based speech recognition system and method
US8595015B2 (en) * 2011-08-08 2013-11-26 Verizon New Jersey Inc. Audio communication assessment
US8706472B2 (en) 2011-08-11 2014-04-22 Apple Inc. Method for disambiguating multiple readings in language conversion
US8994660B2 (en) 2011-08-29 2015-03-31 Apple Inc. Text correction processing
US8762156B2 (en) 2011-09-28 2014-06-24 Apple Inc. Speech recognition repair using contextual information
US8712184B1 (en) * 2011-12-05 2014-04-29 Hermes Microvision, Inc. Method and system for filtering noises in an image scanned by charged particles
US10134385B2 (en) 2012-03-02 2018-11-20 Apple Inc. Systems and methods for name pronunciation
US9483461B2 (en) 2012-03-06 2016-11-01 Apple Inc. Handling speech synthesis of content for multiple languages
US9280610B2 (en) 2012-05-14 2016-03-08 Apple Inc. Crowd sourcing information to fulfill user requests
US8775442B2 (en) 2012-05-15 2014-07-08 Apple Inc. Semantic search using a single-source semantic model
US10417037B2 (en) 2012-05-15 2019-09-17 Apple Inc. Systems and methods for integrating third party services with a digital assistant
US11023520B1 (en) 2012-06-01 2021-06-01 Google Llc Background audio identification for query disambiguation
US9123338B1 (en) 2012-06-01 2015-09-01 Google Inc. Background audio identification for speech disambiguation
WO2013185109A2 (en) 2012-06-08 2013-12-12 Apple Inc. Systems and methods for recognizing textual identifiers within a plurality of words
US9721563B2 (en) 2012-06-08 2017-08-01 Apple Inc. Name recognition system
US9489940B2 (en) * 2012-06-11 2016-11-08 Nvoq Incorporated Apparatus and methods to update a language model in a speech recognition system
US9384737B2 (en) * 2012-06-29 2016-07-05 Microsoft Technology Licensing, Llc Method and device for adjusting sound levels of sources based on sound source priority
US9495129B2 (en) 2012-06-29 2016-11-15 Apple Inc. Device, method, and user interface for voice-activated navigation and browsing of a document
CN102841932A (en) * 2012-08-06 2012-12-26 河海大学 Content-based voice frequency semantic feature similarity comparative method
KR20150046100A (en) 2012-08-10 2015-04-29 뉘앙스 커뮤니케이션즈, 인코포레이티드 Virtual agent communication for electronic devices
US9576574B2 (en) 2012-09-10 2017-02-21 Apple Inc. Context-sensitive handling of interruptions by intelligent digital assistant
US20140074466A1 (en) 2012-09-10 2014-03-13 Google Inc. Answering questions using environmental context
US9547647B2 (en) 2012-09-19 2017-01-17 Apple Inc. Voice-based media searching
US8935167B2 (en) 2012-09-25 2015-01-13 Apple Inc. Exemplar-based latent perceptual modeling for automatic speech recognition
US9319816B1 (en) * 2012-09-26 2016-04-19 Amazon Technologies, Inc. Characterizing environment using ultrasound pilot tones
US9190057B2 (en) * 2012-12-12 2015-11-17 Amazon Technologies, Inc. Speech model retrieval in distributed speech recognition systems
US9653070B2 (en) 2012-12-31 2017-05-16 Intel Corporation Flexible architecture for acoustic signal processing engine
US8494853B1 (en) * 2013-01-04 2013-07-23 Google Inc. Methods and systems for providing speech recognition systems based on speech recordings logs
CN103971680B (en) * 2013-01-24 2018-06-05 华为终端(东莞)有限公司 A kind of method, apparatus of speech recognition
CN103065631B (en) * 2013-01-24 2015-07-29 华为终端有限公司 A kind of method of speech recognition, device
KR20230137475A (en) 2013-02-07 2023-10-04 애플 인크. Voice trigger for a digital assistant
US9460715B2 (en) * 2013-03-04 2016-10-04 Amazon Technologies, Inc. Identification using audio signatures and additional characteristics
US20140278395A1 (en) * 2013-03-12 2014-09-18 Motorola Mobility Llc Method and Apparatus for Determining a Motion Environment Profile to Adapt Voice Recognition Processing
US20140278392A1 (en) * 2013-03-12 2014-09-18 Motorola Mobility Llc Method and Apparatus for Pre-Processing Audio Signals
US20140278415A1 (en) * 2013-03-12 2014-09-18 Motorola Mobility Llc Voice Recognition Configuration Selector and Method of Operation Therefor
US10306389B2 (en) 2013-03-13 2019-05-28 Kopin Corporation Head wearable acoustic system with noise canceling microphone geometry apparatuses and methods
US9312826B2 (en) * 2013-03-13 2016-04-12 Kopin Corporation Apparatuses and methods for acoustic channel auto-balancing during multi-channel signal extraction
US9977779B2 (en) 2013-03-14 2018-05-22 Apple Inc. Automatic supplementation of word correction dictionaries
US10642574B2 (en) 2013-03-14 2020-05-05 Apple Inc. Device, method, and graphical user interface for outputting captions
US9733821B2 (en) 2013-03-14 2017-08-15 Apple Inc. Voice control to diagnose inadvertent activation of accessibility features
US10652394B2 (en) 2013-03-14 2020-05-12 Apple Inc. System and method for processing voicemail
US10424292B1 (en) * 2013-03-14 2019-09-24 Amazon Technologies, Inc. System for recognizing and responding to environmental noises
US9368114B2 (en) 2013-03-14 2016-06-14 Apple Inc. Context-sensitive handling of interruptions
US10572476B2 (en) 2013-03-14 2020-02-25 Apple Inc. Refining a search based on schedule items
KR101857648B1 (en) 2013-03-15 2018-05-15 애플 인크. User training by intelligent digital assistant
AU2014251347B2 (en) 2013-03-15 2017-05-18 Apple Inc. Context-sensitive handling of interruptions
US10748529B1 (en) 2013-03-15 2020-08-18 Apple Inc. Voice activated device for use with a voice-based digital assistant
AU2014233517B2 (en) 2013-03-15 2017-05-25 Apple Inc. Training an at least partial voice command system
WO2014144579A1 (en) 2013-03-15 2014-09-18 Apple Inc. System and method for updating an adaptive speech recognition model
US9208781B2 (en) 2013-04-05 2015-12-08 International Business Machines Corporation Adapting speech recognition acoustic models with environmental and social cues
WO2014182453A2 (en) * 2013-05-06 2014-11-13 Motorola Mobility Llc Method and apparatus for training a voice recognition model database
US9953630B1 (en) * 2013-05-31 2018-04-24 Amazon Technologies, Inc. Language recognition for device settings
US9582608B2 (en) 2013-06-07 2017-02-28 Apple Inc. Unified ranking with entropy-weighted information for phrase-based semantic auto-completion
WO2014197336A1 (en) 2013-06-07 2014-12-11 Apple Inc. System and method for detecting errors in interactions with a voice-based digital assistant
WO2014197334A2 (en) 2013-06-07 2014-12-11 Apple Inc. System and method for user-specified pronunciation of words for speech synthesis and recognition
WO2014197335A1 (en) 2013-06-08 2014-12-11 Apple Inc. Interpreting and acting upon commands that involve sharing information with remote devices
US10176167B2 (en) 2013-06-09 2019-01-08 Apple Inc. System and method for inferring user intent from speech inputs
EP3937002A1 (en) 2013-06-09 2022-01-12 Apple Inc. Device, method, and graphical user interface for enabling conversation persistence across two or more instances of a digital assistant
AU2014278595B2 (en) 2013-06-13 2017-04-06 Apple Inc. System and method for emergency calls initiated by voice command
DE112014003653B4 (en) 2013-08-06 2024-04-18 Apple Inc. Automatically activate intelligent responses based on activities from remote devices
US10296160B2 (en) 2013-12-06 2019-05-21 Apple Inc. Method for extracting salient dialog usage from live data
US10534623B2 (en) * 2013-12-16 2020-01-14 Nuance Communications, Inc. Systems and methods for providing a virtual assistant
US9953634B1 (en) * 2013-12-17 2018-04-24 Knowles Electronics, Llc Passive training for automatic speech recognition
GB2524222B (en) * 2013-12-18 2018-07-18 Cirrus Logic Int Semiconductor Ltd Activating speech processing
US9589560B1 (en) * 2013-12-19 2017-03-07 Amazon Technologies, Inc. Estimating false rejection rate in a detection system
US9466310B2 (en) * 2013-12-20 2016-10-11 Lenovo Enterprise Solutions (Singapore) Pte. Ltd. Compensating for identifiable background content in a speech recognition device
JP6375521B2 (en) * 2014-03-28 2018-08-22 Panasonic Intellectual Property Management Co., Ltd. Voice search device, voice search method, and display device
US10446168B2 (en) * 2014-04-02 2019-10-15 Plantronics, Inc. Noise level measurement with mobile devices, location services, and environmental response
KR102257910B1 (en) * 2014-05-02 2021-05-27 Samsung Electronics Co., Ltd. Apparatus and method for speech recognition, apparatus and method for generating noise-speech recognition model
US9620105B2 (en) 2014-05-15 2017-04-11 Apple Inc. Analyzing audio input for efficient speech and music recognition
US10592095B2 (en) 2014-05-23 2020-03-17 Apple Inc. Instantaneous speaking of content on touch devices
US9502031B2 (en) 2014-05-27 2016-11-22 Apple Inc. Method for supporting dynamic grammars in WFST-based ASR
US9760559B2 (en) 2014-05-30 2017-09-12 Apple Inc. Predictive text input
US10289433B2 (en) 2014-05-30 2019-05-14 Apple Inc. Domain specific language for encoding assistant dialog
US9430463B2 (en) 2014-05-30 2016-08-30 Apple Inc. Exemplar-based natural language processing
US9842101B2 (en) 2014-05-30 2017-12-12 Apple Inc. Predictive conversion of language input
US10078631B2 (en) 2014-05-30 2018-09-18 Apple Inc. Entropy-guided text prediction using combined word and character n-gram language models
US9734193B2 (en) 2014-05-30 2017-08-15 Apple Inc. Determining domain salience ranking from ambiguous words in natural speech
US9633004B2 (en) 2014-05-30 2017-04-25 Apple Inc. Better resolution when referencing to concepts
AU2015266863B2 (en) 2014-05-30 2018-03-15 Apple Inc. Multi-command single utterance input method
US9785630B2 (en) 2014-05-30 2017-10-10 Apple Inc. Text prediction using combined word N-gram and unigram language models
US9715875B2 (en) 2014-05-30 2017-07-25 Apple Inc. Reducing the need for manual start/end-pointing and trigger phrases
US10170123B2 (en) 2014-05-30 2019-01-01 Apple Inc. Intelligent assistant for home automation
US9904851B2 (en) 2014-06-11 2018-02-27 At&T Intellectual Property I, L.P. Exploiting visual information for enhancing audio signals via source separation and beamforming
US9858922B2 (en) 2014-06-23 2018-01-02 Google Inc. Caching speech recognition scores
US9639854B2 (en) 2014-06-26 2017-05-02 Nuance Communications, Inc. Voice-controlled information exchange platform, such as for providing information to supplement advertising
US9338493B2 (en) 2014-06-30 2016-05-10 Apple Inc. Intelligent automated assistant for TV user interactions
US10659851B2 (en) 2014-06-30 2020-05-19 Apple Inc. Real-time digital assistant knowledge updates
US9837102B2 (en) * 2014-07-02 2017-12-05 Microsoft Technology Licensing, Llc User environment aware acoustic noise reduction
US10446141B2 (en) 2014-08-28 2019-10-15 Apple Inc. Automatic speech recognition based on user feedback
US9953646B2 (en) 2014-09-02 2018-04-24 Belleau Technologies Method and system for dynamic speech recognition and tracking of prewritten script
US9818400B2 (en) 2014-09-11 2017-11-14 Apple Inc. Method and apparatus for discovering trending terms in speech requests
US10789041B2 (en) 2014-09-12 2020-09-29 Apple Inc. Dynamic thresholds for always listening speech trigger
US9886432B2 (en) 2014-09-30 2018-02-06 Apple Inc. Parsimonious handling of word inflection via categorical stem + suffix N-gram language models
US9646609B2 (en) 2014-09-30 2017-05-09 Apple Inc. Caching apparatus for serving phonetic pronunciations
US9668121B2 (en) 2014-09-30 2017-05-30 Apple Inc. Social reminders
US10074360B2 (en) 2014-09-30 2018-09-11 Apple Inc. Providing an indication of the suitability of speech recognition
US10127911B2 (en) 2014-09-30 2018-11-13 Apple Inc. Speaker identification and unsupervised speaker adaptation techniques
US9299347B1 (en) 2014-10-22 2016-03-29 Google Inc. Speech recognition using associative mapping
US10999636B1 (en) * 2014-10-27 2021-05-04 Amazon Technologies, Inc. Voice-based content searching on a television based on receiving candidate search strings from a remote server
US9667321B2 (en) * 2014-10-31 2017-05-30 Pearson Education, Inc. Predictive recommendation engine
US10116563B1 (en) 2014-10-30 2018-10-30 Pearson Education, Inc. System and method for automatically updating data packet metadata
US10318499B2 (en) 2014-10-30 2019-06-11 Pearson Education, Inc. Content database generation
EP3213232A1 (en) 2014-10-30 2017-09-06 Pearson Education, Inc. Content database generation
US10110486B1 (en) 2014-10-30 2018-10-23 Pearson Education, Inc. Automatic determination of initial content difficulty
US10735402B1 (en) 2014-10-30 2020-08-04 Pearson Education, Inc. Systems and method for automated data packet selection and delivery
US10333857B1 (en) 2014-10-30 2019-06-25 Pearson Education, Inc. Systems and methods for data packet metadata stabilization
US10218630B2 (en) 2014-10-30 2019-02-26 Pearson Education, Inc. System and method for increasing data transmission rates through a content distribution network
JP2016109725A (en) * 2014-12-02 2016-06-20 Sony Corporation Information-processing apparatus, information-processing method, and program
US10552013B2 (en) 2014-12-02 2020-02-04 Apple Inc. Data detection
US9711141B2 (en) 2014-12-09 2017-07-18 Apple Inc. Disambiguating heteronyms in speech synthesis
US9865280B2 (en) 2015-03-06 2018-01-09 Apple Inc. Structured dictation using intelligent automated assistants
US10152299B2 (en) 2015-03-06 2018-12-11 Apple Inc. Reducing response latency of intelligent automated assistants
US10567477B2 (en) 2015-03-08 2020-02-18 Apple Inc. Virtual assistant continuity
US9721566B2 (en) 2015-03-08 2017-08-01 Apple Inc. Competing devices responding to voice triggers
US9886953B2 (en) 2015-03-08 2018-02-06 Apple Inc. Virtual assistant activation
US9899019B2 (en) 2015-03-18 2018-02-20 Apple Inc. Systems and methods for structured stem and suffix language models
US9842105B2 (en) 2015-04-16 2017-12-12 Apple Inc. Parsimonious continuous-space phrase representations for natural language processing
US10460227B2 (en) 2015-05-15 2019-10-29 Apple Inc. Virtual assistant in a communication session
US10083688B2 (en) 2015-05-27 2018-09-25 Apple Inc. Device voice control for selecting a displayed affordance
US10504509B2 (en) 2015-05-27 2019-12-10 Google Llc Providing suggested voice-based action queries
US10200824B2 (en) 2015-05-27 2019-02-05 Apple Inc. Systems and methods for proactively identifying and surfacing relevant content on a touch-sensitive device
US10127220B2 (en) 2015-06-04 2018-11-13 Apple Inc. Language identification from short strings
US9578173B2 (en) 2015-06-05 2017-02-21 Apple Inc. Virtual assistant aided communication with 3rd party service in a communication session
US10101822B2 (en) 2015-06-05 2018-10-16 Apple Inc. Language input correction
US10255907B2 (en) 2015-06-07 2019-04-09 Apple Inc. Automatic accent detection using acoustic models
US10186254B2 (en) 2015-06-07 2019-01-22 Apple Inc. Context-based endpoint detection
US11025565B2 (en) 2015-06-07 2021-06-01 Apple Inc. Personalized prediction of responses for instant messaging
US20160378747A1 (en) 2015-06-29 2016-12-29 Apple Inc. Virtual assistant for media playback
US9786270B2 (en) 2015-07-09 2017-10-10 Google Inc. Generating acoustic models
US10008199B2 (en) * 2015-08-22 2018-06-26 Toyota Motor Engineering & Manufacturing North America, Inc. Speech recognition system with abbreviated training
US10614368B2 (en) 2015-08-28 2020-04-07 Pearson Education, Inc. System and method for content provisioning with dual recommendation engines
US10747498B2 (en) 2015-09-08 2020-08-18 Apple Inc. Zero latency digital assistant
US10740384B2 (en) 2015-09-08 2020-08-11 Apple Inc. Intelligent automated assistant for media search and playback
US10331312B2 (en) 2015-09-08 2019-06-25 Apple Inc. Intelligent automated assistant in a media environment
US10671428B2 (en) 2015-09-08 2020-06-02 Apple Inc. Distributed personal assistant
US9697820B2 (en) 2015-09-24 2017-07-04 Apple Inc. Unit-selection text-to-speech synthesis using concatenation-sensitive neural networks
US10366158B2 (en) 2015-09-29 2019-07-30 Apple Inc. Efficient word encoding for recurrent neural network language models
US11010550B2 (en) 2015-09-29 2021-05-18 Apple Inc. Unified language modeling framework for word prediction, auto-completion and auto-correction
US11587559B2 (en) 2015-09-30 2023-02-21 Apple Inc. Intelligent device identification
US11631421B2 (en) 2015-10-18 2023-04-18 Solos Technology Limited Apparatuses and methods for enhanced speech recognition in variable environments
US10691473B2 (en) 2015-11-06 2020-06-23 Apple Inc. Intelligent automated assistant in a messaging environment
US10956666B2 (en) 2015-11-09 2021-03-23 Apple Inc. Unconventional virtual assistant interactions
US10468016B2 (en) 2015-11-24 2019-11-05 International Business Machines Corporation System and method for supporting automatic speech recognition of regional accents based on statistical information and user corrections
US10049668B2 (en) 2015-12-02 2018-08-14 Apple Inc. Applying neural network language models to weighted finite state transducers for automatic speech recognition
US10223066B2 (en) 2015-12-23 2019-03-05 Apple Inc. Proactive assistance based on dialog communication between devices
US10229672B1 (en) 2015-12-31 2019-03-12 Google Llc Training acoustic models using connectionist temporal classification
US10446143B2 (en) 2016-03-14 2019-10-15 Apple Inc. Identification of voice inputs providing credentials
US11138987B2 (en) * 2016-04-04 2021-10-05 Honeywell International Inc. System and method to distinguish sources in a multiple audio source environment
US11188841B2 (en) 2016-04-08 2021-11-30 Pearson Education, Inc. Personalized content distribution
US10789316B2 (en) 2016-04-08 2020-09-29 Pearson Education, Inc. Personalized automatic content aggregation generation
US10642848B2 (en) 2016-04-08 2020-05-05 Pearson Education, Inc. Personalized automatic content aggregation generation
US10325215B2 (en) 2016-04-08 2019-06-18 Pearson Education, Inc. System and method for automatic content aggregation generation
US9934775B2 (en) 2016-05-26 2018-04-03 Apple Inc. Unit-selection text-to-speech synthesis based on predicted concatenation parameters
US9972304B2 (en) 2016-06-03 2018-05-15 Apple Inc. Privacy preserving distributed evaluation framework for embedded personalized systems
US11227589B2 (en) 2016-06-06 2022-01-18 Apple Inc. Intelligent list reading
US10249300B2 (en) 2016-06-06 2019-04-02 Apple Inc. Intelligent list reading
US10049663B2 (en) 2016-06-08 2018-08-14 Apple, Inc. Intelligent automated assistant for media exploration
CN109313896B (en) * 2016-06-08 2020-06-30 谷歌有限责任公司 Extensible dynamic class language modeling method, system for generating an utterance transcription, computer-readable medium
DK179588B1 (en) 2016-06-09 2019-02-22 Apple Inc. Intelligent automated assistant in a home environment
US10067938B2 (en) 2016-06-10 2018-09-04 Apple Inc. Multilingual word prediction
US10490187B2 (en) 2016-06-10 2019-11-26 Apple Inc. Digital assistant providing automated status report
US10192552B2 (en) 2016-06-10 2019-01-29 Apple Inc. Digital assistant providing whispered speech
US10509862B2 (en) 2016-06-10 2019-12-17 Apple Inc. Dynamic phrase expansion of language input
US10586535B2 (en) 2016-06-10 2020-03-10 Apple Inc. Intelligent digital assistant in a multi-tasking environment
DK179343B1 (en) 2016-06-11 2018-05-14 Apple Inc Intelligent task discovery
DK179415B1 (en) 2016-06-11 2018-06-14 Apple Inc Intelligent device arbitration and control
DK201670540A1 (en) 2016-06-11 2018-01-08 Apple Inc Application integration with a digital assistant
DK179049B1 (en) 2016-06-11 2017-09-18 Apple Inc Data driven natural language event detection and classification
US20180018973A1 (en) 2016-07-15 2018-01-18 Google Inc. Speaker verification
US10474753B2 (en) 2016-09-07 2019-11-12 Apple Inc. Language identification using recurrent neural networks
US10043516B2 (en) 2016-09-23 2018-08-07 Apple Inc. Intelligent automated assistant
US10951720B2 (en) 2016-10-24 2021-03-16 Bank Of America Corporation Multi-channel cognitive resource platform
US11281993B2 (en) 2016-12-05 2022-03-22 Apple Inc. Model and ensemble compression for metric learning
US10593346B2 (en) 2016-12-22 2020-03-17 Apple Inc. Rank-reduced token representation for automatic speech recognition
US11204787B2 (en) 2017-01-09 2021-12-21 Apple Inc. Application integration with a digital assistant
US10417266B2 (en) 2017-05-09 2019-09-17 Apple Inc. Context-aware ranking of intelligent response suggestions
DK201770383A1 (en) 2017-05-09 2018-12-14 Apple Inc. User interface for correcting recognition errors
DK180048B1 (en) 2017-05-11 2020-02-04 Apple Inc. MAINTAINING THE DATA PROTECTION OF PERSONAL INFORMATION
DK201770439A1 (en) 2017-05-11 2018-12-13 Apple Inc. Offline personal assistant
US10726832B2 (en) 2017-05-11 2020-07-28 Apple Inc. Maintaining privacy of personal information
US10395654B2 (en) 2017-05-11 2019-08-27 Apple Inc. Text normalization based on a data-driven learning network
US11301477B2 (en) 2017-05-12 2022-04-12 Apple Inc. Feedback analysis of a digital assistant
DK201770429A1 (en) 2017-05-12 2018-12-14 Apple Inc. Low-latency intelligent automated assistant
DK179745B1 (en) 2017-05-12 2019-05-01 Apple Inc. SYNCHRONIZATION AND TASK DELEGATION OF A DIGITAL ASSISTANT
DK179496B1 (en) 2017-05-12 2019-01-15 Apple Inc. USER-SPECIFIC Acoustic Models
DK201770431A1 (en) 2017-05-15 2018-12-20 Apple Inc. Optimizing dialogue policy decisions for digital assistants using implicit feedback
DK201770432A1 (en) 2017-05-15 2018-12-21 Apple Inc. Hierarchical belief states for digital assistants
US10403278B2 (en) 2017-05-16 2019-09-03 Apple Inc. Methods and systems for phonetic matching in digital assistant services
US10303715B2 (en) 2017-05-16 2019-05-28 Apple Inc. Intelligent automated assistant for media exploration
DK179560B1 (en) 2017-05-16 2019-02-18 Apple Inc. Far-field extension for digital assistant services
US20180336892A1 (en) 2017-05-16 2018-11-22 Apple Inc. Detecting a trigger of a digital assistant
US10311144B2 (en) 2017-05-16 2019-06-04 Apple Inc. Emoji word sense disambiguation
US10657328B2 (en) 2017-06-02 2020-05-19 Apple Inc. Multi-task recurrent neural network architecture for efficient morphology handling in neural language modeling
US10706840B2 (en) 2017-08-18 2020-07-07 Google Llc Encoder-decoder models for sequence to sequence mapping
US10096311B1 (en) 2017-09-12 2018-10-09 Plantronics, Inc. Intelligent soundscape adaptation utilizing mobile devices
US10445429B2 (en) 2017-09-21 2019-10-15 Apple Inc. Natural language understanding using vocabularies with compressed serialized tries
US10755051B2 (en) 2017-09-29 2020-08-25 Apple Inc. Rule-based natural language processing
CN107908742A (en) * 2017-11-15 2018-04-13 Baidu Online Network Technology (Beijing) Co., Ltd. Method and apparatus for outputting information
US10636424B2 (en) 2017-11-30 2020-04-28 Apple Inc. Multi-turn canned dialog
KR102446637B1 (en) * 2017-12-28 2022-09-23 Samsung Electronics Co., Ltd. Sound output system and speech processing method
US10733982B2 (en) 2018-01-08 2020-08-04 Apple Inc. Multi-directional dialog
CN108182270A (en) * 2018-01-17 2018-06-19 Guangdong Genius Technology Co., Ltd. Search content transmission and search method, smart pen, search terminal and storage medium
KR102609430B1 (en) * 2018-01-23 2023-12-04 Google LLC Selective adaptation and utilization of noise reduction technique in invocation phrase detection
US10733375B2 (en) 2018-01-31 2020-08-04 Apple Inc. Knowledge-based framework for improving natural language understanding
KR102585231B1 (en) * 2018-02-02 2023-10-05 Samsung Electronics Co., Ltd. Speech signal processing method for speaker recognition and electronic apparatus thereof
US10789959B2 (en) 2018-03-02 2020-09-29 Apple Inc. Training speaker recognition models for digital assistants
US10592604B2 (en) 2018-03-12 2020-03-17 Apple Inc. Inverse text normalization for automatic speech recognition
US10818288B2 (en) 2018-03-26 2020-10-27 Apple Inc. Natural assistant interaction
US10909331B2 (en) 2018-03-30 2021-02-02 Apple Inc. Implicit identification of translation payload with neural machine translation
US10923139B2 (en) * 2018-05-02 2021-02-16 Melo Inc. Systems and methods for processing meeting information obtained from multiple sources
US11145294B2 (en) 2018-05-07 2021-10-12 Apple Inc. Intelligent automated assistant for delivering content from user experiences
US10928918B2 (en) 2018-05-07 2021-02-23 Apple Inc. Raise to speak
US10984780B2 (en) 2018-05-21 2021-04-20 Apple Inc. Global semantic word embeddings using bi-directional recurrent neural networks
DK180639B1 (en) 2018-06-01 2021-11-04 Apple Inc DISABILITY OF ATTENTION-ATTENTIVE VIRTUAL ASSISTANT
DK201870355A1 (en) 2018-06-01 2019-12-16 Apple Inc. Virtual assistant operation in multi-device environments
US11386266B2 (en) 2018-06-01 2022-07-12 Apple Inc. Text correction
DK179822B1 (en) 2018-06-01 2019-07-12 Apple Inc. Voice interaction at a primary device to access call functionality of a companion device
US10892996B2 (en) 2018-06-01 2021-01-12 Apple Inc. Variable latency device coordination
US10504518B1 (en) 2018-06-03 2019-12-10 Apple Inc. Accelerated task performance
CN109087659A (en) * 2018-08-03 2018-12-25 Samsung Electronics (China) R&D Center Audio optimization method and apparatus
US11010561B2 (en) 2018-09-27 2021-05-18 Apple Inc. Sentiment prediction from textual data
US11170166B2 (en) 2018-09-28 2021-11-09 Apple Inc. Neural typographical error modeling via generative adversarial networks
US10839159B2 (en) 2018-09-28 2020-11-17 Apple Inc. Named entity normalization in a spoken dialog system
US11462215B2 (en) 2018-09-28 2022-10-04 Apple Inc. Multi-modal inputs for voice commands
US11475898B2 (en) 2018-10-26 2022-10-18 Apple Inc. Low-latency multi-speaker speech recognition
CN111415653B (en) * 2018-12-18 2023-08-01 Baidu Online Network Technology (Beijing) Co., Ltd. Method and device for recognizing speech
US11638059B2 (en) 2019-01-04 2023-04-25 Apple Inc. Content playback on multiple devices
CN109841227B (en) * 2019-03-11 2020-10-02 Nanjing University of Posts and Telecommunications Background noise removing method based on learning compensation
US11348573B2 (en) 2019-03-18 2022-05-31 Apple Inc. Multimodality in digital assistant systems
DK201970509A1 (en) 2019-05-06 2021-01-15 Apple Inc Spoken notifications
US11423908B2 (en) 2019-05-06 2022-08-23 Apple Inc. Interpreting spoken requests
US11475884B2 (en) 2019-05-06 2022-10-18 Apple Inc. Reducing digital assistant latency when a language is incorrectly determined
US11307752B2 (en) 2019-05-06 2022-04-19 Apple Inc. User configurable task triggers
US11140099B2 (en) 2019-05-21 2021-10-05 Apple Inc. Providing message response suggestions
DK201970511A1 (en) 2019-05-31 2021-02-15 Apple Inc Voice identification in digital assistant systems
US11289073B2 (en) 2019-05-31 2022-03-29 Apple Inc. Device text to speech
DK180129B1 (en) 2019-05-31 2020-06-02 Apple Inc. User activity shortcut suggestions
US11496600B2 (en) 2019-05-31 2022-11-08 Apple Inc. Remote execution of machine-learned models
US11360641B2 (en) 2019-06-01 2022-06-14 Apple Inc. Increasing the relevance of new available information
US11227599B2 (en) 2019-06-01 2022-01-18 Apple Inc. Methods and user interfaces for voice-based control of electronic devices
US11848023B2 (en) * 2019-06-10 2023-12-19 Google Llc Audio noise reduction
CN112201247A (en) * 2019-07-08 2021-01-08 Beijing Horizon Robotics Technology R&D Co., Ltd. Speech enhancement method and apparatus, electronic device, and storage medium
KR102260216B1 (en) * 2019-07-29 2021-06-03 LG Electronics Inc. Intelligent voice recognizing method, voice recognizing apparatus, intelligent computing device and server
CN110648680A (en) * 2019-09-23 2020-01-03 Tencent Technology (Shenzhen) Co., Ltd. Voice data processing method and device, electronic equipment and readable storage medium
WO2021056255A1 (en) 2019-09-25 2021-04-01 Apple Inc. Text detection using global geometry estimators
US11489794B2 (en) 2019-11-04 2022-11-01 Bank Of America Corporation System for configuration and intelligent transmission of electronic communications and integrated resource processing
CN110956955B (en) * 2019-12-10 2022-08-05 AISpeech Co., Ltd. Voice interaction method and device
CN112820307B (en) * 2020-02-19 2023-12-15 Tencent Technology (Shenzhen) Co., Ltd. Voice message processing method, device, equipment and medium
CN111461438B (en) * 2020-04-01 2024-01-05 Unit 93114 of the Chinese People's Liberation Army Air Force Signal detection method and device, electronic equipment and storage medium
US11061543B1 (en) 2020-05-11 2021-07-13 Apple Inc. Providing relevant data items based on context
US11038934B1 (en) 2020-05-11 2021-06-15 Apple Inc. Digital assistant hardware abstraction
US11755276B2 (en) 2020-05-12 2023-09-12 Apple Inc. Reducing description length based on confidence
US11490204B2 (en) 2020-07-20 2022-11-01 Apple Inc. Multi-device audio adjustment coordination
US11438683B2 (en) 2020-07-21 2022-09-06 Apple Inc. User identification using headphones
US11580959B2 (en) * 2020-09-28 2023-02-14 International Business Machines Corporation Improving speech recognition transcriptions
CN112652304B (en) * 2020-12-02 2022-02-01 Beijing Baidu Netcom Science and Technology Co., Ltd. Voice interaction method and device of intelligent equipment and electronic equipment
CN112669867B (en) * 2020-12-15 2023-04-11 Apollo Intelligent Connectivity (Beijing) Technology Co., Ltd. Debugging method and device of noise elimination algorithm and electronic equipment
CN112634932B (en) * 2021-03-09 2021-06-22 Ganzhou Bolang Technology Co., Ltd. Audio signal processing method and device, server and related equipment
CN113053382A (en) * 2021-03-30 2021-06-29 Lenovo (Beijing) Co., Ltd. Processing method and device
US11875798B2 (en) 2021-05-03 2024-01-16 International Business Machines Corporation Profiles for enhanced speech recognition training
CN114333881B (en) * 2022-03-09 2022-05-24 深圳市迪斯声学有限公司 Audio transmission noise reduction method, device and medium based on environment self-adaptation

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1453767A (en) * 2002-04-26 2003-11-05 Pioneer Corporation Speech recognition apparatus and speech recognition method
US6718302B1 (en) * 1997-10-20 2004-04-06 Sony Corporation Method for utilizing validity constraints in a speech endpoint detector

Family Cites Families (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5970446A (en) * 1997-11-25 1999-10-19 At&T Corp Selective noise/channel/coding models and recognizers for automatic speech recognition
US7209880B1 (en) * 2001-03-20 2007-04-24 At&T Corp. Systems and methods for dynamic re-configurable speech recognition
JP3826032B2 (en) * 2001-12-28 2006-09-27 Toshiba Corporation Speech recognition apparatus, speech recognition method, and speech recognition program
JP4357867B2 (en) * 2003-04-25 2009-11-04 Pioneer Corporation Voice recognition apparatus, voice recognition method, voice recognition program, and recording medium recording the same
US7321852B2 (en) * 2003-10-28 2008-01-22 International Business Machines Corporation System and method for transcribing audio files of various languages
JP4340686B2 (en) * 2004-03-31 2009-10-07 Pioneer Corporation Speech recognition apparatus and speech recognition method
DE102004017486A1 (en) * 2004-04-08 2005-10-27 Siemens AG Method for noise reduction in a voice input signal
DE602007004733D1 (en) * 2007-10-10 2010-03-25 Harman Becker Automotive Sys speaker recognition
US20100145687A1 (en) 2008-12-04 2010-06-10 Microsoft Corporation Removing noise from speech

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105719645A (en) * 2014-12-17 2016-06-29 现代自动车株式会社 Speech recognition apparatus, vehicle including the same, and method of controlling the same
CN105719645B (en) * 2014-12-17 2020-09-18 现代自动车株式会社 Voice recognition apparatus, vehicle including the same, and method of controlling voice recognition apparatus

Also Published As

Publication number Publication date
AU2011267982A1 (en) 2012-11-01
US8249868B2 (en) 2012-08-21
US20120022860A1 (en) 2012-01-26
CN103069480A (en) 2013-04-24
AU2011267982B2 (en) 2015-02-05
EP2580751A1 (en) 2013-04-17
US8234111B2 (en) 2012-07-31
US20120259631A1 (en) 2012-10-11
US8666740B2 (en) 2014-03-04
US20110307253A1 (en) 2011-12-15
WO2011159628A1 (en) 2011-12-22
EP2580751B1 (en) 2014-08-13

Similar Documents

Publication Publication Date Title
CN103069480B (en) Speech and noise models for speech recognition
CN104575493B (en) Use the acoustic model adaptation of geography information
EP3923281B1 (en) Noise compensation using geotagged audio signals
AU2014200999B2 (en) Geotagged environmental audio for enhanced speech recognition accuracy

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant
CP01 Change in the name or title of a patent holder
CP01 Change in the name or title of a patent holder

Address after: California, USA

Patentee after: Google LLC

Address before: California, USA

Patentee before: Google Inc.