CN110310631A

CN110310631A - Audio recognition method, device, server and storage medium

Info

Publication number: CN110310631A
Application number: CN201910578399.4A
Authority: CN
Inventors: 李扬 (Li Yang)
Original assignee: Beijing Baidu Netcom Science and Technology Co Ltd
Current assignee: Beijing Baidu Netcom Science and Technology Co Ltd
Priority date: 2019-06-28
Filing date: 2019-06-28
Publication date: 2019-10-08

Abstract

The embodiments of the invention disclose a speech recognition method, device, server and storage medium. The method includes: performing a map information search on the current user speech to determine at least one matching candidate; and disambiguating the at least one candidate according to the map domain features of the current user, so as to determine the map information recognition result of the current user speech. Because the embodiments disambiguate candidates obtained from a search dedicated to map domain information, they remove interference that general domain knowledge may cause in map search and avoid misjudgments caused by ambiguity, accents and the like. The resulting map information recognition results better match user habits and needs, which greatly improves the speech recognition accuracy of map voice search.

Description

Audio recognition method, device, server and storage medium

Technical field

The embodiments of the present invention relate to the technical field of speech recognition, and in particular to a speech recognition method, device, server and storage medium.

Background

Map voice search is an important feature of current map products. Letting users enter queries by voice instead of typing them to search for map information makes input much more convenient and is better suited to driving scenarios.

At present, a mature third-party voice input method interface can be called to provide speech recognition for map voice search. The speech recognition model behind such an interface is usually trained on large-scale internet data and is therefore general-purpose. Alternatively, a model dedicated to map speech recognition can be retrained on map corpora that carry map domain knowledge.

However, a general speech recognition model lacks map domain knowledge and is unsuited to map voice search. When such a model is used in map scenarios, most names of map entities are low-frequency, rare or unfamiliar words for the model, or are absent from it altogether, so recognition accuracy is very poor. Retraining a dedicated speech recognition model on map corpora, on the other hand, is costly and still struggles with map information recognition errors caused by noise and regional differences, so the accuracy of map voice search remains low.

Summary of the invention

The embodiments of the present invention provide a speech recognition method, device, server and storage medium that can improve the speech recognition accuracy of map voice search.

In a first aspect, an embodiment of the present invention provides a speech recognition method, comprising:

performing a map information search on the current user speech to determine at least one matching candidate; and

disambiguating the at least one candidate according to the map domain features of the current user, so as to determine the map information recognition result of the current user speech.

In a second aspect, an embodiment of the present invention provides a speech recognition device, comprising:

a candidate determining module, configured to perform a map information search on the current user speech to determine at least one matching candidate; and

a speech recognition disambiguation module, configured to disambiguate the at least one candidate according to the map domain features of the current user, so as to determine the map information recognition result of the current user speech.

In a third aspect, an embodiment of the present invention provides a server, comprising:

one or more processors; and

a memory configured to store one or more programs,

wherein, when the one or more programs are executed by the one or more processors, the one or more processors implement the speech recognition method according to any embodiment of the present invention.

In a fourth aspect, an embodiment of the present invention provides a computer-readable storage medium storing a computer program that, when executed by a processor, implements the speech recognition method according to any embodiment of the present invention.

The embodiments of the present invention perform a dedicated map information search on the user speech to determine multiple candidates matching it, disambiguate these candidates according to the map domain features of the current user, and select from them the candidate that best matches the user as the map information recognition result. Because disambiguation is applied to candidates obtained from a search dedicated to map domain information, interference from general domain knowledge is removed and misjudgments caused by ambiguity and accents are avoided. The recognition results therefore better match user habits and needs, and the speech recognition accuracy of map voice search is greatly improved.

Description of the drawings

Fig. 1 is a flowchart of a speech recognition method according to Embodiment 1 of the present invention;

Fig. 2 is a flowchart of a speech recognition method according to Embodiment 2 of the present invention;

Fig. 3 is an overall architecture diagram of speech recognition according to Embodiment 2 of the present invention;

Fig. 4 is a schematic structural diagram of a speech recognition device according to Embodiment 3 of the present invention;

Fig. 5 is a schematic structural diagram of a server according to Embodiment 4 of the present invention.

Detailed description of the embodiments

The embodiments of the present invention are described in further detail below with reference to the accompanying drawings and examples. It should be understood that the specific embodiments described here only explain the embodiments of the present invention and do not limit the invention. It should also be noted that, for ease of description, the drawings show only the parts related to the embodiments of the present invention rather than the entire structure.

Before the exemplary embodiments are discussed in more detail, it should be mentioned that some of them are described as processes or methods depicted as flowcharts. Although a flowchart describes the operations (or steps) as a sequential process, many of these operations can be performed in parallel, concurrently or simultaneously, and their order can be rearranged. A process may terminate when its operations are complete, and it may also include additional steps not shown in the drawings. A process may correspond to a method, function, procedure, subroutine, subprogram and so on.

Embodiment 1

Fig. 1 is a flowchart of a speech recognition method according to Embodiment 1 of the present invention. This embodiment is applicable to searching for map information according to user speech. The method may be executed by a speech recognition device, which may be implemented in software and/or hardware and is preferably deployed on a server. The method specifically includes the following steps.

S110: Perform a map information search on the current user speech to determine at least one matching candidate.

In this embodiment, map products generally provide a map information search function, in particular a voice search function that makes driving navigation more convenient. The current user speech is the voice search request that a user enters into a map product to query map information; it may include information about at least one place to be searched, restrictions on that place, and so on. For a voice search request submitted by the user, the map product searches for relevant POI (Point of Interest) data and returns it to the user. Each POI in the map may carry information such as its name, category, latitude and longitude, and importance. Map search usually comprises precise search and broad search. In precise search, the user's voice search request looks up one specific POI. Broad search may be based on a suggestion (sug) engine and searches more widely using partial information in the request, similarity, or corrected speech.
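
A minimal Python sketch of the precise/broad distinction follows. It is an illustration under assumptions, not the patent's implementation: POIs are reduced to plain names, the function `search_poi` is hypothetical, and the standard-library `difflib.get_close_matches` stands in for the similarity part of a suggestion engine, whose internals the text does not describe.

```python
from __future__ import annotations

from difflib import get_close_matches


def search_poi(query: str, poi_names: list[str]) -> list[str]:
    """Hypothetical POI lookup: precise search first, broad search as fallback."""
    # Precise search: the query names one specific POI.
    if query in poi_names:
        return [query]
    # Broad search, part 1: POIs whose names contain the partial query.
    results = [name for name in poi_names if query in name]
    # Broad search, part 2: POIs whose names are merely similar to the query.
    for name in get_close_matches(query, poi_names, n=5, cutoff=0.6):
        if name not in results:
            results.append(name)
    return results


names = ["Cangshang Village", "Cangchang Village", "Wangjing SOHO"]
print(search_poi("Cangshang Village", names))  # ['Cangshang Village']
print(search_poi("Cangshang", names))          # broad search on partial input
```

Returning early on an exact hit mirrors the idea that a precise request targets one POI; only when that fails does the sketch widen the net.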

In this embodiment, the user speech is recognized with a speech recognition model, which usually includes an acoustic submodel and a language submodel. To make map information search more accurate, the speech recognition model is trained on both a general corpus and a map corpus. To keep training cost low, this embodiment starts from a conventional general speech recognition model trained on a general corpus. Since most POI names in a map are low-frequency, rare or unfamiliar words, the language submodel of that general model can be retrained directly on the map corpus, which strengthens the model's ability to recognize map vocabulary.

This embodiment differs from conventional map search, which converts the user speech into text and then searches by that text. In the new scheme, speech recognition does not output text directly. Instead, it outputs a phonemic representation composed of multiple phonemes, which serves as fuzzy pinyin for the map information search. Fuzzy pinyin means pinyin without strict requirements: it resembles pinyin in form but may deviate from pinyin spelling rules, while as a whole it still preserves the sound of the speech. Searching on the fuzzy pinyin yields at least one candidate whose pronunciation matches it. A candidate may be recognized text whose pronunciation is the same as or similar to the user's, or recognized text that matches the user's accent or is an associated extension.
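
The following sketch shows the idea of searching by sound rather than by text. It assumes, hypothetically, that every POI in the index carries a precomputed toneless pinyin key, and it uses `difflib.SequenceMatcher` similarity with an arbitrary 0.8 cutoff as a stand-in for whatever phoneme matching a production engine would use.

```python
from __future__ import annotations

from dataclasses import dataclass
from difflib import SequenceMatcher


@dataclass(frozen=True)
class POI:
    name: str
    pinyin: str  # hypothetical precomputed toneless pinyin key


def phoneme_search(fuzzy: str, index: list[POI],
                   min_ratio: float = 0.8) -> list[tuple[POI, float]]:
    """Return POIs whose pinyin key sounds like the fuzzy pinyin, best first."""
    scored = []
    for poi in index:
        ratio = SequenceMatcher(None, fuzzy, poi.pinyin).ratio()
        if ratio >= min_ratio:
            scored.append((poi, ratio))
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored


index = [
    POI("Cangshang Village", "cangshangcun"),
    POI("Cangchang Village", "cangchangcun"),
    POI("Wangjing", "wangjing"),
]
for poi, ratio in phoneme_search("cangshangcun", index):
    print(poi.name, round(ratio, 2))
```

Because matching happens on the sound key, "Cangchang Village" is still recalled for the input "cangshangcun", which is exactly the kind of near-homophone the later disambiguation step has to sort out.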

Specifically, a speech recognition pass usually counts as normal only when the model recognizes the corresponding text, and intermediate information from any stage of such a pass can then be extracted. This embodiment therefore retrains the language submodel of the general speech recognition model to raise the recognition hit rate for map information and reduce cases in which map information is not recognized at all. Accordingly, once the speech recognition model trained with the map corpus has recognized the user speech and produced recognized text, that is, after one normal recognition pass, the phonemic representation that the acoustic submodel produced for the user speech can be obtained and used as the fuzzy pinyin, with little or no regard for the text recognition result. The search is then performed on the fuzzy pinyin. During the search, the phonemic representation can be corrected and extended to obtain several different phonemic representations, called variant phonemic representations, and candidates matching the original and the variant phonemic representations are retrieved.

For example, if the fuzzy pinyin recognized from the user speech is "cangshangcun", a map information search on it can match candidates such as "Cangshang Village, Tongzhou District" (two homophonous place names written with different characters, which the translation renders identically) and "Cangchang Village, Tongzhou District".

S120: Disambiguate the at least one candidate according to the map domain features of the current user, so as to determine the map information recognition result of the current user speech.

Because speech recognition can only search by the user's pronunciation, voice search may return results that sound the same as the user speech but are unrelated in content, and may also return wrong results caused by deviations in the user's accent. Therefore, after obtaining at least one candidate matching the user speech, this embodiment also disambiguates the candidates to filter out interfering information and make the map information recognition result more accurate.

In this embodiment, the candidates can be disambiguated using the map domain features of the current user. Specifically, these features include at least one of: current map search scene features, current user behavior features, and map information search quality features of the candidate text.

Current map search scene features may include features of the scene the user is in while searching, such as the user's current location. Candidates can then be screened by the spatial relation between the user's current location and the position each candidate represents, for example spatial subordination, spatial adjacency or spatial remoteness. For instance, the city or administrative region the user is currently in can be determined from the user's current GPS information, and candidates outside that city or region can be filtered out. Current map search scene features may also include spatial position descriptions in the user speech, such as a regional restriction on the POI, and candidates that do not satisfy the restriction can be filtered out. For example, if the user speech contains the restriction "POI B in city A" and one candidate is "POI B in city S", that candidate is filtered out.
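
Scene-based screening can be sketched as a filter over candidates. The `Candidate` type and `filter_by_scene` function are hypothetical, each candidate's city is assumed to be known from POI data, and the rule that a city named in the speech overrides the GPS city is an assumption made for this sketch.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    name: str
    city: str  # city in which the candidate POI lies (assumed known from POI data)


def filter_by_scene(candidates: list[Candidate], gps_city: str,
                    spoken_city: str | None = None) -> list[Candidate]:
    """Drop candidates that contradict the current map search scene.

    Assumption: a city named in the speech ("POI B in city A") overrides
    the city derived from the user's GPS position.
    """
    target_city = spoken_city if spoken_city is not None else gps_city
    return [c for c in candidates if c.city == target_city]


candidates = [Candidate("POI B", "A"), Candidate("POI B", "S")]
print(filter_by_scene(candidates, gps_city="S", spoken_city="A"))
```

With the spoken restriction "in city A", only the city-A candidate survives, matching the "POI B" example above.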

Next, current user behavior features may include the user's historical map search behavior. Since a user is fairly likely to search for the same POI again, the repeat search probability of each candidate can be determined from the user's search history, and candidates with a low repeat search probability can be filtered out. Current user behavior features may also include the user's accent. Depending on the region, users may not distinguish front and back nasal finals, such as an and ang, en and eng, in and ing, ian and iang, and uan and uang; may not distinguish flat-tongue and retroflex initials, such as z and zh, c and ch, and s and sh; or, in some regions, may not distinguish f and h, or r and l. Map voice search in this embodiment can therefore determine the user's pronunciation habits, such as pronunciation biases, from the user's search history, and retain and extend the phonemes affected by those biases, so that recognition does not go wrong by relying solely on the user's biased pronunciation. For example, suppose candidates are screened by scoring. If the current user is found to pronounce both f and h as f, then whenever a candidate containing the f phoneme is detected, the f phoneme is extended to h and the resulting h candidate is given the same score as the f candidate, so that every potentially correct candidate is included as far as possible.
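
The scoring example above, in which an h-candidate inherits the score of the f-candidate it was derived from, might look like this. `propagate_accent_scores` is a hypothetical helper; candidates are keyed by their phonemic strings, and whole-string substring replacement is a deliberate simplification of per-syllable processing.

```python
from __future__ import annotations


def propagate_accent_scores(scores: dict[str, float],
                            confusions: list[tuple[str, str]]) -> dict[str, float]:
    """Give each accent variant the same score as the candidate it came from.

    scores: phonemic string of each candidate -> its current score.
    confusions: (heard, also_try) pairs for this user, e.g. ("f", "h") for a
    user who pronounces both f and h as f. Replacing every occurrence in the
    whole string is a simplification; a real system would work per syllable.
    """
    result = dict(scores)
    for phonemes, score in scores.items():
        for heard, also_try in confusions:
            if heard in phonemes:
                variant = phonemes.replace(heard, also_try)
                # If the variant is already a candidate, keep the higher score.
                result[variant] = max(result.get(variant, float("-inf")), score)
    return result


print(propagate_accent_scores({"fujian": 0.8}, [("f", "h")]))
# {'fujian': 0.8, 'hujian': 0.8}
```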

In addition, a map corpus or map information can be compiled in advance into a map information library that stores information on POIs that actually exist. Correspondingly, the map information search quality features of the candidate text may include the similarity between a candidate and the map information in this preset library. Computing the text similarity between a candidate and existing POIs reveals whether the candidate is a POI that actually exists, which avoids interference in map search from general-corpus words with the same or similar pronunciation. If a candidate successfully matches an entry in the preset map information library, or its similarity to one exceeds a certain threshold, the candidate is a good map search result; candidates with poor map search quality are filtered out. The map information search quality features of the candidate text may also include the distribution of historical search demand, across all users, for the position a candidate represents. For example, users search little for sparsely populated areas and much for busy urban areas. The search demand for each POI can therefore be analyzed in real time or periodically from all users' historical search behavior, and candidates with low search demand can be filtered out.
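
A sketch of the two quality checks, library similarity and search demand, is given below. The function name, the similarity measure (`difflib.SequenceMatcher`), both thresholds and the demand counts are illustrative assumptions; the text does not specify them.

```python
from __future__ import annotations

from difflib import SequenceMatcher


def quality_filter(candidates: list[str], poi_library: list[str],
                   search_demand: dict[str, int],
                   min_similarity: float = 0.9, min_demand: int = 10) -> list[str]:
    """Keep candidates that resemble a real POI that users actually search for."""
    if not poi_library:
        return []
    kept = []
    for cand in candidates:
        # Closest real POI in the (hypothetical) map information library.
        best = max(poi_library, key=lambda poi: SequenceMatcher(None, cand, poi).ratio())
        similarity = SequenceMatcher(None, cand, best).ratio()
        if similarity >= min_similarity and search_demand.get(best, 0) >= min_demand:
            kept.append(cand)
    return kept


library = ["Cangshang Village", "Cangchang Village"]
demand = {"Cangshang Village": 500, "Cangchang Village": 3}  # made-up counts
print(quality_filter(["Cangshang Village", "Cangchang Village", "Some Movie Title"],
                     library, demand))
```

Here the general-corpus string fails the similarity check, and the rarely searched village fails the demand check.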

Note that this embodiment lists three kinds of map domain features, each covering at least two specific cases. The map domain features are not limited to these examples: any feature that can reasonably screen and disambiguate candidates can be used in this embodiment, and the cases within each kind of feature are likewise not limited to the examples above.

In this embodiment, the degree of association between each candidate and the current user can be determined from one or more of the map domain features. Based on these degrees of association, candidates that are ambiguous with respect to the current user speech are identified and filtered out, yielding the map information recognition result of the current user speech.

For example, suppose candidates are disambiguated by the map information search quality of the candidate text, and the candidates include two homophones both romanized "Wang Jiawei". The quality check shows that one "Wang Jiawei" is the name of a film director from the general corpus, whereas the other is the name of a restaurant, that is, a map POI that actually exists. The director's name is therefore the ambiguous candidate and is filtered out so that it does not interfere with the recognition result.
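
One possible way to turn several map domain features into a single degree of association is a weighted sum, sketched below. The weighted-sum scheme, the feature names, the weights, the feature values and the threshold are all assumptions for illustration; the text does not say how features are combined. The data mirrors the "Wang Jiawei" example above.

```python
from __future__ import annotations


def disambiguate(features: dict[str, dict[str, float]],
                 weights: dict[str, float], threshold: float) -> list[str]:
    """Rank candidates by a weighted degree of association; drop ambiguous ones.

    features: candidate -> {feature name: score in [0, 1]}; missing features count as 0.
    """
    association = {
        cand: sum(w * feats.get(name, 0.0) for name, w in weights.items())
        for cand, feats in features.items()
    }
    ranked = sorted(association, key=association.get, reverse=True)
    return [cand for cand in ranked if association[cand] >= threshold]


weights = {"scene": 0.3, "behavior": 0.3, "quality": 0.4}  # illustrative values
features = {
    "Wang Jiawei (film director)": {"scene": 0.5, "behavior": 0.0, "quality": 0.0},
    "Wang Jiawei (restaurant)": {"scene": 0.8, "behavior": 0.5, "quality": 1.0},
}
print(disambiguate(features, weights, threshold=0.5))
# ['Wang Jiawei (restaurant)']
```

A linear combination keeps each feature's contribution easy to inspect; a learned ranker would be a natural replacement once labeled data exists.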

In the technical solution of this embodiment, a dedicated map information search is performed on the user speech to determine multiple matching candidates, the candidates are disambiguated according to the current user's map domain features, and the candidate that best matches the user is selected as the map information recognition result. Because disambiguation is applied to candidates obtained from a search dedicated to map domain information, interference from general domain knowledge is removed and misjudgments caused by ambiguity and accents are avoided, so the recognition results better match user habits and needs and the speech recognition accuracy of map voice search is greatly improved.

Embodiment 2

Building on Embodiment 1, this embodiment provides a preferred implementation of the speech recognition method in which a broad map information search is performed on the phonemic representation of the user speech. Fig. 2 is a flowchart of a speech recognition method according to Embodiment 2 of the present invention. As shown in Fig. 2, the method specifically includes the following steps.

S210: Perform acoustic feature recognition on the current user speech to determine its phonemic representation.

In this embodiment, the user speech is recognized with a speech recognition model, which usually includes an acoustic submodel and a language submodel. To improve the accuracy of map information search, the model is trained on both a general corpus and a map corpus. Starting from a conventional general speech recognition model trained on a general corpus, and since most POI names in a map are low-frequency, rare or unfamiliar words, the language submodel of the general model can be retrained directly on the map corpus to strengthen the model's ability to recognize map vocabulary.

In this embodiment, after the speech recognition model trained with the map corpus has recognized the user speech and produced recognized text, that is, after a normal recognition pass, the phonemic representation that the acoustic submodel produced through acoustic feature recognition can be obtained and used as the fuzzy pinyin. A phoneme is the smallest unit of speech divided according to the natural properties of speech. Acoustically, a phoneme is the smallest unit of speech divided by sound quality; physiologically, one articulatory action forms one phoneme. For example, "ma" contains two articulatory actions, m and a, and therefore two phonemes. Sounds produced by the same articulatory action are the same phoneme, and sounds produced by different actions are different phonemes: in "ma" and "mi", the two m's share an articulatory action and are the same phoneme, whereas a and i are produced differently and are different phonemes. In phonetics, the basic unit of speech structure made up of one or more phonemes is called a syllable. In Chinese, the pronunciation of one character is usually one syllable, and a Mandarin syllable consists of one or more phonemes combined according to certain rules.

In this embodiment, the phonemic representation that the acoustic submodel obtains through acoustic feature recognition can be used as the fuzzy pinyin. Fuzzy pinyin means pinyin without strict requirements: it resembles pinyin in form but may deviate from pinyin spelling rules, while as a whole it still preserves the sound of the speech. For example, if the user says "Cangshang Village", recognition of the actual speech may yield phonemic representations such as "cangshangcun", "canshangcun", "changshangcun" or "cagshangcun".

S220: Perform a map information search according to the phonemic representation to determine at least one candidate whose pronunciation matches the phonemic representation.

In this embodiment, the phonemic representation enumerates as many potentially correct pronunciations as possible, so a map information search based on it can return candidate text whose pronunciation is the same as or similar to the user's, as well as candidate text that matches the user's accent or is an associated extension.

Optionally, error correction and phoneme extension are performed on the phonemic representation to determine at least one variant phonemic representation, and at least one candidate whose pronunciation matches the phonemic representation or at least one of the variant phonemic representations is obtained.

In this embodiment, the phonemic representation recognized by the acoustic submodel can serve as the basic phonemic representation, on which error correction and extension are performed to obtain at least one variant phonemic representation that differs from it. Error correction means fixing phonemes in the basic phonemic representation that violate pinyin rules; in the example above, "cagshangcun" can be corrected to the variant phonemic representation "cangshangcun". Extension means associating alternatives with phonemes in the basic phonemic representation that may carry a pronunciation bias: every plausible deviation of such a phoneme yields a different variant phonemic representation, which keeps the user's accent from making the speech recognition wrong at the root. Obtaining variant phonemic representations brings as many potentially correct pronunciations as possible into consideration, which improves the accuracy of map voice search. Map searches are then performed on both the phonemic representation and the variant phonemic representations, yielding at least one candidate whose pronunciation matches the phonemic representation or at least one variant phonemic representation.
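
The error-correction step can be illustrated with a single repair rule. The rule is an assumption of this sketch, not taken from the text: in toneless pinyin, a syllable-final "g" only ever follows "n", and any syllable starting with "g" continues with a vowel, so "ag", "eg", "ig" or "og" that is not followed by a vowel (with "v" standing for "ü") can be read as a dropped "n". The function name `correct_phonemes` is hypothetical.

```python
import re

# Assumed repair rule: "ag", "eg", "ig" or "og" not followed by a vowel
# ("v" stands for "ü") cannot occur in toneless pinyin, so read it as a dropped "n".
_DROPPED_N = re.compile(r"([aeio])g(?![aeiouv])")


def correct_phonemes(phonemes: str) -> str:
    """Repair one kind of pinyin-rule violation in a phonemic representation."""
    return _DROPPED_N.sub(r"\1ng", phonemes)


print(correct_phonemes("cagshangcun"))  # cangshangcun
print(correct_phonemes("dage"))         # dage (da + ge is valid, left unchanged)
```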

For example, a phoneme fuzzy matching table can be preset that defines matching rules such as z=zh, c=ch, s=sh, an=ang, en=eng, in=ing, ian=iang, uan=uang, iong=ing, f=h, r=l and l=n. If the user's speech is pronounced "hujian", its basic phonemic representation is "hujian", and phoneme extension yields at least one variant phonemic representation, "fujian". Extending the phonemic representation in this way addresses missed or wrong recognitions caused by a user's unclear articulation or nonstandard pronunciation, further improving the accuracy of map information search in this embodiment.
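
The fuzzy matching table can be applied syllable by syllable, as in the sketch below. It assumes the input has already been segmented into pinyin syllables, treats every pair in the table as symmetric (so l gets both r and n as alternatives), and does not compute transitive closures; the names `expand` and `split_syllable` are hypothetical.

```python
from __future__ import annotations

from itertools import product

# Fuzzy matching table from the text; every pair is treated as symmetric.
INITIAL_PAIRS = [("z", "zh"), ("c", "ch"), ("s", "sh"), ("f", "h"), ("r", "l"), ("l", "n")]
FINAL_PAIRS = [("an", "ang"), ("en", "eng"), ("in", "ing"), ("ian", "iang"),
               ("uan", "uang"), ("iong", "ing")]
# Pinyin initials, two-letter ones first so that "zh" is not split as "z" + "h...".
INITIALS = ["zh", "ch", "sh", "b", "p", "m", "f", "d", "t", "n", "l",
            "g", "k", "h", "j", "q", "x", "r", "z", "c", "s", "y", "w"]


def _alternatives(pairs: list[tuple[str, str]]) -> dict[str, set[str]]:
    table: dict[str, set[str]] = {}
    for a, b in pairs:
        table.setdefault(a, {a}).add(b)
        table.setdefault(b, {b}).add(a)
    return table


INITIAL_ALT = _alternatives(INITIAL_PAIRS)
FINAL_ALT = _alternatives(FINAL_PAIRS)


def split_syllable(syllable: str) -> tuple[str, str]:
    """Split a pinyin syllable into (initial, final); the initial may be empty."""
    for initial in INITIALS:
        if syllable.startswith(initial):
            return initial, syllable[len(initial):]
    return "", syllable


def expand(syllables: list[str]) -> set[str]:
    """All phonemic strings reachable by applying the fuzzy table per syllable."""
    options = []
    for syllable in syllables:
        initial, final = split_syllable(syllable)
        options.append(sorted(INITIAL_ALT.get(initial, {initial})))
        options.append(sorted(FINAL_ALT.get(final, {final})))
    return {"".join(parts) for parts in product(*options)}


print(sorted(expand(["hu", "jian"])))  # ['fujian', 'fujiang', 'hujian', 'hujiang']
```

Applied to ["cang", "shang", "cun"], the same function yields 32 variants, among them "canshangcun" and "changshangcun" from the earlier example. The count grows multiplicatively with the number of fuzzy syllables, so a real system would likely cap or score the variants.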

As noted above, map search usually comprises precise search, in which the user's voice search request looks up one specific POI, and broad search, which may be based on a suggestion (sug) engine and searches more widely using partial information in the request, similarity and the like. Note that the error correction and extension of the phonemic representation may be a processing step independent of the map information search, or a preprocessing step integrated into the map information search function.

S230: Determine the degree of association between the at least one candidate and the current user according to the map domain features.

In this embodiment, the map domain features of the current user include at least one of: current map search scene features, current user behavior features, and map information search quality features of the candidate text.

Optionally, the current map search scene features are determined as follows: they are determined from the spatial relation between the current user's location and the position a candidate represents; and/or the spatial position description, in the current user speech, of the position a candidate represents is used as a current map search scene feature.

In map information search, a candidate usually denotes a specific map POI and can therefore indicate the POI's specific location or location range. A user searching for map information usually wants or plans to go to some destination, so POIs matching the phonemic representation can be looked up outward from the user's current position, for example by recalling POIs in the user's own city first. This embodiment can therefore determine current map search scene features from the spatial relation between the user's current location and the position a candidate represents, such as spatial subordination, spatial adjacency or spatial remoteness, and use them to screen candidates. For example, if the user's current location belongs to city A, the position of candidate 1 belongs to city A, and the position of candidate 2 belongs to city B, then the relation between the user's location and candidate 1 is spatial adjacency, and the relation between the user's location and candidate 2 is spatial remoteness.
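
The adjacency/remoteness distinction can be approximated with a great-circle distance, as in the sketch below. The 50 km threshold, the function names and the use of distance instead of city membership are assumptions; spatial subordination would need administrative boundary data and is not shown. The sample coordinates are approximate.

```python
from __future__ import annotations

import math

EARTH_RADIUS_KM = 6371.0


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance between two latitude/longitude points (spherical Earth)."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def spatial_relation(user: tuple[float, float], cand: tuple[float, float],
                     near_km: float = 50.0) -> str:
    """Label the candidate position 'adjacent' or 'far' relative to the user."""
    return "adjacent" if haversine_km(*user, *cand) <= near_km else "far"


user = (39.9087, 116.3975)  # central Beijing (approximate)
print(spatial_relation(user, (39.9097, 116.6566)))  # Tongzhou District (approximate)
print(spatial_relation(user, (31.2304, 121.4737)))  # Shanghai (approximate)
```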

In addition, in a map information search, the user's speech may contain a spatial-position description of the POI being searched for, which expresses the relationship between that POI and at least one other location or location range. This embodiment can therefore use the spatial-position description, in the current user's speech, of the location represented by the candidate information as the current map-search scenario feature for screening the candidate information. For example, if the user says "restaurant S in city A", the constraint that restaurant S belongs to city A is used as the current map-search scenario feature.
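The constraint in the restaurant S / city A example can be sketched as a predicate over candidates. This assumes the city named in the utterance has already been extracted upstream (that parsing step is not shown), and `required_city` is a hypothetical parameter name.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Candidate:
    name: str
    city: str  # hypothetical field: city the POI belongs to


def satisfies_description(cand: Candidate, required_city: Optional[str]) -> bool:
    """True if the candidate meets the spatial description in the utterance."""
    if required_city is None:  # the utterance had no spatial description
        return True
    return cand.city == required_city


def apply_description(cands: List[Candidate], required_city: Optional[str]) -> List[Candidate]:
    """Keep only candidates consistent with the spoken spatial constraint."""
    return [c for c in cands if satisfies_description(c, required_city)]
```

With `required_city="A"`, a "Restaurant S" in city B is screened out while the one in city A is kept; with no spoken constraint every candidate passes.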

Optionally, the current-user behavior feature is determined as follows: determine the current user's historical search behavior for the location represented by the candidate information, and the current user's pronunciation habits; then determine the current-user behavior feature according to the historical search behavior and/or the pronunciation habits of the current user.

In this embodiment, for the user currently performing the map information search, the user's search history can be obtained, and the user's historical search behavior for the location represented by each piece of candidate information is determined and used as the current-user behavior feature, for example the historical search times and the number of historical searches for that location. In addition, the user's pronunciation habits, such as systematic pronunciation deviations, can be determined from the user's historical search behavior and also used as the current-user behavior feature; for example, it may be determined that, between f and h, the current user's pronunciation is biased toward h.
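One way to derive these behavior features is sketched below. It assumes a per-user log of `(timestamp, poi_name)` entries and a list of `(intended_initial, spoken_initial)` observations collected from past queries; both data formats and the `min_ratio` cutoff are illustrative assumptions, not the patent's stated method.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple


def history_features(log: Iterable[Tuple[float, str]], poi: str) -> Dict[str, object]:
    """Search count and most recent search time for one POI.

    `log` holds (timestamp, poi_name) pairs; timestamps may be any comparable values.
    """
    times = [t for t, name in log if name == poi]
    return {"count": len(times), "last": max(times) if times else None}


def pronunciation_bias(pairs: Iterable[Tuple[str, str]], min_ratio: float = 0.5) -> Dict[str, str]:
    """Map each intended initial to the initial the user habitually says instead.

    `pairs` holds (intended, spoken) observations; a substitution is reported
    when it accounts for at least `min_ratio` of that initial's observations.
    """
    pairs = list(pairs)  # iterated twice below
    totals = Counter(intended for intended, _ in pairs)
    subs = Counter((i, s) for i, s in pairs if i != s)
    return {i: s for (i, s), n in subs.items() if n / totals[i] >= min_ratio}
```

For instance, `pronunciation_bias([("f", "h"), ("f", "h"), ("f", "f")])` returns `{"f": "h"}`, i.e. the user tends to say h where f is intended.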

Optionally, the map-information search quality feature of the candidate text is determined as follows: determine the similarity between the candidate information and the map information in a preset map information library, and the distribution of historical search demand of map-search users for the location represented by the candidate information; then determine the map-information search quality feature of the candidate text according to the similarity and/or the historical search demand distribution.

In this embodiment, a map corpus or a body of map information can be designated in advance as the map information library, which stores information about POIs that actually exist. If a piece of candidate information matches map information in the preset library, it can be determined to be a POI that actually exists, rather than an interfering word from a general corpus that merely sounds similar. This embodiment can therefore use the similarity between the candidate information and the map information in the preset map information library as the map-information search quality feature of the candidate text. For example, a similarity threshold can be preset; if the similarity between a piece of candidate information and the map information in the library meets the threshold, the map-information search quality of that candidate can be determined to be high.
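A sketch of the library-similarity check, using Python's standard `difflib.SequenceMatcher` ratio as a stand-in similarity measure (the patent does not specify one) and an assumed threshold of 0.9.

```python
from difflib import SequenceMatcher
from typing import Sequence


def library_similarity(candidate: str, library: Sequence[str]) -> float:
    """Best SequenceMatcher ratio between the candidate and any library entry."""
    return max(
        (SequenceMatcher(None, candidate, entry).ratio() for entry in library),
        default=0.0,  # empty library: no evidence the POI exists
    )


def is_real_poi(candidate: str, library: Sequence[str], threshold: float = 0.9) -> bool:
    """Treat the candidate as an existing POI if it is close enough to a library entry."""
    return library_similarity(candidate, library) >= threshold
```

A character-level ratio is only a placeholder; a real system would more plausibly compare normalized POI names, aliases, or pronunciations.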

In addition, the historical search behavior of map-search users reflects, at an aggregate level, users' search trends for map information. The historical search behavior of map-search users can therefore be collected in real time or periodically to determine the distribution of their historical search demand for the location represented by each piece of candidate information, and this distribution is used as the map-information search quality feature of the candidate text. For example, map users' search demand for POIs in the central urban area of city A is high, while demand for the development zone on the outskirts of city A is low. As another example, with the growth of short-video platforms, search demand for a POI that has recently become popular online may rise sharply in the near term.
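The historical search demand distribution can be approximated as each POI's share of searches across all map-search users. A sketch, assuming an aggregate log of `(timestamp, poi_name)` pairs and an optional `since` cutoff (a hypothetical parameter) so that recently trending POIs stand out:

```python
from collections import Counter
from typing import Dict, Iterable, Optional, Tuple


def demand_distribution(
    log: Iterable[Tuple[float, str]], since: Optional[float] = None
) -> Dict[str, float]:
    """Share of searches per POI across all map-search users.

    `log` holds (timestamp, poi_name) pairs; if `since` is given, only
    searches at or after that time are counted.
    """
    counts = Counter(poi for t, poi in log if since is None or t >= since)
    total = sum(counts.values())
    return {poi: n / total for poi, n in counts.items()} if total else {}
```

Restricting to a recent window makes a sudden surge (such as the short-video example) visible even when the POI's all-time share is small.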

In this embodiment, the degree of association between each piece of candidate information and the current user is determined according to at least one of the map-domain features. For example, for the current map-search scenario feature, the region or city to which the user's current location belongs can be determined from spatial membership, and candidate information belonging to the same region or city is assigned a larger degree of association with the user. For instance, if the current user's location belongs to city A, candidates belonging to city A are assigned a larger degree of association with the user. In addition, candidate information that satisfies the spatial-position description in the user's speech is assigned a larger degree of association with the current user, whereas candidate information that does not satisfy it is assigned a smaller value, possibly zero.
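The scoring rules in this paragraph can be sketched as follows. The values 1.0 and 0.2 for the "larger" and "smaller" association, and the use of city names for membership, are illustrative assumptions.

```python
from typing import Optional


def scenario_score(
    user_city: str,
    cand_city: str,
    required_city: Optional[str] = None,
    high: float = 1.0,
    low: float = 0.2,
) -> float:
    """Association from the map-search scenario feature (illustrative values)."""
    if required_city is not None:
        # A spoken spatial description takes priority: satisfied -> high, violated -> zero.
        return high if cand_city == required_city else 0.0
    # Otherwise prefer candidates in the same city as the user.
    return high if cand_city == user_city else low
```

Letting the spoken description take priority means a user in city B who explicitly asks for a place in city A is not penalized for being elsewhere.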

For example, for the current-user behavior feature, based on the principle that users tend to repeat searches, the historical search times and the number of historical searches for the location represented by each piece of candidate information can be taken from the current user's search history: the more often a candidate has been searched within a given historical period, the larger its degree of association with the current user. The user's pronunciation habits can also be taken into account: if the current user is found to confuse at least two phonemes, the candidates corresponding to those phonemes can be assigned the same degree of association with the current user.
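A sketch of the two behavior-based rules, under stated assumptions: the saturating `1 - exp(-count)` curve, the window arithmetic on numeric timestamps, and the 0.5 mismatch score are hypothetical choices. The only properties taken from the text are that more repeat searches give a larger score and that confusable phonemes score equally.

```python
import math
from typing import FrozenSet, Iterable, Set


def count_in_window(timestamps: Iterable[float], now: float, window: float) -> int:
    """Number of past searches for a POI within [now - window, now]."""
    return sum(1 for t in timestamps if now - window <= t <= now)


def repeat_score(count: int) -> float:
    """More repeat searches -> larger association, saturating toward 1."""
    return 1.0 - math.exp(-count)


def pronunciation_score(
    heard: str, cand: str, confusable: Set[FrozenSet[str]], mismatch: float = 0.5
) -> float:
    """Same score for the heard phoneme and any phoneme the user confuses with it."""
    if heard == cand or frozenset((heard, cand)) in confusable:
        return 1.0
    return mismatch
```

For a user known to confuse f and h, a candidate starting with f scores the same as one starting with h, so the recognized phoneme alone does not decide between them.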

As another example, for the map-information search quality of the candidate text, if a piece of candidate information is determined to be an actually existing map POI based on its similarity to the map information in the preset map information library, its map search quality is high, and its degree of association with the current user can be set to a larger value. In addition, according to the historical search demand distribution, the higher the historical search demand for the location represented by a piece of candidate information, the larger its degree of association with the current user can be set.

In this embodiment, the degree of association between the candidate information and the current user can also be determined by combining several of the map-domain features, yielding a comprehensive degree of association based on all of them. The per-feature degrees of association of a piece of candidate information can be integrated by a machine learning model built on big data to obtain its overall degree of association with the current user. Alternatively, a scoring approach can be used: each feature in the map-domain features is assigned a preset weight, and the per-feature scores are combined by weighted summation to obtain the degree of association between the candidate information and the current user.
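The weighted-scoring alternative reduces to a weighted sum of per-feature scores. The feature keys and weights below are hypothetical; in practice the weights would be preset or tuned.

```python
from typing import Dict


def fuse(scores: Dict[str, float], weights: Dict[str, float]) -> float:
    """Weighted sum of per-feature association scores; missing features count as 0."""
    return sum(w * scores.get(feature, 0.0) for feature, w in weights.items())


WEIGHTS = {"scenario": 0.5, "behavior": 0.3, "quality": 0.2}  # hypothetical weights
```

For scores `{"scenario": 1.0, "behavior": 0.5, "quality": 0.8}` this gives 0.5 + 0.15 + 0.16 = 0.81.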

S240: Determine, according to the degree of association between the at least one piece of candidate information and the current user, the ambiguous information that is ambiguous with respect to the current user's speech.

In a specific embodiment of the invention, the candidate information can be ranked by its degree of association with the current user, and candidates with a low degree of association are determined to be ambiguous information that would interfere with the map search results. An association threshold or a percentage threshold can be preset: candidates whose degree of association is below the association threshold, or that fall within the preset lowest percentage by degree of association, are determined to be ambiguous information.
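Both selection rules, an absolute association threshold and a lowest-percentage cutoff, can be sketched together. The parameter names `threshold` and `bottom_fraction` are hypothetical.

```python
from typing import Dict, Optional, Set


def find_ambiguous(
    assoc: Dict[str, float],
    threshold: Optional[float] = None,
    bottom_fraction: Optional[float] = None,
) -> Set[str]:
    """Candidates to treat as ambiguous information.

    threshold: flag candidates whose association is below this value.
    bottom_fraction: flag the lowest-ranked fraction of candidates (e.g. 0.3).
    """
    ranked = sorted(assoc, key=assoc.get, reverse=True)
    ambiguous: Set[str] = set()
    if threshold is not None:
        ambiguous |= {c for c in ranked if assoc[c] < threshold}
    if bottom_fraction is not None:
        k = int(len(ranked) * bottom_fraction)
        if k > 0:  # guard: ranked[-0:] would select every candidate
            ambiguous |= set(ranked[-k:])
    return ambiguous
```

The threshold rule adapts to how many candidates are weak, while the percentage rule always trims a fixed share of the list.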

S250: Filter the ambiguous information out of the at least one piece of candidate information to determine the map-information recognition result of the current user's speech.

In a specific embodiment of the invention, after the ambiguous information is filtered out, the remaining candidate information can be used as the map-information recognition result and shown to the user in descending order of degree of association, so that the user first sees the result most closely associated with them while still being able to consult the less closely associated candidates. Alternatively, the candidate with the highest degree of association can be used directly as the map-information recognition result and shown to the user.
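Step S250 and both presentation options can be sketched as follows, assuming the association scores and the set of ambiguous candidates produced by the previous steps; `top1` is a hypothetical switch between the two options.

```python
from typing import Dict, List, Set


def recognition_result(
    assoc: Dict[str, float], ambiguous: Set[str], top1: bool = False
) -> List[str]:
    """Filter out ambiguous candidates and order the rest by association."""
    kept = [c for c in sorted(assoc, key=assoc.get, reverse=True) if c not in ambiguous]
    # Either show the full ranked list or only the best-associated candidate.
    return kept[:1] if top1 else kept
```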

For example, Fig. 3 shows the overall architecture of speech recognition in this embodiment. As shown in Fig. 3, the user issues a voice search request to the map search client, and the speech recognition model performs speech recognition on the received user speech and determines its phonemic representation, which is used as the fuzzy phoneme. The speech recognition model is obtained by a second round of training, on a map corpus, of a general speech recognition model that was trained on a general corpus. This avoids costly retraining of the speech recognition model: general-domain knowledge is retained while the model's recognition accuracy for map information improves. Next, a map search is performed using the fuzzy phoneme, which includes error correction and extension of the phonemic representation, yielding multiple pieces of candidate information. The strong matching capability of map search replaces simple phonetic-to-text matching and directly filters out possible interference from general-domain knowledge. Finally, the current user's map-domain features are used to disambiguate the candidates, yielding the top N recognized texts most strongly associated with the current user; these texts, or the one with the highest degree of association, are fed back to the user as the accurately recognized text.
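The Fig. 3 flow can be summarized as a pipeline with pluggable stages. This is a sketch only: the stage functions passed in (`asr`, `expand`, `search`, `score`) are hypothetical placeholders for the map-fine-tuned recognizer, the fuzzy-phoneme correction and extension, the map search, and the disambiguation scoring.

```python
from typing import Callable, List


def recognize(
    audio: object,
    asr: Callable[[object], str],
    expand: Callable[[str], List[str]],
    search: Callable[[str], List[str]],
    score: Callable[[str], float],
    top_n: int = 3,
) -> List[str]:
    """Fig. 3 flow: ASR -> fuzzy phonemes -> map search -> disambiguation."""
    phonemes = asr(audio)  # map-fine-tuned recognizer yields a phonemic representation
    variants = [phonemes] + expand(phonemes)  # error correction / extension
    candidates: List[str] = []
    for v in variants:
        for c in search(v):  # map search replaces direct phonetic-to-text matching
            if c not in candidates:
                candidates.append(c)
    # Disambiguation: rank by association with the current user, keep the top N.
    return sorted(candidates, key=score, reverse=True)[:top_n]
```

For instance, with a toy `expand` that swaps `hu` for `fu`, both "Huxing Rd" and "Fuxing Rd" are recalled, and the user-specific scorer decides which is returned first.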

The technical solution of this embodiment performs acoustic feature recognition on the current user's speech using a speech recognition model optimized on a map corpus, determines the phonemic representation of the speech, performs error correction and extension on the phonemic representation as a fuzzy phoneme, and searches map information according to the fuzzy phoneme to obtain at least one possible piece of candidate information. Finally, the degree of association between each candidate and the current user is determined from the user's map-domain features, the candidates are disambiguated, and the candidate that best matches the user is selected as the map-information recognition result. By recognizing fuzzy phonemes, embodiments of the invention retain the acoustic characteristics of the user's speech while avoiding errors in text selection. By optimizing the speech recognition model and searching the map with fuzzy phonemes instead of relying on simple phonetic-to-text matching, they eliminate possible interference of general-domain knowledge with map search, avoid misjudgments caused by ambiguity, accents, and the like, and greatly improve the speech recognition accuracy of map voice search.

Embodiment three

Fig. 4 is a schematic structural diagram of a speech recognition apparatus provided by Embodiment three of the invention. This embodiment is applicable to map information search based on user speech. The apparatus is configured in a server and can implement the speech recognition method described in any embodiment of the invention. The apparatus specifically includes:

a candidate information determining module 410, configured to perform a map information search on the current user's speech and determine at least one piece of matching candidate information;

a speech recognition disambiguation module 420, configured to disambiguate the at least one piece of candidate information according to the map-domain features of the current user, so as to determine the map-information recognition result of the current user's speech.

Optionally, the speech recognition disambiguation module 420 is specifically configured to:

determine, according to the map-domain features, the degree of association between the at least one piece of candidate information and the current user;

determine, according to the degree of association between the at least one piece of candidate information and the current user, the ambiguous information that is ambiguous with respect to the current user's speech;

filter the ambiguous information out of the at least one piece of candidate information to determine the map-information recognition result of the current user's speech.

Optionally, the map-domain features of the current user include at least one of a current map-search scenario feature, a current-user behavior feature, and a map-information search quality feature of the candidate text.

Optionally, the current map-search scenario feature is determined as follows:

determining the current map-search scenario feature according to the spatial relationship between the current location of the current user and the location represented by the candidate information; and/or

using the spatial-position description, in the current user's speech, of the location represented by the candidate information as the current map-search scenario feature.

Optionally, the current-user behavior feature is determined as follows:

determining the current user's historical search behavior for the location represented by the candidate information and the current user's pronunciation habits;

determining the current-user behavior feature according to the historical search behavior and/or the pronunciation habits of the current user.

Optionally, the map-information search quality feature of the candidate text is determined as follows:

determining the similarity between the candidate information and the map information in a preset map information library, and the historical search demand distribution of map-search users for the location represented by the candidate information;

determining the map-information search quality feature of the candidate text according to the similarity and/or the historical search demand distribution.

Optionally, the candidate information determining module 410 includes:

a phoneme recognition unit 4101, configured to perform acoustic feature recognition on the current user's speech and determine the phonemic representation of the current user's speech;

a map search unit 4102, configured to perform a map information search according to the phonemic representation and determine at least one piece of candidate information matching the phonemic representation.

Optionally, the map search unit 4102 is specifically configured to:

perform error correction and phoneme extension on the phonemic representation to determine at least one variant phonemic representation;

obtain at least one piece of candidate information whose pronunciation matches the phonemic representation and the at least one variant phonemic representation.

Through the cooperation of its functional modules, the technical solution of this embodiment implements recognition of the phonemic representation (i.e., the fuzzy phoneme), correction and extension of the fuzzy phoneme, map information search based on the fuzzy phoneme, determination of map-domain features, disambiguation of candidate information, and feedback of the accurately recognized text. By disambiguating candidate information obtained from a search dedicated to map-domain information, embodiments of the invention not only remove possible interference of general-domain knowledge with map search but also avoid misjudgments caused by ambiguity, accents, and the like, so that the map-information recognition results better match users' habits and needs, greatly improving the speech recognition accuracy of map voice search.

Embodiment four

Fig. 5 is a schematic structural diagram of a server provided by Embodiment four of the invention, showing a block diagram of an exemplary server suitable for implementing embodiments of the invention. The server shown in Fig. 5 is merely an example and should not impose any limitation on the functions or scope of use of embodiments of the invention.

As shown in Fig. 5, server 12 takes the form of a general-purpose computing device. Components of server 12 may include, but are not limited to: one or more processors 16, a system memory 28, and a bus 18 connecting the different system components (including system memory 28 and processor 16).

Bus 18 represents one or more of several types of bus structures, including a memory bus or memory controller, a peripheral bus, an accelerated graphics port, and a processor or local bus using any of a variety of bus architectures. By way of example, these architectures include, but are not limited to, the Industry Standard Architecture (ISA) bus, the Micro Channel Architecture (MCA) bus, the Enhanced ISA bus, the Video Electronics Standards Association (VESA) local bus, and the Peripheral Component Interconnect (PCI) bus.

Server 12 typically includes a variety of computer-system-readable media. These media may be any available media accessible by server 12, including volatile and non-volatile media, and removable and non-removable media.

System memory 28 may include computer-system-readable media in the form of volatile memory, such as random access memory (RAM) 30 and/or cache memory 32. Server 12 may further include other removable/non-removable, volatile/non-volatile computer system storage media. By way of example only, storage system 34 may be used to read from and write to non-removable, non-volatile magnetic media (not shown in Fig. 5, commonly called a "hard disk drive"). Although not shown in Fig. 5, a magnetic disk drive for reading from and writing to a removable non-volatile magnetic disk (such as a "floppy disk") and an optical disc drive for reading from and writing to a removable non-volatile optical disc (such as a CD-ROM, DVD-ROM, or other optical media) may be provided. In these cases, each drive may be connected to bus 18 through one or more data media interfaces. System memory 28 may include at least one program product having a set of (for example, at least one) program modules configured to perform the functions of the embodiments of the invention.

A program/utility 40 having a set of (at least one) program modules 42 may be stored, for example, in system memory 28. Such program modules 42 include, but are not limited to, an operating system, one or more application programs, other program modules, and program data; each of these examples, or some combination of them, may include an implementation of a network environment. Program modules 42 generally perform the functions and/or methods of the embodiments described herein.

Server 12 may also communicate with one or more external devices 14 (such as a keyboard, a pointing device, or a display 24), with one or more devices that enable a user to interact with server 12, and/or with any device (such as a network card or modem) that enables server 12 to communicate with one or more other computing devices. Such communication may take place through an input/output (I/O) interface 22. Server 12 may also communicate with one or more networks (such as a local area network (LAN), a wide area network (WAN), and/or a public network such as the Internet) through network adapter 20. As shown, network adapter 20 communicates with the other modules of server 12 through bus 18. It should be understood that, although not shown, other hardware and/or software modules may be used in conjunction with server 12, including but not limited to: microcode, device drivers, redundant processors, external disk drive arrays, RAID systems, tape drives, and data backup storage systems.

Processor 16 runs programs stored in system memory 28 to perform various functional applications and data processing, for example implementing the speech recognition method provided by the embodiments of the invention.

Embodiment five

Embodiment five of the invention further provides a computer-readable storage medium storing a computer program (or computer-executable instructions) which, when executed by a processor, performs a speech recognition method, the method comprising:

The computer storage medium of the embodiments of the invention may be any combination of one or more computer-readable media. A computer-readable medium may be a computer-readable signal medium or a computer-readable storage medium. A computer-readable storage medium may be, for example but not limited to, an electrical, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the above. More specific examples (a non-exhaustive list) of computer-readable storage media include: an electrical connection having one or more wires, a portable computer diskette, a hard disk, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), an optical fiber, portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the above. In this document, a computer-readable storage medium may be any tangible medium that contains or stores a program for use by or in connection with an instruction execution system, apparatus, or device.

A computer-readable signal medium may include a data signal propagated in baseband or as part of a carrier wave, carrying computer-readable program code. Such a propagated data signal may take various forms, including but not limited to an electromagnetic signal, an optical signal, or any suitable combination of the above. A computer-readable signal medium may also be any computer-readable medium other than a computer-readable storage medium that can send, propagate, or transmit a program for use by or in connection with an instruction execution system, apparatus, or device.

Program code contained on a computer-readable medium may be transmitted over any suitable medium, including but not limited to wireless, wire, optical cable, RF, or any suitable combination of the above.

Computer program code for performing the operations of the embodiments of the invention may be written in one or more programming languages or a combination thereof, including object-oriented programming languages such as Java, Smalltalk, and C++, as well as conventional procedural programming languages such as the "C" language or similar languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on a remote computer or server. Where a remote computer is involved, it may be connected to the user's computer through any kind of network, including a local area network (LAN) or a wide area network (WAN), or it may be connected to an external computer (for example, through the Internet using an Internet service provider).

Note that the above are merely preferred embodiments of the invention and the technical principles applied. Those skilled in the art will understand that the invention is not limited to the specific embodiments described herein, and that various obvious changes, readjustments, and substitutions can be made without departing from the scope of protection of the invention. Therefore, although the embodiments of the invention have been described in some detail through the above embodiments, they are not limited to those embodiments and may include other equivalent embodiments without departing from the inventive concept; the scope of the invention is determined by the appended claims.

Claims

1. a kind of audio recognition method characterized by comprising

performing a map information search on the current user's speech to determine at least one piece of matching candidate information;

disambiguating the at least one piece of candidate information according to the map-domain features of the current user to determine the map-information recognition result of the current user's speech.

2. The method according to claim 1, wherein the disambiguating the at least one piece of candidate information according to the map-domain features of the current user to determine the map-information recognition result of the current user's speech comprises:

determining, according to the map-domain features, the degree of association between the at least one piece of candidate information and the current user;

determining, according to the degree of association between the at least one piece of candidate information and the current user, the ambiguous information that is ambiguous with respect to the current user's speech;

filtering the ambiguous information out of the at least one piece of candidate information to determine the map-information recognition result of the current user's speech.

3. The method according to claim 2, wherein the map-domain features of the current user include at least one of a current map-search scenario feature, a current-user behavior feature, and a map-information search quality feature of the candidate text.

4. The method according to claim 3, wherein the current map-search scenario feature is determined as follows:

determining the current map-search scenario feature according to the spatial relationship between the current location of the current user and the location represented by the candidate information; and/or

using the spatial-position description, in the current user's speech, of the location represented by the candidate information as the current map-search scenario feature.

5. The method according to claim 3, wherein the current-user behavior feature is determined as follows:

determining the current user's historical search behavior for the location represented by the candidate information and the current user's pronunciation habits;

determining the current-user behavior feature according to the historical search behavior and/or the pronunciation habits of the current user.

6. The method according to claim 3, wherein the map-information search quality feature of the candidate text is determined as follows:

determining the similarity between the candidate information and the map information in a preset map information library, and the historical search demand distribution of map-search users for the location represented by the candidate information;

determining the map-information search quality feature of the candidate text according to the similarity and/or the historical search demand distribution.

7. The method according to claim 1, wherein the performing a map information search on the current user's speech to determine at least one piece of matching candidate information comprises:

performing acoustic feature recognition on the current user's speech to determine the phonemic representation of the current user's speech;

performing a map information search according to the phonemic representation to determine at least one piece of candidate information whose pronunciation matches the phonemic representation.

8. The method according to claim 7, wherein the performing a map information search according to the phonemic representation to determine at least one piece of candidate information whose pronunciation matches the phonemic representation comprises:

performing error correction and phoneme extension on the phonemic representation to determine at least one variant phonemic representation;

obtaining at least one piece of candidate information whose pronunciation matches the phonemic representation and the at least one variant phonemic representation.

9. a kind of speech recognition equipment characterized by comprising

Candidate information determining module determines at least one matched time for carrying out cartographic information search to present user speech Select information；

Speech recognition disambiguation module, for according to active user map domain features, at least one described candidate information into Row disambiguation processing, with the cartographic information recognition result of the determination present user speech.

10. a kind of server characterized by comprising

One or more processors；

Memory, for storing one or more programs；

When one or more of programs are executed by one or more of processors, so that one or more of processors are real Now such as audio recognition method of any of claims 1-8.

11. A computer-readable storage medium storing a computer program, characterized in that the program, when executed by a processor, implements the speech recognition method according to any one of claims 1-8.