CN107016994A

CN107016994A - Speech recognition method and device

Info

Publication number: CN107016994A
Application number: CN201610057651.3A
Authority: CN
Inventors: 李宏言
Original assignee: Alibaba Group Holding Ltd
Current assignee: Alibaba Group Holding Ltd
Priority date: 2016-01-27
Filing date: 2016-01-27
Publication date: 2017-08-04
Anticipated expiration: 2036-01-27
Also published as: CN107016994B

Abstract

This application provides a speech recognition method and device. The method includes: performing Chinese-character-based speech recognition on the named entity speech to be recognized, to obtain a Chinese character sequence as its character recognition result; performing pinyin-based speech recognition on the same named entity speech, to obtain a pinyin sequence as its pinyin recognition result; determining, according to the recognized Chinese character sequence and pinyin sequence, the similarity between each candidate named entity in a specific named entity list and the named entity speech to be recognized; and determining, according to these similarities, the speech recognition result of the named entity speech from the specific named entity list. The application improves the accuracy of recognizing named entity speech.

Description

Speech recognition method and device

Technical field

The application relates to the field of speech recognition, and in particular to a speech recognition method and device.

Background

Existing speech recognition technology typically recognizes speech with a speech recognition network composed of a language model and an acoustic model. The acoustic model is produced by training on a speech database with a training algorithm. During recognition, the feature parameters of the speech to be recognized are matched against the acoustic model to obtain a recognition result. The language model is produced by grammatical and semantic analysis of a text database, followed by training based on a statistical model. It can combine syntactic and semantic knowledge to describe the internal relations between words.

A named entity (Named Entity, NE) is a specific name with substantive meaning. Common examples are person names, place names, organization names and song titles. Times, dates and numeral-classifier phrases can also be named entities.

Existing speech recognition systems recognize named entities with relatively low accuracy, yet some scenarios require named entities to be recognized specifically, for example song titles and contact names. One cause is that named entities are usually short (for example, the song title "Silent"). This makes it hard to combine the language model and the acoustic model effectively, so recognition accuracy is low.

In addition, many named entities are easily confused with one another. For example, "Henan" (河南, hé nán) and "Holland" (荷兰, hé lán) sound similar, and without context it is difficult to tell which was said. Some named entities also do not follow normal language conventions, for example song titles that use Internet slang, such as "How to Give Up Treatment". Both situations make it harder still to recognize specific types of named entities.

Summary of the invention

The purpose of the application is to improve the accuracy of recognizing named entity speech.

According to one embodiment of the application, a speech recognition method is provided, comprising the following steps:

performing Chinese-character-based speech recognition on the named entity speech to be recognized, to obtain a Chinese character sequence as the character recognition result of the named entity speech;

performing pinyin-based speech recognition on the named entity speech to be recognized, to obtain a pinyin sequence as the pinyin recognition result of the named entity speech;

determining, according to the recognized Chinese character sequence and pinyin sequence, the similarity between each candidate named entity in a specific named entity list and the named entity speech to be recognized;

determining, according to the similarity between each candidate named entity and the named entity speech to be recognized, the speech recognition result of the named entity speech from the specific named entity list.

According to one embodiment of the application, a person-name voice search method is provided, comprising:

matching a voice command to be recognized against pre-stored voice command templates, so as to obtain the person-name speech to be recognized that is contained in the voice command;

performing Chinese-character-based speech recognition on the person-name speech to be recognized, to obtain a Chinese character sequence as its character recognition result;

performing pinyin-based speech recognition on the person-name speech to be recognized, to obtain a pinyin sequence as its pinyin recognition result;

determining, according to the recognized Chinese character sequence and pinyin sequence, the similarity between each candidate person name in a specific person-name list and the person-name speech to be recognized;

determining, according to the similarity between each candidate person name and the person-name speech to be recognized, the speech recognition result of the person-name speech from the specific person-name list.

According to one embodiment of the application, a song voice search method is provided, comprising:

matching a voice command to be recognized against pre-stored voice command templates, so as to obtain the song-title speech to be recognized that is contained in the voice command;

performing Chinese-character-based speech recognition on the song-title speech to be recognized, to obtain a Chinese character sequence as its character recognition result;

performing pinyin-based speech recognition on the song-title speech to be recognized, to obtain a pinyin sequence as its pinyin recognition result;

determining, according to the recognized Chinese character sequence and pinyin sequence, the similarity between each candidate song title in a specific song-title list and the song-title speech to be recognized;

determining, according to the similarity between each candidate song title and the song-title speech to be recognized, the speech recognition result of the song-title speech from the specific song-title list.

According to one embodiment of the application, a method for establishing a communication connection by voice is provided, comprising:

According to the Chinese character sequence identified and the pinyin sequence, each in user communication record is determined The similarity of name and the name voice to be identified；

determining, according to the similarity between each candidate person name and the person-name speech to be recognized, the speech recognition result of the person-name speech from the user's contact list;

initiating a communication connection to the contact in the user's contact list that was determined as the speech recognition result.

Compared with the prior art, the embodiments of the application have the following advantages:

Conventional speech recognition of the named entity speech yields a result in Chinese-character form. On top of that, the embodiments also perform pinyin recognition to obtain a result in pinyin form. The final recognition result of the named entity is then determined from the specific named entity list using both the character recognition result and the pinyin recognition result, rather than the character result alone. This improves the accuracy of recognizing named entity speech.

Brief description of the drawings

Other features, objects and advantages of the application will become more apparent from the following detailed description of non-limiting embodiments, read with reference to the accompanying drawings:

Fig. 1 is a flowchart of a speech recognition method according to one embodiment of the application;

Fig. 2 is a schematic diagram of a currently common speech recognition architecture;

Fig. 3 is a detailed flowchart, according to one embodiment of the application, of determining the similarity between a candidate named entity and the named entity speech to be recognized;

Fig. 4 is a flowchart of a speech recognition method according to another embodiment of the application;

Fig. 5 is a flowchart of a person-name voice search method according to one embodiment of the application;

Fig. 6 is a flowchart of a song voice search method according to one embodiment of the application;

Fig. 7 is a flowchart of a method for establishing a communication connection by voice according to one embodiment of the application;

Fig. 8 is a block diagram of a speech recognition device according to one embodiment of the application;

Fig. 9 is a detailed block diagram of the similarity determining unit according to one embodiment of the application;

Fig. 10 is a block diagram of a speech recognition device according to another embodiment of the application;

Fig. 11 is a block diagram of a person-name voice search device according to one embodiment of the application;

Fig. 12 is a block diagram of a song voice search device according to one embodiment of the application;

Fig. 13 is a block diagram of a device for establishing a communication connection by voice according to one embodiment of the application.

The same or similar reference numerals in the drawings denote the same or similar parts.

Detailed description of the embodiments

Before the exemplary embodiments are discussed in more detail, note that some of them are described as processes or methods depicted as flowcharts. A flowchart describes the operations as sequential processing, but many of them can be performed in parallel, concurrently or simultaneously. The order of the operations can also be rearranged. A process may terminate when its operations are complete, and it may also have additional steps not included in the drawings. A process may correspond to a method, function, procedure, subroutine, subprogram and so on.

In this context, a "computer device", also called a "computer", is an intelligent electronic device that performs predetermined processing, such as numerical and/or logical computation, by running predetermined programs or instructions. The processing may be performed in three ways:

- by a processor executing instructions pre-stored in a memory (the device may include both);
- by hardware such as an ASIC, FPGA or DSP;
- by a combination of the two.

Computer devices include, but are not limited to, servers, personal computers, notebook computers, tablet computers and smartphones.

Computer devices include user equipment and network devices.

- User equipment includes, but is not limited to, computers, smartphones and PDAs.
- Network devices include, but are not limited to, a single network server, a server group composed of multiple network servers, or a cloud composed of a large number of computers or network servers based on cloud computing. Cloud computing is a form of distributed computing: a super virtual computer composed of a group of loosely coupled computers.

The computer device may run alone to implement the application. It may also access a network and implement the application by interacting with other computer devices in that network. The network in which the computer device resides includes, but is not limited to, the Internet, wide area networks, metropolitan area networks, local area networks and VPNs.

The user equipment, network devices and networks above are only examples. Other existing or future computer devices or networks, if applicable to the application, should also fall within its protection scope and are incorporated herein by reference.

The methods discussed below (some of which are illustrated by flowcharts) may be implemented by hardware, software, firmware, middleware, microcode, hardware description languages, or any combination thereof. When they are implemented in software, firmware, middleware or microcode, the program code or code segments that perform the necessary tasks may be stored in a machine-readable or computer-readable medium (such as a storage medium). One or more processors may perform the necessary tasks.

The specific structural and functional details disclosed herein are merely representative and serve to describe exemplary embodiments of the application. The application may, however, be embodied in many alternative forms. It should not be construed as limited only to the embodiments set forth herein.

The terms "first", "second" and so on may be used herein to describe units, but these units should not be limited by these terms. The terms serve only to distinguish one unit from another. For example, without departing from the scope of the exemplary embodiments, a first unit could be called a second unit, and similarly a second unit could be called a first unit. The term "and/or" as used herein includes any and all combinations of one or more of the associated listed items.

The terminology used herein serves only to describe specific embodiments and is not intended to limit the exemplary embodiments. Unless the context clearly indicates otherwise, the singular forms "a" and "an" used herein also include the plural. The terms "include" and/or "comprise" used herein specify the presence of the stated features, integers, steps, operations, units and/or components. They do not preclude the presence or addition of one or more other features, integers, steps, operations, units, components and/or combinations thereof.

In some alternative implementations, the functions/actions mentioned may occur in an order different from that shown in the drawings. For example, depending on the functions/actions involved, two figures shown in succession may in fact be executed substantially simultaneously, or sometimes in the reverse order.

Before the detailed process of the embodiments is described, here is a brief introduction to prior-art speech recognition. Fig. 2 is a schematic diagram of a prior-art speech recognition architecture.

As shown in Fig. 2, a speech database and a text database are usually built from large amounts of speech data and text data, respectively. An acoustic model is trained on speech features extracted from the speech data, and a language model is trained on the text data.

When input speech to be recognized is received, the following happens:

1. Its features are extracted, and syllables are identified with the acoustic model.
2. The possible mappings between syllables and text are looked up in the dictionary.
3. The speech is decoded with the language model.
4. The text corresponding to the speech is output through a corresponding search algorithm.

The application is described in further detail below with reference to the accompanying drawings.

The embodiments generally apply when the named entity speech to be recognized can be isolated from a speech input.

Take song search in a smart speaker product as an example. To search for a song, a user may issue voice commands such as "I want to listen to ...", "Please find ... for me", "I want to listen to the song ..." or "Please find the song ... for me". All the command forms a user may issue are made into command templates, such as those just listed.

When the user issues a voice command such as "I want to listen to 《Zhang San's Song》", preliminary speech recognition is performed on it (that is, recognition with the acoustic model and language model of Fig. 2), and the result is matched against the stored command templates. In general, preliminary recognition does not misrecognize the common words in a template, such as "I want to listen to". The difficulty is "Zhang San's Song". The speech and text used to train the acoustic and language models tend to focus on common words and rarely include special vocabulary such as person names and song titles. As a result, it is relatively hard to determine which words were said.

Preliminary recognition therefore identifies the common words in the user's command, and these are matched against the stored command templates to locate the named entity speech to be recognized. For example, when "I want to listen to 《Zhang San's Song》" matches the template "I want to listen to ...", the named entity speech to be recognized is the speech for "Zhang San's Song".

The subsequent process in the embodiments determines which named entity this speech corresponds to. For example, is it "张三的歌" (zhāng sān de gē, "Zhang San's song"), "章三的歌" (the same pronunciation written with a different surname character), or another similar-sounding phrase?
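The template-matching step can be sketched as follows. This is a minimal illustration, not the patent's implementation, and it rests on three assumptions:

- the preliminary recognition result is available as a text string;
- the templates are hypothetical English stand-ins for the Chinese command templates, each with an `{entity}` slot;
- `extract_entity` is a hypothetical helper that returns the text filling the slot.

In the described method, the matched slot would instead locate the named entity speech segment, which the following steps then re-recognize.

```python
import re
from typing import Optional

# Hypothetical command templates; "{entity}" marks the named-entity slot.
TEMPLATES = [
    "i want to listen to {entity}",
    "please find {entity} for me",
    "i want to listen to the song {entity}",
    "please find me the song {entity}",
]


def extract_entity(transcript: str) -> Optional[str]:
    """Return the text filling the {entity} slot of the best-matching template."""
    text = transcript.strip().lower()
    # Try templates with the most literal text first, so that
    # "i want to listen to the song X" wins over "i want to listen to X".
    by_specificity = sorted(
        TEMPLATES, key=lambda t: len(t.replace("{entity}", "")), reverse=True
    )
    for template in by_specificity:
        prefix, suffix = template.split("{entity}")
        # Literal parts must match exactly; the slot captures one or more characters.
        pattern = re.escape(prefix) + r"(?P<entity>.+)" + re.escape(suffix)
        match = re.fullmatch(pattern, text)
        if match:
            return match.group("entity").strip()
    return None  # the command matches no template


print(extract_entity("I want to listen to the song zhang san de ge"))  # zhang san de ge
print(extract_entity("Please find zhang san de ge for me"))            # zhang san de ge
print(extract_entity("what time is it"))                               # None
```

Ordering templates by the amount of literal text is one simple way to prefer the most specific match when several templates fit. The text does not specify a tie-breaking rule.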

Referring to Fig. 1, in step S110, Chinese-character-based speech recognition is performed on the named entity speech to be recognized, to obtain a Chinese character sequence as its character recognition result.

In Chinese-character-based speech recognition, the language model is trained on text from the text database represented as Chinese character sequences. That is, in the architecture shown in Fig. 2, the text data used to train the language model consists of Chinese character sequences.

Chinese-character-based speech recognition outputs a string of Chinese characters for the named entity speech. For example, for the speech of the named entity "Zhang San", the output is the character sequence "张三".

Referring to Fig. 1, in step S120, pinyin-based speech recognition is performed on the named entity speech to be recognized, to obtain a pinyin sequence as its pinyin recognition result.

In pinyin-based speech recognition, the language model is trained on text from the text database represented as pinyin sequences. That is, in the architecture shown in Fig. 2, the text data used to train the language model consists of pinyin sequences.

Hanyu Pinyin (the Chinese phonetic alphabet) is the internationally accepted standard for romanizing Standard Chinese. It is mainly used to annotate the pronunciation of Chinese characters. It uses the 26 letters of the Latin alphabet and divides each syllable into an initial and a final.

The phonetic units of Chinese are mainly syllables and phonemes. One Chinese character corresponds to one syllable, which is either an initial plus a final or a final alone. A phoneme is the smallest phonetic unit, distinguished according to the natural properties of speech (its physical and physiological properties).

The embodiments build a pinyin-based speech recognition network according to the Scheme for the Chinese Phonetic Alphabet. This network consists of an acoustic model and a pinyin-based language model:

- The acoustic model can be the same as the one in the Chinese-character-based network described above.
- The pinyin-based language model can be either syllable-based or phoneme-based.

Accordingly, step S120 has the following embodiments:

In the first embodiment, the pinyin-based speech recognition is syllable recognition, and the pinyin sequence is a syllable sequence.

In the first embodiment, step S120 performs syllable recognition on the named entity speech to be recognized, to obtain a syllable sequence as its syllable recognition result.

That is, a syllable recognition network composed of the acoustic model and the syllable-based language model performs syllable recognition on the named entity speech and outputs its syllable sequence. For example, for the speech of the named entity "Zhang San", the syllable recognition network outputs the syllable sequence "zhang san".

In the second embodiment, the pinyin-based speech recognition is phoneme recognition, and the pinyin sequence includes a phoneme sequence. Step S120 performs phoneme recognition on the named entity speech to be recognized, to obtain a phoneme sequence as its phoneme recognition result.

In other words, a phoneme recognition network composed of the acoustic model and the phoneme-based language model performs phoneme recognition on the named entity speech and outputs its phoneme sequence. For example, for the speech of the named entity "Zhang San", the phoneme recognition network outputs the phoneme sequence "zh ang s an".
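The syllable sequence of the first embodiment and the phoneme sequence of the second are related by splitting each syllable into its initial and final. The sketch below illustrates that relationship and reproduces the examples in the text. Two points are assumptions rather than part of the patent: `y` and `w` are kept inside the final, and the helper names are hypothetical. The patent's phoneme inventory is not given here.

```python
# Pinyin initials; the two-letter ones come first so that "zh", "ch" and "sh"
# are tried before "z", "c" and "s".
INITIALS = ("zh", "ch", "sh", "b", "p", "m", "f", "d", "t", "n", "l",
            "g", "k", "h", "j", "q", "x", "r", "z", "c", "s")


def syllable_to_phonemes(syllable: str) -> list[str]:
    """Split a toneless pinyin syllable into [initial, final], or [final] if it has no initial."""
    for initial in INITIALS:
        if syllable.startswith(initial) and len(syllable) > len(initial):
            return [initial, syllable[len(initial):]]
    return [syllable]  # zero-initial syllable such as "an"; "y"/"w" stay in the final here


def syllables_to_phonemes(syllables: str) -> str:
    """Map a space-separated syllable sequence to a space-separated phoneme sequence."""
    return " ".join(p for s in syllables.split() for p in syllable_to_phonemes(s))


print(syllables_to_phonemes("zhang san"))        # zh ang s an
print(syllables_to_phonemes("zhang san de ge"))  # zh ang s an d e g e
```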

Building on the second embodiment, in the third embodiment step S120 may further include:

performing tone recognition on the final phonemes in the recognized phoneme sequence, to obtain a tone sequence as the tone recognition result of the named entity speech to be recognized.

Standard Mandarin has four tones:

- the level tone (first tone), e.g. bā;
- the rising tone (second tone), e.g. bá;
- the falling-rising tone (third tone), e.g. bǎ;
- the falling tone (fourth tone), e.g. bà.

Speech recognition usually also adds the neutral tone (fifth tone).

The tones of the finals in the recognized phoneme sequence are identified and added to the phoneme sequence. This yields a tone sequence, which serves as the tone recognition result of the named entity speech. The tone of each final can be marked directly after that final. For example, tone recognition on the phoneme sequence "zh ang s an" yields the tone sequence "zh ang1 s an1".
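Tone-annotated units like "zh ang1 s an1" can be derived from tone-marked pinyin text, as in the sketch below. This assumes the input already carries tone marks; in the described method the tones come from a tone recognizer, not from text. Each syllable is decomposed with Unicode NFD so that its tone diacritic can be read, and an unmarked syllable gets the neutral tone, numbered 5 as in the text. `tone_sequence` is a hypothetical helper.

```python
import unicodedata

# Combining diacritics used as pinyin tone marks, mapped to tone numbers.
TONE_MARKS = {"\u0304": 1, "\u0301": 2, "\u030c": 3, "\u0300": 4}
INITIALS = ("zh", "ch", "sh", "b", "p", "m", "f", "d", "t", "n", "l",
            "g", "k", "h", "j", "q", "x", "r", "z", "c", "s")


def tone_sequence(pinyin: str) -> str:
    """Turn tone-marked pinyin such as "zhāng sān" into "zh ang1 s an1"."""
    units = []
    for syllable in pinyin.split():
        tone, base = 5, ""  # no tone mark means neutral tone (5)
        for ch in unicodedata.normalize("NFD", syllable):
            if ch in TONE_MARKS:
                tone = TONE_MARKS[ch]
            else:
                base += ch
        base = unicodedata.normalize("NFC", base)  # recombine "u" + diaeresis into "ü"
        initial = next((i for i in INITIALS
                        if base.startswith(i) and len(base) > len(i)), "")
        if initial:
            units.append(initial)
        units.append(f"{base[len(initial):]}{tone}")  # tone digit follows the final
    return " ".join(units)


print(tone_sequence("zhāng sān"))    # zh ang1 s an1
print(tone_sequence("bā bá bǎ bà"))  # b a1 b a2 b a3 b a4
```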

In the fourth embodiment, the pinyin-based speech recognition includes both syllable recognition and phoneme recognition, and the pinyin sequence includes both a syllable sequence and a phoneme sequence.

In the fourth embodiment, step S120 specifically includes:

performing syllable recognition on the named entity speech to be recognized, to obtain a syllable sequence as its syllable recognition result; and

performing phoneme recognition on the named entity speech to be recognized, to obtain a phoneme sequence as its phoneme recognition result.

For details of syllable recognition and phoneme recognition on the named entity speech, refer to the first and second embodiments above.

Building on the fourth embodiment, in the fifth embodiment step S120 further includes:

performing tone recognition on the final phonemes in the recognized phoneme sequence, to obtain a tone sequence as the tone recognition result of the named entity speech to be recognized. For details of this step, refer to the description of tone recognition in the third embodiment; it is not repeated here.

Referring to Fig. 1, in step S130, the similarity between each candidate named entity in the specific named entity list and the named entity to be recognized is determined according to the recognized Chinese character sequence and pinyin sequence.

The similarity measures how close a candidate named entity is to the named entity to be recognized, and it can be computed in several ways. In one embodiment, it is determined from two edit distances:

- the edit distance between each candidate's Chinese character sequence and the recognized Chinese character sequence;
- the edit distance between each candidate's pinyin sequence and the recognized pinyin sequence.
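The text does not say here how the two edit distances are combined into a single similarity. The sketch below assumes one possible rule, purely for illustration, not the patent's formula:

- each edit distance is normalized by the length of the longer sequence;
- the two normalized distances are combined with hypothetical equal weights `w_hanzi` and `w_pinyin`;
- the candidate with the highest resulting similarity is chosen.

The candidates' pinyin is supplied by hand, and the entity list is hypothetical.

```python
from dataclasses import dataclass


def levenshtein(a, b) -> int:
    """Minimum number of substitutions, insertions and deletions turning a into b."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        cur = [i]
        for j, y in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,               # delete x
                           cur[j - 1] + 1,            # insert y
                           prev[j - 1] + (x != y)))   # substitute (free if equal)
        prev = cur
    return prev[-1]


@dataclass
class Candidate:
    hanzi: str            # Chinese character sequence of the candidate entity
    syllables: list[str]  # its pinyin syllable sequence (supplied by hand here)


def similarity(cand: Candidate, rec_hanzi: str, rec_syllables: list[str],
               w_hanzi: float = 0.5, w_pinyin: float = 0.5) -> float:
    """Assumed rule: 1 minus the weighted sum of length-normalized edit distances."""
    d_hanzi = levenshtein(cand.hanzi, rec_hanzi) / max(len(cand.hanzi), len(rec_hanzi), 1)
    d_pinyin = (levenshtein(cand.syllables, rec_syllables)
                / max(len(cand.syllables), len(rec_syllables), 1))
    return 1.0 - (w_hanzi * d_hanzi + w_pinyin * d_pinyin)


def best_match(candidates: list[Candidate], rec_hanzi: str,
               rec_syllables: list[str]) -> Candidate:
    """Pick the candidate with the highest similarity."""
    return max(candidates, key=lambda c: similarity(c, rec_hanzi, rec_syllables))


entity_list = [  # hypothetical specific named entity list
    Candidate("张三的歌", ["zhang", "san", "de", "ge"]),
    Candidate("李四的歌", ["li", "si", "de", "ge"]),
]
print(best_match(entity_list, "章三歌", ["zhang", "san", "ge"]).hanzi)  # 张三的歌
```

Suppose the recognized characters are "章三歌" and the recognized syllables are "zhang san ge". Under these assumed weights, "张三的歌" scores 0.625 and "李四的歌" scores 0.25. Normalizing by length keeps a long title from being penalized just for having more units.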

As shown in Fig. 3, step S130 specifically includes the following steps:

Step S131: determine the edit distance between the Chinese character sequence of each candidate named entity in the specific named entity list and the recognized Chinese character sequence. This is the character-sequence edit distance between that candidate and the named entity speech to be recognized.

The edit distance algorithm (Edit-distance based algorithm, EDA) measures how well two strings match. The edit distance is the minimum number of edit operations needed to turn one string into the other. The permitted operations are:

- replacing one character with another;
- inserting a character;
- deleting a character.

The algorithm is used to compute the edit distance between the character sequence of each candidate named entity in the list and the recognized character sequence. Here each character is a Chinese character.

For example, let the candidate named entity be "张三的歌" ("Zhang San's song") and the recognized character sequence be "章三歌". Turning "章三歌" into "张三的歌" requires replacing "章" with "张" and inserting "的". The character-sequence edit distance between "章三歌" and "张三的歌" is therefore 2.
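Below is a minimal sketch of this edit distance. It uses the standard dynamic-programming recurrence with unit costs for substitution, insertion and deletion, keeping only one row of the table at a time. It accepts any sequence, so a Python string is processed character by character, which for Chinese text means one unit per Chinese character.

```python
def levenshtein(a, b) -> int:
    """Minimum number of substitutions, insertions and deletions turning a into b."""
    prev = list(range(len(b) + 1))  # row 0: turning "" into b[:j] takes j insertions
    for i, x in enumerate(a, start=1):
        cur = [i]                   # turning a[:i] into "" takes i deletions
        for j, y in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,               # delete x
                           cur[j - 1] + 1,            # insert y
                           prev[j - 1] + (x != y)))   # substitute (free if equal)
        prev = cur
    return prev[-1]


print(levenshtein("章三歌", "张三的歌"))  # 2: replace 章 with 张, insert 的
```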

Step S132: determine the edit distance between the pinyin sequence of each candidate named entity in the specific named entity list and the recognized pinyin sequence. This is the pinyin-sequence edit distance between that candidate and the named entity speech to be recognized.

As in step S131, the edit distance algorithm computes the edit distance between the pinyin sequence of each candidate named entity in the list and the recognized pinyin sequence. The result is that candidate's pinyin-sequence edit distance to the named entity speech to be recognized.

For the first embodiment of step S120, the pinyin-sequence edit distance is the syllable-sequence edit distance between each candidate named entity in the list and the recognition result of the named entity speech. The edit distance algorithm computes the edit distance between each candidate's syllable sequence and the recognized syllable sequence. Here the unit of the algorithm is a syllable.

For example, let a candidate's syllable sequence be "zhang san de ge" and the recognized syllable sequence be "zhang shang ge". Turning "zhang shang ge" into "zhang san de ge" requires replacing "shang" with "san" and inserting "de". That is 2 syllable edits, so the edit distance is 2.
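For syllable sequences, the same recurrence applies once the pinyin string is split into syllable tokens. Comparing the raw letter strings would give a letter-level distance instead. A self-contained sketch reproducing the example above:

```python
def levenshtein(a, b) -> int:
    """Minimum number of substitutions, insertions and deletions turning a into b."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        cur = [i]
        for j, y in enumerate(b, start=1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (x != y)))
        prev = cur
    return prev[-1]


# Splitting on spaces makes each syllable one unit of the edit distance.
recognized = "zhang shang ge".split()
candidate = "zhang san de ge".split()
print(levenshtein(recognized, candidate))  # 2: replace "shang" with "san", insert "de"
```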

For the second embodiment of step S120, the pinyin-sequence edit distance is the phoneme-sequence edit distance between each candidate named entity in the list and the recognition result of the named entity speech. The edit distance algorithm computes the edit distance between each candidate's phoneme sequence and the recognized phoneme sequence. Here the unit of the algorithm is a phoneme.

For example, let a candidate's phoneme sequence be "zh ang s an d e g e" and the recognized phoneme sequence be "zh ang sh ang g e". Turning the recognized sequence into the candidate's requires four phoneme edits:

- replacing "sh" with "s";
- replacing "ang" with "an";
- inserting "d";
- inserting "e".

The edit distance is therefore 4.

For the third embodiment of step S120, the pinyin-sequence edit distance includes both the phoneme-sequence edit distance and the tone-sequence edit distance between each candidate named entity and the recognition result of the named entity speech. In this case:

An edit distance algorithm computes the edit distance between the phoneme sequence of each candidate named entity in the specific named-entity list and the recognized phoneme sequence. The result is the phoneme sequence edit distance between that candidate named entity and the named-entity speech to be recognized; and

An edit distance algorithm computes the edit distance between the tone sequence of each candidate named entity in the specific named-entity list and the recognized tone sequence. The result is the tone sequence edit distance between that candidate named entity and the named-entity speech to be recognized.

The edit distance between the phoneme sequence of each candidate named entity and the recognized phoneme sequence is calculated as described above. For the edit distance between the tone sequence of each candidate named entity and the recognized tone sequence, the "characters" of the edit distance algorithm are tones. For example, suppose the tone sequence of a candidate named entity is "zhang1 san1" and the recognized tone sequence is "zhang1 san2". Turning "zhang1 san2" into "zhang1 san1" only requires changing the tone of "san", so the edit distance is 1.
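The phoneme example (distance 4) and the tone example (distance 1) can be reproduced with the same Levenshtein routine by changing only the units being compared. This is a minimal sketch under two assumptions. First, phonemes are given as a space-separated list of initials and finals. Second, a tone sequence is represented by the trailing tone digit of each tone-numbered syllable. The document does not fix either representation. The helper `tone_sequence` is hypothetical.

```python
from typing import Hashable, List, Sequence


def edit_distance(a: Sequence[Hashable], b: Sequence[Hashable]) -> int:
    """Levenshtein distance over arbitrary hashable units."""
    prev = list(range(len(b) + 1))
    for i in range(1, len(a) + 1):
        curr = [i] + [0] * len(b)
        for j in range(1, len(b) + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            curr[j] = min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost)
        prev = curr
    return prev[-1]


def tone_sequence(tonal_syllables: str) -> List[str]:
    """Keep only the tone digit of each tone-numbered syllable,
    e.g. "zhang1 san2" -> ["1", "2"]. Assumes every syllable ends in a digit."""
    return [syllable[-1] for syllable in tonal_syllables.split()]


# Second implementation: phonemes are the units.
candidate_phonemes = "zh ang s an d e g e".split()
recognized_phonemes = "zh ang sh ang g e".split()
print(edit_distance(recognized_phonemes, candidate_phonemes))  # 4

# Third implementation, tone part: tones are the units.
print(edit_distance(tone_sequence("zhang1 san2"), tone_sequence("zhang1 san1")))  # 1
```

For this example, comparing whole tone-numbered syllables ("san1" against "san2") would also give 1. Stripping the syllables simply isolates the tone information.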

In the fourth implementation of step S120, the pinyin sequence edit distance includes both the syllable sequence edit distance and the phoneme sequence edit distance between each candidate named entity and the recognition result of the named-entity speech to be recognized. In this case:

An edit distance algorithm computes the edit distance between the syllable sequence of each candidate named entity in the specific named-entity list and the recognized syllable sequence. The result is the syllable sequence edit distance between that candidate named entity and the named-entity speech to be recognized; and

An edit distance algorithm computes the edit distance between the phoneme sequence of each candidate named entity in the specific named-entity list and the recognized phoneme sequence. The result is the phoneme sequence edit distance between that candidate named entity and the named-entity speech to be recognized. Both the syllable sequence edit distance and the phoneme sequence edit distance are calculated as described above.

In the fifth implementation of step S120, the pinyin sequence edit distance includes the syllable sequence edit distance, the phoneme sequence edit distance, and the tone sequence edit distance between each candidate named entity and the named entity to be recognized. In this case:

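An edit distance algorithm computes the edit distance between the syllable sequence of each candidate named entity in the specific named-entity list and the recognized syllable sequence. The result is the syllable sequence edit distance between that candidate named entity and the named-entity speech to be recognized;

An edit distance algorithm computes the edit distance between the phoneme sequence of each candidate named entity in the specific named-entity list and the recognized phoneme sequence. The result is the phoneme sequence edit distance between that candidate named entity and the named-entity speech to be recognized; and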

An edit distance algorithm computes the edit distance between the tone sequence of each candidate named entity in the specific named-entity list and the recognized tone sequence. The result is the tone sequence edit distance between that candidate named entity and the named-entity speech to be recognized. The syllable, phoneme, and tone sequence edit distances are each calculated as described above.
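In the fifth implementation, all three pinyin views are compared. The sketch below derives the syllable, phoneme, and tone sequences from a single tone-numbered pinyin string and computes the three distances. It assumes that a "phoneme" means a pinyin initial or final, which matches the "zh ang s an ..." examples in the text. It also assumes that "y" and "w" may be treated as initials and that every syllable carries a tone digit (5 for the neutral tone). These simplifications are illustrative. The names `INITIALS`, `split_syllable`, and `pinyin_views` are hypothetical.

```python
from typing import Hashable, List, Sequence, Tuple

# Initials checked longest-first so that "zh"/"ch"/"sh" win over "z"/"c"/"s".
# Treating "y" and "w" as initials is a simplifying assumption.
INITIALS = ("zh", "ch", "sh", "b", "p", "m", "f", "d", "t", "n", "l",
            "g", "k", "h", "j", "q", "x", "r", "z", "c", "s", "y", "w")


def edit_distance(a: Sequence[Hashable], b: Sequence[Hashable]) -> int:
    """Levenshtein distance over arbitrary hashable units."""
    prev = list(range(len(b) + 1))
    for i in range(1, len(a) + 1):
        curr = [i] + [0] * len(b)
        for j in range(1, len(b) + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            curr[j] = min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost)
        prev = curr
    return prev[-1]


def split_syllable(syllable: str) -> List[str]:
    """Split a toneless syllable into [initial, final], or [final] when it has
    no initial, e.g. "shang" -> ["sh", "ang"], "an" -> ["an"]."""
    for initial in INITIALS:
        if syllable.startswith(initial) and len(syllable) > len(initial):
            return [initial, syllable[len(initial):]]
    return [syllable]


def pinyin_views(tonal_pinyin: str) -> Tuple[List[str], List[str], List[str]]:
    """Derive (syllables, phonemes, tones) from tone-numbered pinyin such as
    "zhang1 san1 de5 ge1". Every syllable must end in a tone digit."""
    syllables, phonemes, tones = [], [], []
    for token in tonal_pinyin.split():
        base, tone = token[:-1], token[-1]
        syllables.append(base)
        phonemes.extend(split_syllable(base))
        tones.append(tone)
    return syllables, phonemes, tones


candidate = pinyin_views("zhang1 san1 de5 ge1")
recognized = pinyin_views("zhang1 shang1 ge1")
syllable_d, phoneme_d, tone_d = (edit_distance(r, c) for r, c in zip(recognized, candidate))
print(syllable_d, phoneme_d, tone_d)  # 2 4 1
```

A production system would normally take these views from the recognizers or from a pronunciation lexicon rather than re-splitting strings. The sketch only shows how the three distances relate to one another.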

Step S133: for each candidate named entity, calculate the overall edit distance to the named entity to be recognized from their Chinese character sequence edit distance and pinyin sequence edit distance.

The overall edit distance may be the weighted average of the edit distances, their plain average, their weighted sum, their plain sum, and so on.

If the overall edit distance is a weighted average, a predetermined weight may be set in advance for the Chinese character sequence edit distance and another for the pinyin sequence edit distance. During speech recognition of the named-entity speech to be recognized, these weights are applied to the two edit distances of each candidate named entity in the specific named-entity list. The resulting weighted average is the overall edit distance between that candidate named entity and the named-entity speech to be recognized.

A special case arises when all predetermined weights are equal. The overall edit distance is then the plain average of the edit distances.

Alternatively, the overall edit distance may be the weighted sum or the plain sum of the Chinese character sequence edit distance and the pinyin sequence edit distance between the candidate named entity and the named entity to be recognized.
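The four ways of forming the overall edit distance named above can be written as small functions. This is a sketch only. The distances 1 and 2 and the weights 3 and 1 are hypothetical, because the document does not specify weight values.

```python
from typing import Sequence


def weighted_average(distances: Sequence[float], weights: Sequence[float]) -> float:
    """Weighted average of the per-view edit distances."""
    if len(distances) != len(weights):
        raise ValueError("need exactly one weight per distance")
    total_weight = sum(weights)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive number")
    return sum(d * w for d, w in zip(distances, weights)) / total_weight


def plain_average(distances: Sequence[float]) -> float:
    """Special case of the weighted average with all weights equal."""
    return weighted_average(distances, [1.0] * len(distances))


def weighted_sum(distances: Sequence[float], weights: Sequence[float]) -> float:
    """Weighted sum of the per-view edit distances."""
    if len(distances) != len(weights):
        raise ValueError("need exactly one weight per distance")
    return float(sum(d * w for d, w in zip(distances, weights)))


def plain_sum(distances: Sequence[float]) -> float:
    """Unweighted sum of the per-view edit distances."""
    return float(sum(distances))


# Hypothetical values: Chinese character distance 1, syllable distance 2,
# with illustrative weights 3 and 1.
distances = [1, 2]
weights = [3, 1]
print(weighted_average(distances, weights))  # 1.25
print(plain_average(distances))              # 1.5
print(weighted_sum(distances, weights))      # 5.0
print(plain_sum(distances))                  # 3.0
```

For the third to fifth implementations, the lists simply grow to three or four entries, for example [Chinese character, syllable, phoneme, tone], each with its own weight.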

When the overall edit distance is a weighted average, the first and second implementations of step S120 are handled as follows. For each candidate named entity, two distances to the recognition result of the named entity to be recognized are weighted with their respective weights. In the first implementation these are the Chinese character sequence edit distance and the syllable sequence edit distance. In the second implementation they are the Chinese character sequence edit distance and the phoneme sequence edit distance. The resulting weighted average is the overall edit distance between that candidate named entity in the specific named-entity list and the named-entity speech to be recognized.

When the overall edit distance is a weighted average, in the third implementation of step S120 three distances are weighted with their respective weights. For each candidate named entity, these are its Chinese character sequence, phoneme sequence, and tone sequence edit distances to the recognition result of the named entity to be recognized. Their weighted average is the overall edit distance between that candidate named entity and the named-entity speech to be recognized.

When the overall edit distance is a weighted average, in the fourth implementation of step S120 three distances are weighted with their respective weights. For each candidate named entity, these are its Chinese character sequence, syllable sequence, and phoneme sequence edit distances to the recognition result of the named entity to be recognized. Their weighted average is the overall edit distance between that candidate named entity and the named-entity speech to be recognized.

When the overall edit distance is a weighted average, in the fifth implementation of step S120 four distances are weighted with their respective weights. For each candidate named entity, these are its Chinese character sequence, syllable sequence, phoneme sequence, and tone sequence edit distances to the recognition result of the named entity to be recognized. Their weighted average is the overall edit distance between that candidate named entity and the named-entity speech to be recognized.

Step S134: for each candidate named entity, add a predetermined constant to its calculated overall edit distance to the named-entity speech to be recognized. The reciprocal of this sum is the similarity between that candidate named entity and the named-entity speech to be recognized.

A smaller edit distance means a higher similarity. The similarity is therefore defined as the reciprocal of the sum of the overall edit distance and a predetermined constant. The overall edit distance may be 0, so the constant is added to keep the denominator (overall edit distance plus constant) from being zero. The predetermined constant is preferably set to 1, giving similarity = 1/(d+1), where d is the overall edit distance between the candidate named entity and the named entity to be recognized. For example, if that overall edit distance is 1, the similarity is 1/(1+1) = 1/2.
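The similarity formula is simple enough to state directly in code. A minimal sketch, assuming the preferred constant of 1 as the default (the function name `similarity` is hypothetical):

```python
def similarity(overall_distance: float, constant: float = 1.0) -> float:
    """Similarity = 1 / (d + c). The positive constant c keeps the denominator
    non-zero when d == 0; the document prefers c = 1."""
    if overall_distance < 0:
        raise ValueError("edit distances are non-negative")
    if constant <= 0:
        raise ValueError("the constant must be positive")
    return 1.0 / (overall_distance + constant)


print(similarity(1))  # 0.5, the 1/(1+1) example
print(similarity(0))  # 1.0, an exact match gets the maximum similarity
```

With c = 1 the similarity always lies in (0, 1], which makes values comparable across candidates.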

Referring again to Fig. 1, in step S140 the speech recognition result of the named-entity speech to be recognized is determined from the specific named-entity list. The choice is based on the similarity between each candidate named entity and the named-entity speech to be recognized.

Specifically, the candidate named entity in the specific named-entity list whose similarity to the recognition result is highest is taken as the speech recognition result of the named-entity speech to be recognized. Because the similarity decreases as the overall edit distance grows, this is the same as selecting the candidate named entity with the smallest overall edit distance to the recognition result.
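The selection step amounts to an argmax over similarities, which equals an argmin over overall edit distances. The sketch below uses hypothetical overall distances for three candidates. The function name `best_candidate` and the tie-breaking rule (first candidate wins) are illustrative choices, not specified by the document.

```python
from typing import Mapping


def best_candidate(overall_distances: Mapping[str, float], constant: float = 1.0) -> str:
    """Return the candidate with the highest similarity 1 / (d + constant).
    Because this similarity decreases monotonically in d, the same candidate
    also has the smallest overall edit distance. Ties keep the first candidate
    in iteration order."""
    if not overall_distances:
        raise ValueError("the candidate list is empty")
    return max(overall_distances, key=lambda name: 1.0 / (overall_distances[name] + constant))


# Hypothetical overall edit distances for three candidates.
distances = {"张三": 1.25, "李四": 4.0, "章珊": 2.0}
print(best_candidate(distances))  # 张三
assert best_candidate(distances) == min(distances, key=distances.get)
```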

The embodiments of the present application first perform conventional speech recognition on the named entity to be recognized, which yields a recognition result in Chinese character form. They then also perform pinyin recognition, which yields a recognition result in pinyin form. Both results are used together to determine the final speech recognition result of the named entity from the specific named-entity list. This improves the accuracy of named-entity speech recognition.

In addition, to further improve the accuracy of named-entity speech recognition, the language model used in Chinese character-based speech recognition may be trained jointly on two sources. The first is the Chinese character sequences of the candidate named entities in the specific named-entity list. The second is the Chinese character sequences of the texts in a general training text corpus.

In a general Chinese character-based speech recognition architecture (as shown in Fig. 2), the language model is trained only on the Chinese character sequences of texts in a general training text corpus. Such texts rarely contain named entities such as person names and place names, so this architecture recognizes named entities with poor accuracy. In the embodiments of the present application, by contrast, the language model can be trained jointly on the Chinese character sequences of the candidate named entities in the specific named-entity list and those of the texts in the general corpus. This further improves the accuracy of named-entity speech recognition.

In addition, to further improve the accuracy of named-entity speech recognition, the language model used for syllable recognition may be trained on two sets of syllable sequences. One set comes from syllable expansion of each candidate named entity in the specific named-entity list. The other comes from syllable expansion of the texts in the general training text corpus. Likewise, the language model used for phoneme recognition may be trained on phoneme sequences obtained by phoneme expansion of each candidate named entity, together with phoneme sequences obtained by phoneme expansion of the general corpus texts. A conventional language model would be trained only on the syllable (or phoneme) sequences expanded from the general corpus. Adding the candidate named entities of the specific named-entity list to the training data further improves the accuracy of named-entity speech recognition.
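The data-preparation idea above can be sketched as follows. The general corpus and the named-entity list are both expanded into syllables, and the expansions are pooled before n-gram statistics are collected. Only bigram counting is shown here, which is the first step of estimating an n-gram language model. Smoothing and model building are omitted. The character-to-syllable table is a hypothetical toy that ignores polyphonic characters. The function names are also hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, List

# Toy character-to-syllable table (hypothetical). A real system would use a
# full pronunciation lexicon and resolve polyphonic characters in context.
CHAR_TO_SYLLABLE: Dict[str, str] = {
    "我": "wo", "要": "yao", "听": "ting", "歌": "ge",
    "张": "zhang", "三": "san", "李": "li", "四": "si",
}


def syllable_expand(text: str) -> List[str]:
    """Expand Chinese text into a syllable sequence; characters missing from
    the toy table are skipped."""
    return [CHAR_TO_SYLLABLE[ch] for ch in text if ch in CHAR_TO_SYLLABLE]


def bigram_counts(sentences: Iterable[List[str]]) -> Counter:
    """Count syllable bigrams with sentence-boundary markers; these counts are
    the raw statistics from which an n-gram language model is estimated."""
    counts: Counter = Counter()
    for tokens in sentences:
        padded = ["<s>"] + tokens + ["</s>"]
        counts.update(zip(padded, padded[1:]))
    return counts


general_corpus = ["我要听歌"]          # stands in for a general training corpus
candidate_entities = ["张三", "李四"]  # stands in for the specific named-entity list
training_sequences = [syllable_expand(t) for t in general_corpus + candidate_entities]
counts = bigram_counts(training_sequences)
print(counts[("zhang", "san")])  # 1: present only because the entity list was added
```

Phoneme expansion would follow the same pattern, with each syllable further split into its initial and final before counting.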

Referring to Fig. 4, on the basis of any of the above embodiments, the speech recognition method 1 optionally further includes step S100: obtaining the named-entity speech to be recognized that is contained in the speech to be recognized.

In practice, a user who gives a voice command usually speaks a whole sentence rather than only the named entity. For example, the user may say "I want to listen to 《Zhang San's Song》". It is therefore necessary to identify which part of the user's speech is the named-entity speech to be recognized.

As mentioned above, one implementation first performs preliminary speech recognition on the speech to be recognized that contains the named-entity speech. It then matches the recognition result against prestored command templates to determine which part of the speech is the named-entity speech to be recognized.
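Template matching on the preliminary text result can be sketched with regular expressions that contain a named "entity" slot. The two templates below are hypothetical examples, not templates from the application. The sketch works on text only. Mapping the matched text span back to the corresponding audio segment (for example, through word timestamps from the preliminary recognizer) is assumed and not shown.

```python
import re
from typing import Optional, Sequence

# Hypothetical command templates for a preliminary text recognition result;
# the named group "entity" marks the slot that holds the named entity.
TEMPLATES = [
    re.compile(r"^我要听《?(?P<entity>.+?)》?$"),  # "I want to listen to ..."
    re.compile(r"^打电话给(?P<entity>.+)$"),       # "Call ..."
]


def extract_entity(text: str, templates: Sequence[re.Pattern] = TEMPLATES) -> Optional[str]:
    """Return the text filling the entity slot of the first matching template,
    or None if no template matches."""
    for template in templates:
        match = template.match(text)
        if match:
            return match.group("entity")
    return None


print(extract_entity("我要听《张三的歌》"))  # 张三的歌
print(extract_entity("打电话给李四"))        # 李四
print(extract_entity("今天天气怎么样"))      # None
```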

As shown in Fig. 5, one embodiment of the present application provides a person-name voice search method 2, which includes the following steps. S200: match the voice command to be recognized against prestored voice command templates to obtain the person-name speech to be recognized in that command. S210: perform Chinese character-based speech recognition on the person-name speech to be recognized, obtaining a Chinese character sequence as its Chinese character recognition result. S220: perform pinyin-based speech recognition on the person-name speech to be recognized, obtaining a pinyin sequence as its pinyin recognition result. S230: using the recognized Chinese character sequence and pinyin sequence, determine the similarity between each candidate person name in a specific person-name list and the person-name speech to be recognized. S240: using these similarities, determine the speech recognition result of the person-name speech to be recognized from the specific person-name list.

Compared with Fig. 4, the embodiment of Fig. 5 only specializes the named entity as a person name, so the implementation of each step is not repeated. Here, the specific person-name list may be a list of all employees of a company. The embodiment of Fig. 5 thus lets company employees be searched through simple voice interaction. It can be used, for example, for automatic call transfer on a company telephone system.
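Steps S230 and S240 of the person-name search can be sketched end to end. The sketch takes Chinese character recognition and syllable recognition as inputs (first implementation of step S120), forms a weighted-average overall edit distance, converts it to a similarity with constant 1, and returns the most similar name. The roster, the recognizer outputs, and the weights (1 for Chinese characters, 3 for syllables) are hypothetical, and so are the function names. In this example the Chinese character result alone would favor "张三". The syllable view tips the decision to "章珊", which matches the spoken "zhang shan" exactly.

```python
from typing import Dict, Hashable, Sequence, Tuple


def edit_distance(a: Sequence[Hashable], b: Sequence[Hashable]) -> int:
    """Levenshtein distance over arbitrary hashable units."""
    prev = list(range(len(b) + 1))
    for i in range(1, len(a) + 1):
        curr = [i] + [0] * len(b)
        for j in range(1, len(b) + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            curr[j] = min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost)
        prev = curr
    return prev[-1]


def search_name(recognized_hanzi: str,
                recognized_syllables: Sequence[str],
                roster: Dict[str, Sequence[str]],
                w_hanzi: float = 1.0,
                w_syllable: float = 3.0,
                constant: float = 1.0) -> Tuple[str, float]:
    """Return (best name, similarity) over a roster mapping each candidate
    name to its syllable sequence. The weights are illustrative."""
    if not roster:
        raise ValueError("the roster is empty")
    best_name, best_sim = "", -1.0
    for name, syllables in roster.items():
        d_hanzi = edit_distance(recognized_hanzi, name)               # characters as units
        d_syllable = edit_distance(recognized_syllables, syllables)  # syllables as units
        overall = (w_hanzi * d_hanzi + w_syllable * d_syllable) / (w_hanzi + w_syllable)
        sim = 1.0 / (overall + constant)
        if sim > best_sim:
            best_name, best_sim = name, sim
    return best_name, best_sim


# Hypothetical employee roster and recognizer outputs: the Chinese character
# recognizer returned "张山", the syllable recognizer returned "zhang shan".
roster = {"张三": ["zhang", "san"], "章珊": ["zhang", "shan"], "李四": ["li", "si"]}
name, sim = search_name("张山", ["zhang", "shan"], roster)
print(name)  # 章珊
```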

As shown in Fig. 6, one embodiment of the present application provides a song voice search method 3, which includes the following steps. S300: match the voice command to be recognized against prestored voice command templates to obtain the song-title speech to be recognized in that command. S310: perform Chinese character-based speech recognition on the song-title speech to be recognized, obtaining a Chinese character sequence as its Chinese character recognition result. S320: perform pinyin-based speech recognition on the song-title speech to be recognized, obtaining a pinyin sequence as its pinyin recognition result. S330: using the recognized Chinese character sequence and pinyin sequence, determine the similarity between each candidate song title in a specific song-title list and the song-title speech to be recognized. S340: using these similarities, determine the speech recognition result of the song-title speech to be recognized from the specific song-title list.

Compared with Fig. 4, the embodiment of Fig. 6 only specializes the named entity as a song title, so the implementation of each step is not repeated. This scheme can be used for song search in smart speaker products. Here, the specific song-title list may be the list of titles of all songs stored in the speaker. The embodiment of Fig. 6 thus lets songs on the speaker be searched through simple voice interaction, which enables automatic song selection by voice.

As shown in Fig. 7, one embodiment of the present application provides a method 5 for establishing a communication connection by voice, which includes the following steps. S200: match the voice command to be recognized against prestored voice command templates to obtain the person-name speech to be recognized in that command. S210: perform Chinese character-based speech recognition on the person-name speech to be recognized, obtaining a Chinese character sequence as its Chinese character recognition result. S220: perform pinyin-based speech recognition on the person-name speech to be recognized, obtaining a pinyin sequence as its pinyin recognition result. S230: using the recognized Chinese character sequence and pinyin sequence, determine the similarity between each name in a user address book and the person-name speech to be recognized. S240: using the similarity of each candidate name, determine the speech recognition result of the person-name speech to be recognized from the user address book. S250: initiate a communication connection to the user in the address book who was determined as the speech recognition result.

Steps S200-S240 of the embodiment of Fig. 7 are similar to those of the embodiment of Fig. 5 and are not repeated. Step S250 may include initiating a call connection request to the user in the address book determined as the speech recognition result, or sending that user a short message.

This scheme can be used, for example, in in-vehicle voice communication products. Here, the user address book may be the address book stored in the user terminal. A driver can then place a call or send a short message by speaking a single sentence, without dialing the phone by hand while driving.

As shown in Fig. 8, one embodiment of the present application provides a speech recognition device 4, which includes:

A first recognition unit 410, configured to perform Chinese character-based speech recognition on the named-entity speech to be recognized, obtaining a Chinese character sequence as its Chinese character recognition result;

A second recognition unit 420, configured to perform pinyin-based speech recognition on the named-entity speech to be recognized, obtaining a pinyin sequence as its pinyin recognition result;

A similarity determining unit 430, configured to determine, from the recognized Chinese character sequence and pinyin sequence, the similarity between each candidate named entity in a specific named-entity list and the named-entity speech to be recognized;

A recognition result determining unit 440, configured to determine the speech recognition result of the named-entity speech to be recognized from the specific named-entity list, based on the similarity between each candidate named entity and the named-entity speech to be recognized.

Optionally, the language model used in Chinese character-based speech recognition is trained jointly on the Chinese character sequences of the candidate named entities in the specific named-entity list and the Chinese character sequences of the texts in a general training text corpus.

Optionally, the pinyin-based speech recognition is syllable recognition, and the pinyin sequence includes a syllable sequence. The second recognition unit is further configured to perform syllable recognition on the named-entity speech to be recognized, obtaining a syllable sequence as its syllable recognition result.

Optionally, the pinyin-based speech recognition is phoneme recognition, and the pinyin sequence includes a phoneme sequence. The second recognition unit is further configured to perform phoneme recognition on the named-entity speech to be recognized, obtaining a phoneme sequence as its phoneme recognition result.

Optionally, the pinyin-based speech recognition includes both syllable recognition and phoneme recognition, and the pinyin sequence includes a syllable sequence and a phoneme sequence. The second recognition unit is further configured to perform syllable recognition on the named-entity speech to be recognized, obtaining a syllable sequence as its syllable recognition result. It is also configured to perform phoneme recognition on the same speech, obtaining a phoneme sequence as its phoneme recognition result.

Optionally, the second recognition unit is further configured to:

Optionally, as shown in Fig. 9, the similarity determining unit 430 includes:

A Chinese character sequence edit distance determining subunit 431, configured to determine the edit distance between the Chinese character sequence of each candidate named entity in the specific named-entity list and the recognized Chinese character sequence, as the Chinese character sequence edit distance between that candidate named entity and the named-entity speech to be recognized;

A pinyin sequence edit distance determining subunit 432, configured to determine the edit distance between the pinyin sequence of each candidate named entity in the specific named-entity list and the recognized pinyin sequence, as the pinyin sequence edit distance between that candidate named entity and the named-entity speech to be recognized;

An overall edit distance determining subunit 433, configured to calculate the overall edit distance between each candidate named entity and the named-entity speech to be recognized from their Chinese character sequence edit distance and pinyin sequence edit distance;

A similarity determining subunit 434, configured to take, for each candidate named entity, the reciprocal of the sum of its calculated overall edit distance and a predetermined constant as its similarity to the named-entity speech to be recognized.

Optionally, the language model used for syllable recognition is trained on the syllable sequences obtained by syllable expansion of each candidate named entity in the specific named-entity list, together with the syllable sequences obtained by syllable expansion of the texts in a general training text corpus.

Optionally, the language model used for phoneme recognition is trained on the phoneme sequences obtained by phoneme expansion of each candidate named entity in the specific named-entity list, together with the phoneme sequences obtained by phoneme expansion of the texts in a general training text corpus.

Optionally, as shown in Fig. 10, the device 4 further includes:

An acquiring unit 400, configured to obtain the named-entity speech to be recognized that is contained in the speech to be recognized.

Referring to Fig. 11, one embodiment of the present application provides a person-name voice search device 6, which includes:

A person-name speech acquiring unit 610, configured to match the voice command to be recognized against prestored voice command templates to obtain the person-name speech to be recognized in that command;

A first person-name speech recognition unit 620, configured to perform Chinese character-based speech recognition on the person-name speech to be recognized, obtaining a Chinese character sequence as its Chinese character recognition result;

A second person-name speech recognition unit 630, configured to perform pinyin-based speech recognition on the person-name speech to be recognized, obtaining a pinyin sequence as its pinyin recognition result;

A person-name similarity determining unit 640, configured to determine, from the recognized Chinese character sequence and pinyin sequence, the similarity between each candidate person name in a specific person-name list and the person-name speech to be recognized;

A person-name speech recognition result determining unit 650, configured to determine the speech recognition result of the person-name speech to be recognized from the specific person-name list, based on the similarity between each candidate person name and that speech.

Referring to Fig. 12, one embodiment of the present application provides a song voice search device 7, which includes:

A song-title speech acquiring unit 710, configured to match the voice command to be recognized against prestored voice command templates to obtain the song-title speech to be recognized in that command;

A first song-title speech recognition unit 720, configured to perform Chinese character-based speech recognition on the song-title speech to be recognized, obtaining a Chinese character sequence as its Chinese character recognition result;

A second song-title speech recognition unit 730, configured to perform pinyin-based speech recognition on the song-title speech to be recognized, obtaining a pinyin sequence as its pinyin recognition result;

A song-title similarity determining unit 740, configured to determine, from the recognized Chinese character sequence and pinyin sequence, the similarity between each candidate song title in a specific song-title list and the song-title speech to be recognized;

A song-title speech recognition result determining unit 750, configured to determine the speech recognition result of the song-title speech to be recognized from the specific song-title list, based on the similarity between each candidate song title and that speech.

Referring to Fig. 13, one embodiment of the present application provides a device 8 for establishing a communication connection by voice, which includes:

A person-name similarity determining unit 640, configured to determine, from the recognized Chinese character sequence and pinyin sequence, the similarity between each name in a user address book and the person-name speech to be recognized;

A person-name speech recognition result determining unit 650, configured to determine the speech recognition result of the person-name speech to be recognized from the user address book, based on the similarity between each candidate name and that speech;

A communication connection initiating unit 660, configured to initiate a communication connection to the user in the address book who was determined as the speech recognition result.

Optionally, the communication connection initiating unit is further configured to initiate a call connection request to the user in the address book determined as the speech recognition result, or to send that user a short message.

It should be noted that the present application may be implemented in software and/or in a combination of software and hardware. For example, each device of the present application may be implemented using an application-specific integrated circuit (ASIC) or any other similar hardware device. In one embodiment, the software program of the present application may be executed by a processor to implement the steps or functions described above. Likewise, the software program of the present application (including related data structures) may be stored in a computer-readable recording medium, such as RAM, a magnetic or optical drive, a floppy disk, or similar devices. In addition, some steps or functions of the present application may be implemented in hardware, for example as circuitry that cooperates with a processor to perform each step or function.

It is obvious to a person skilled in the art that the application is not limited to the details of the exemplary embodiments above. The application can be implemented in other specific forms without departing from its spirit or essential characteristics. The embodiments should therefore be regarded in every respect as exemplary and non-limiting. The scope of the application is defined by the appended claims rather than by the foregoing description, and all changes that fall within the meaning and range of equivalency of the claims are intended to be embraced by the application. No reference sign in a claim should be construed as limiting the claim concerned. Furthermore, the word "comprising" does not exclude other units or steps, and the singular does not exclude the plural. Multiple units or devices recited in a system claim may also be implemented by a single unit or device through software or hardware. Words such as "first" and "second" are used to denote names and do not imply any particular order.

Although exemplary embodiments have been specifically shown and described above, those skilled in the art will understand that changes in form and detail may be made without departing from the spirit and scope of the claims.

Claims

1. A speech recognition method, comprising the following steps:

2. The method according to claim 1, wherein the language model used in the Chinese-character-based speech recognition is generated by jointly training on the Chinese character sequences corresponding to the candidate named entities in the specific named-entity list and the Chinese character sequences of the texts in a general training text corpus.

3. The method according to claim 1, wherein the pinyin-based speech recognition is syllable recognition, and the pinyin sequence comprises a syllable sequence; and

the step of performing speech recognition on the named-entity speech to be recognized using pinyin-based speech recognition, so as to recognize a pinyin sequence as the pinyin recognition result of the named-entity speech to be recognized, comprises:

performing syllable recognition on the named-entity speech to be recognized, so as to recognize a syllable sequence as the syllable recognition result of the named-entity speech to be recognized.

4. The method according to claim 1, wherein the pinyin-based speech recognition is phoneme recognition, and the pinyin sequence comprises a phoneme sequence,

5. The method according to claim 1, wherein the pinyin-based speech recognition comprises syllable recognition and phoneme recognition, and the pinyin sequence comprises a syllable sequence and a phoneme sequence; and

the step of performing speech recognition on the named entity to be recognized using pinyin-based speech recognition, so as to recognize a pinyin sequence as the pinyin recognition result of the named-entity speech to be recognized, further comprises:

6. The method according to claim 4 or 5, wherein the step of performing speech recognition on the named-entity speech to be recognized using pinyin-based speech recognition, so as to recognize a pinyin sequence as the pinyin recognition result of the named-entity speech to be recognized, further comprises:

7. The method according to claim 1, wherein the step of determining, according to the recognized Chinese character sequence and pinyin sequence, the similarity between each candidate named entity in the specific named-entity list and the named-entity speech to be recognized comprises:

determining the edit distance between the Chinese character sequence corresponding to each candidate named entity in the specific named-entity list and the recognized Chinese character sequence, as the Chinese-character-sequence edit distance between that candidate named entity and the named-entity speech to be recognized;

determining the edit distance between the pinyin sequence corresponding to each candidate named entity in the specific named-entity list and the recognized pinyin sequence, as the pinyin-sequence edit distance between that candidate named entity and the named-entity speech to be recognized;

calculating, from the Chinese-character-sequence edit distance and the pinyin-sequence edit distance between each candidate named entity and the named-entity speech to be recognized, an overall edit distance between that candidate named entity and the named-entity speech to be recognized; and

taking, for each candidate named entity, the reciprocal of the sum of its calculated overall edit distance and a predetermined constant as the similarity between that candidate named entity and the named-entity speech to be recognized.

8. The method according to claim 3 or 5, wherein the language model used in the syllable recognition is generated by training on the syllable sequences obtained by syllable expansion of each candidate named entity in the specific named-entity list and the syllable sequences obtained by syllable expansion of the texts in the general training text corpus.

9. The method according to claim 4 or 5, wherein the language model used in the phoneme recognition is generated by training on the phoneme sequences obtained by phoneme expansion of each candidate named entity in the specific named-entity list and the phoneme sequences obtained by phoneme expansion of the texts in the general training text corpus.

10. The method according to claim 1, further comprising:

obtaining the named-entity speech to be recognized that is contained in the speech to be recognized.

11. A method for searching for a name by voice, comprising:

12. A method for searching for a song by voice, comprising:

13. A method for establishing a communication connection by voice, comprising:

14. The method according to claim 13, wherein initiating the communication connection comprises initiating a call connection request to, or sending a short message to, the user in the user's address book who is determined as the speech recognition result.

15. A speech recognition device, comprising:

a first recognition unit, configured to perform speech recognition on named-entity speech to be recognized using Chinese-character-based speech recognition, so as to recognize a Chinese character sequence as the Chinese character recognition result of the named-entity speech to be recognized;

a second recognition unit, configured to perform speech recognition on the named-entity speech to be recognized using pinyin-based speech recognition, so as to recognize a pinyin sequence as the pinyin recognition result of the named-entity speech to be recognized;

a similarity determining unit, configured to determine, according to the recognized Chinese character sequence and pinyin sequence, the similarity between each candidate named entity in a specific named-entity list and the named-entity speech to be recognized; and

a recognition result determining unit, configured to determine the speech recognition result of the named-entity speech to be recognized from the specific named-entity list, according to the similarity between each candidate named entity and the named-entity speech to be recognized.

16. The device according to claim 15, wherein the language model used in the Chinese-character-based speech recognition is generated by jointly training on the Chinese character sequences corresponding to the candidate named entities in the specific named-entity list and the Chinese character sequences of the texts in a general training text corpus.

17. The device according to claim 15, wherein the pinyin-based speech recognition is syllable recognition, and the pinyin sequence comprises a syllable sequence; and

the second recognition unit is further configured to:

18. The device according to claim 15, wherein the pinyin-based speech recognition is phoneme recognition, and the pinyin sequence comprises a phoneme sequence; and

the second recognition unit is further configured to:

19. The device according to claim 15, wherein the pinyin-based speech recognition comprises syllable recognition and phoneme recognition, and the pinyin sequence comprises a syllable sequence and a phoneme sequence; and

the second recognition unit is further configured to:

20. The device according to claim 18 or 19, wherein the second recognition unit is further configured to:

21. The device according to claim 15, wherein the similarity determining unit comprises:

a Chinese-character-sequence edit distance determining subunit, configured to determine the edit distance between the Chinese character sequence corresponding to each candidate named entity in the specific named-entity list and the recognized Chinese character sequence, as the Chinese-character-sequence edit distance between that candidate named entity and the named-entity speech to be recognized;

a pinyin-sequence edit distance determining subunit, configured to determine the edit distance between the pinyin sequence corresponding to each candidate named entity in the specific named-entity list and the recognized pinyin sequence, as the pinyin-sequence edit distance between that candidate named entity and the named-entity speech to be recognized;

an overall edit distance determining subunit, configured to calculate, from the Chinese-character-sequence edit distance and the pinyin-sequence edit distance between each candidate named entity and the named-entity speech to be recognized, an overall edit distance between that candidate named entity and the named-entity speech to be recognized; and

a similarity determining subunit, configured to take, for each candidate named entity, the reciprocal of the sum of its calculated overall edit distance and a predetermined constant as the similarity between that candidate named entity and the named-entity speech to be recognized.

22. The device according to claim 17 or 19, wherein the language model used in the syllable recognition is generated by training on the syllable sequences obtained by syllable expansion of each candidate named entity in the specific named-entity list and the syllable sequences obtained by syllable expansion of the texts in the general training text corpus.

23. The device according to claim 18 or 19, wherein the language model used in the phoneme recognition is generated by training on the phoneme sequences obtained by phoneme expansion of each candidate named entity in the specific named-entity list and the phoneme sequences obtained by phoneme expansion of the texts in the general training text corpus.

24. The device according to claim 15, further comprising:

an acquiring unit, configured to obtain the named-entity speech to be recognized that is contained in the speech to be recognized.

25. A device for searching for a name by voice, comprising:

a name speech acquiring unit, configured to match a voice command to be recognized against pre-stored voice command templates, so as to obtain the name speech to be recognized from the voice command to be recognized;

a first name speech recognition unit, configured to perform speech recognition on the name speech to be recognized using Chinese-character-based speech recognition, so as to recognize a Chinese character sequence as the Chinese character recognition result of the name speech to be recognized;

a second name speech recognition unit, configured to perform speech recognition on the name speech to be recognized using pinyin-based speech recognition, so as to recognize a pinyin sequence as the pinyin recognition result of the name speech to be recognized;

a name similarity determining unit, configured to determine, according to the recognized Chinese character sequence and pinyin sequence, the similarity between each candidate name in a specific name list and the name speech to be recognized; and

a name speech recognition result determining unit, configured to determine the speech recognition result of the name speech to be recognized from the specific name list, according to the similarity between each candidate name and the name speech to be recognized.

26. A device for searching for a song by voice, comprising:

a song title speech acquiring unit, configured to match a voice command to be recognized against pre-stored voice command templates, so as to obtain the song title speech to be recognized from the voice command to be recognized;

a first song title speech recognition unit, configured to perform speech recognition on the song title speech to be recognized using Chinese-character-based speech recognition, so as to recognize a Chinese character sequence as the Chinese character recognition result of the song title speech to be recognized;

a second song title speech recognition unit, configured to perform speech recognition on the song title speech to be recognized using pinyin-based speech recognition, so as to recognize a pinyin sequence as the pinyin recognition result of the song title speech to be recognized;

a song title similarity determining unit, configured to determine, according to the recognized Chinese character sequence and pinyin sequence, the similarity between each candidate song title in a specific song title list and the song title speech to be recognized; and

a song title speech recognition result determining unit, configured to determine the speech recognition result of the song title speech to be recognized from the specific song title list, according to the similarity between each candidate song title and the song title speech to be recognized.

27. A device for establishing a communication connection by voice, comprising:

a name similarity determining unit, configured to determine, according to the recognized Chinese character sequence and pinyin sequence, the similarity between each name in the user's address book and the name speech to be recognized;

a name speech recognition result determining unit, configured to determine the speech recognition result of the name speech to be recognized from the user's address book, according to the similarity between each candidate name and the name speech to be recognized; and

a communication connection initiating unit, configured to initiate a communication connection to the user in the user's address book who is determined as the speech recognition result.

28. The device according to claim 27, wherein the communication connection initiating unit is further configured to initiate a call connection request to, or send a short message to, the user in the user's address book who is determined as the speech recognition result.
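Claims 8 and 9 rely on syllable expansion and phoneme expansion. These turn a Chinese character sequence into the units that the syllable or phoneme recognizer emits. The sketch below is minimal and rests on two illustrative assumptions. First, it uses a toy lexicon with exactly one toned pinyin syllable per character. Second, it uses pinyin initials and finals as the phoneme inventory. The patent specifies neither, and real systems also have to handle characters with several readings. The names `LEXICON`, `syllable_expand`, `split_syllable` and `phoneme_expand` are hypothetical.

```python
# Hypothetical toy lexicon: one toned pinyin syllable per character.
LEXICON = {"李": "li3", "宏": "hong2", "言": "yan2", "红": "hong2", "岩": "yan2"}

# Pinyin initials (y/w treated as initials); the two-letter ones come first
# so that "zh" is matched before "z", "ch" before "c", "sh" before "s".
INITIALS = ("zh", "ch", "sh", "b", "p", "m", "f", "d", "t", "n", "l",
            "g", "k", "h", "j", "q", "x", "r", "z", "c", "s", "y", "w")


def syllable_expand(text):
    """Syllable expansion: character sequence -> toned syllable sequence."""
    return [LEXICON[ch] for ch in text]


def split_syllable(syl):
    """Split a toned syllable into (initial, final); the initial may be empty."""
    for ini in INITIALS:
        if syl.startswith(ini):
            return ini, syl[len(ini):]
    return "", syl


def phoneme_expand(text):
    """Phoneme expansion: character sequence -> initial/final unit sequence."""
    phones = []
    for syl in syllable_expand(text):
        ini, fin = split_syllable(syl)
        if ini:
            phones.append(ini)
        phones.append(fin)
    return phones
```

For example, `phoneme_expand("李宏言")` gives `["l", "i3", "h", "ong2", "y", "an2"]`. The homophones 李宏言 and 李红岩 expand to the same syllable sequence. This is why a pinyin-level match can recover a name that the character recognizer got wrong.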
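Claims 2, 8 and 9 (and device claims 16, 22 and 23) describe each recognizer's language model. It is trained jointly on the candidate entity list and a general text corpus, both expanded into that recognizer's units. The patent does not state the model order, the smoothing, or how the two sources are weighted. The sketch below is therefore only one minimal possibility. It assumes an add-one-smoothed bigram model estimated from pooled counts. `train_bigram_lm` is a hypothetical name.

```python
from collections import Counter
from math import log

BOS, EOS = "<s>", "</s>"


def train_bigram_lm(entity_seqs, corpus_seqs):
    """Jointly train an add-one-smoothed bigram LM on candidate-entity sequences
    pooled with general-corpus sequences. A sequence may be a string of
    characters (claim 2) or a list of syllables (claim 8) or phonemes (claim 9)."""
    unigrams, bigrams = Counter(), Counter()
    vocab = {EOS}
    for seq in list(entity_seqs) + list(corpus_seqs):
        tokens = [BOS] + list(seq) + [EOS]
        vocab.update(tokens[1:])
        for prev, cur in zip(tokens, tokens[1:]):
            unigrams[prev] += 1            # times `prev` occurs as a history
            bigrams[(prev, cur)] += 1
    V = len(vocab)                          # number of possible next units

    def log_prob(seq):
        """Smoothed log-probability of a unit sequence under the pooled model."""
        tokens = [BOS] + list(seq) + [EOS]
        return sum(log((bigrams[(p, c)] + 1) / (unigrams[p] + V))
                   for p, c in zip(tokens, tokens[1:]))

    return log_prob
```

Pooling the counts lets the entity list contribute bigrams that a general corpus would almost never contain, such as the character pairs inside a contact's name. The general corpus still keeps everyday phrasing covered.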
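Claims 25 and 26 obtain the name or song title speech by matching the voice command against pre-stored voice command templates. The patent performs this matching on speech. The sketch below is only a text-level analogue that runs on an already transcribed command. It shows how a fixed carrier phrase isolates the entity slot. The templates, the `{slot}` marker and the function names are hypothetical.

```python
import re

# Hypothetical command templates; "{slot}" marks where the entity appears.
TEMPLATES = {
    "name": ["打电话给{slot}", "给{slot}发短信", "查找{slot}"],
    "song": ["播放{slot}", "我想听{slot}"],
}


def compile_template(template):
    """Turn a template into an anchored regex with a named `slot` group."""
    prefix, suffix = template.split("{slot}")
    return re.compile("^" + re.escape(prefix) + "(?P<slot>.+)" + re.escape(suffix) + "$")


COMPILED = {kind: [compile_template(t) for t in ts] for kind, ts in TEMPLATES.items()}


def extract_slot(command):
    """Return (kind, slot_text) for the first matching template, else None."""
    for kind, patterns in COMPILED.items():
        for pattern in patterns:
            match = pattern.match(command)
            if match:
                return kind, match.group("slot")
    return None
```

For example, `extract_slot("给李宏言发短信")` returns `("name", "李宏言")`. The extracted slot is what the two recognizers and the similarity step then operate on.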