CN104167206B - Acoustic model merging method and equipment and audio recognition method and system - Google Patents


Info

Publication number
CN104167206B
CN104167206B (application CN201310182399.5A, also published as CN104167206A)
Authority
CN
China
Prior art keywords
model
acoustic model
acoustic
constituent element
distance
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201310182399.5A
Other languages
Chinese (zh)
Other versions
CN104167206A (en)
Inventor
刘贺飞
郭莉莉
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Canon Inc
Original Assignee
Canon Inc
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Canon Inc filed Critical Canon Inc
Priority to CN201310182399.5A priority Critical patent/CN104167206B/en
Publication of CN104167206A publication Critical patent/CN104167206A/en
Application granted granted Critical
Publication of CN104167206B publication Critical patent/CN104167206B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical


Landscapes

  • Machine Translation (AREA)

Abstract

The present invention relates to an acoustic model merging method and apparatus and a speech recognition method and system. An acoustic model merging method for merging multiple acoustic models including a first acoustic model and a second acoustic model comprises: a distribution-information obtaining step of obtaining distribution information of the modeling units of at least the first and/or the second acoustic model, wherein the distribution information reflects the importance of the modeling units in the language to be recognized; a distance calculation step of calculating, for each pair consisting of same-class model constituent elements of the first acoustic model and the second acoustic model, the distance of the pair; a weighting step of weighting the distance of each such pair using the distribution information; a sorting step of sorting the pairs of model constituent elements according to the weighted distances; and a combining step of combining the first acoustic model and the second acoustic model according to the sorting result to obtain a merged acoustic model.

Description

Acoustic model merging method and equipment and audio recognition method and system
Technical field
The present invention relates generally to an acoustic model merging method and a merging apparatus for acoustic models used in automatic speech recognition (ASR), and to a speech recognition method and system, and more particularly to a method and apparatus for merging multiple acoustic models and to a speech recognition method and system using the merged acoustic model.
Background technology
The acoustic model is one of the most important components of a speech recognition system. To ensure recognition accuracy, a speech recognition system usually needs multiple acoustic models (AMs): for example, different AMs for different modeling units (such as phonemes, syllables, words, initials, finals, etc.), different AMs for different languages, and different AMs for different environments (for example, an AM trained in a quiet environment, an AM trained in a noisy environment, etc.).
How to reduce the size of an acoustic model is an important problem in speech recognition technology.
To reduce the size of an acoustic model, a common approach is to merge acoustic models under some criterion (for example, a data-driven criterion or a rule-based criterion), so as to share parameters among the different acoustic models.
The parameters constituting the modeling units of an acoustic model include means, variances, Gaussian mixtures, states, hidden Markov models (HMMs), and so on. Among these parameters, a Gaussian mixture includes variances, a state includes one or more Gaussian mixtures, and an HMM includes one or more states. Further, a phoneme model (covering, for example, monophones, diphones, triphones, etc.) can be represented by an HMM (for example, each phoneme can be represented by a three-state HMM); thus, the actual pronunciation of a word in a language can be represented as a sequence of HMMs. These parameters belong to different classes of acoustic-model parameters, and parameters of each class can serve as sharable parameters: shared variances, shared Gaussian mixtures, shared states, shared HMMs or phonemes, and so on.
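The parameter hierarchy just described (Gaussians inside states, states inside per-phoneme HMMs, HMM sequences for words) can be sketched as simple data structures. This is a minimal illustrative sketch; the class and field names are assumptions, not taken from the patent:

```python
from dataclasses import dataclass
from typing import List

@dataclass
class Gaussian:
    mean: List[float]
    variance: List[float]      # diagonal covariance, one variance per dimension

@dataclass
class State:
    gaussians: List[Gaussian]  # a state holds one or more Gaussian mixtures
    weights: List[float]       # mixture weights, one per Gaussian

@dataclass
class PhonemeHMM:
    name: str
    states: List[State]        # e.g. a three-state HMM per phoneme

# A word's pronunciation is then a sequence of phoneme HMMs:
word_model = [
    PhonemeHMM("k", [State([Gaussian([0.0], [1.0])], [1.0]) for _ in range(3)])
]
```

Any level of this hierarchy (variance, Gaussian, state, HMM) can be the "model constituent element" that is shared during merging.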
Several known methods for reducing the size of an acoustic model are enumerated below.
One method is a distance-based acoustic model merging method. In this method, two Gaussian mixtures or states from different acoustic models whose distance is small are merged, thereby reducing the size of the acoustic models.
A decision-tree-based acoustic model merging method has also been proposed. This is a well-known and commonly used data-driven method, which may be referred to as the tied-Gaussian-mixture or tied-state method. The decision tree represents equivalence relations between HMM parameters of different model states or phoneme contexts. The performance of the decision tree depends heavily on the amount and the distribution of the data. Specifically, for example, the amounts of data for the acoustic models to be merged need to be comparable; that is, a balance must be kept between the data amounts of these acoustic models. For example, several similar phoneme contexts may each be impossible to train accurately because of insufficient training data; in order to model them anyway, they can be grouped together or tied. After such grouping or tying, the number of independent parameters in the model is reduced.
Although the methods above can reduce the size of an acoustic model, they suffer from problems such as the following.
Regarding the distance-based acoustic model merging method above, it does not take into account the importance of different phonemes in different languages. In any language, a phoneme that occurs with high probability is more important in that language, and a phoneme with many pronunciation variants is likewise more important. The same phoneme can also have different levels of importance in different languages. For example, "l" in Japanese corresponds to both "l" and "r" in Chinese; comparatively, "l" is therefore more important in Japanese than in Chinese. As another example, "i" in Chinese has three allophones, so "i" may be more important in Chinese than in other languages. As yet another example, in dialects of certain regions of China where "z", "c", "s" are hard to distinguish from "zh", "ch", "sh", the phoneme "h" and its position may be more important than "h" and its position in other languages.
Regarding the decision-tree-based acoustic model merging method above, the Gaussian mixture or state components chosen to be grouped or tied come from two different phonetic classes. The performance of the decision tree depends heavily on the amount and the distribution of the training data. Moreover, this method likewise does not consider the importance of different phonemes in different languages.
In summary, the conventional methods described above mainly suffer from the following two problems:
1) The first problem is that the amount and the distribution of the training data are difficult to control;
2) The second problem is that, because the importance of phonemes is not considered, the performance of the acoustic model degrades when an important state is replaced by another state (here, the merging of acoustic models is realized through shared states).
Specifically, the second problem is embodied in the method described in U.S. Patent Publication No. US2010/0131262A1, entitled "Speech recognition based on a multilingual acoustic model". That publication discloses a distance-based acoustic model merging method which mainly comprises: first, based on a rule set (which can be a distance measure), replacing each of the probability distribution functions of at least one second acoustic model with one of the probability distribution functions of a main acoustic model, or replacing each state of the probabilistic state sequence models of the at least one second acoustic model with a state of the probabilistic state sequence models of the main acoustic model, to obtain at least one modified second acoustic model; then, combining the main acoustic model with the at least one modified second acoustic model to obtain a multilingual acoustic model.
However, the method disclosed in that U.S. patent application does not consider the importance of the different states in the second acoustic model. Obviously, as described above, when an important state in the second acoustic model is replaced by a less important state of the main acoustic model, the performance of the acoustic model degrades.
Summary of the invention
In view of the above, what is desired is a method and apparatus that can merge acoustic models effectively so as to appropriately reduce the size of the acoustic model without significantly degrading its performance, so that accurate and efficient speech recognition can be performed using the resulting merged acoustic model.
The present invention seeks to address the problems described above. It is an object of the present invention to provide an acoustic model merging method and apparatus, and a speech recognition method and system, that solve any one of the above problems.
Specifically, the present invention provides an acoustic model merging method and apparatus and a speech recognition method and system that can merge acoustic models effectively so as to appropriately reduce the size of the acoustic model without significantly degrading its performance, thereby enabling accurate and efficient speech recognition using the resulting merged acoustic model. The present invention achieves this by selecting and replacing only the less important model parameters in an acoustic model, based on a consideration of the importance of the modeling units in the language to be recognized.
According to one aspect of the present disclosure, there is provided an acoustic model merging method for merging multiple acoustic models including a first acoustic model and a second acoustic model, comprising: a distribution-information obtaining step of obtaining distribution information of the modeling units of at least the first and/or the second acoustic model, wherein the distribution information reflects the importance of the modeling units in the language to be recognized; a distance calculation step of calculating, for each pair consisting of same-class model constituent elements of the first acoustic model and the second acoustic model, the distance of the pair; a weighting step of weighting the distance of each such pair using the distribution information; a sorting step of sorting the pairs of model constituent elements according to the weighted distances; and a combining step of combining the first acoustic model and the second acoustic model according to the sorting result to obtain a merged acoustic model.
According to another aspect of the present disclosure, there is provided an acoustic model merging apparatus for merging multiple acoustic models including a first acoustic model and a second acoustic model, comprising: a distribution-information obtaining unit configured to obtain distribution information of the modeling units of at least the first and/or the second acoustic model, wherein the distribution information reflects the importance of the modeling units in the language to be recognized; a distance calculation unit configured to calculate, for each pair consisting of same-class model constituent elements of the first acoustic model and the second acoustic model, the distance of the pair; a weighting unit configured to weight the distance of each such pair using the distribution information; a sorting unit configured to sort the pairs of model constituent elements according to the weighted distances; and a combining unit configured to combine the first acoustic model and the second acoustic model according to the sorting result to obtain a merged acoustic model.
According to a further aspect of the present disclosure, there is provided a speech recognition method comprising performing speech recognition using an acoustic model obtained by the above acoustic model merging method.
According to yet another aspect of the present disclosure, there is provided a speech recognition system including the above acoustic model merging apparatus.
Here, the model constituent element is at least one of a mean, a variance, a Gaussian mixture, a state, and a hidden Markov model.
Here, the distribution information of a modeling unit is the frequency or duration with which the modeling unit occurs in the training corpus of the corresponding acoustic model.
Further characteristic features and advantages of the present invention will become apparent from the following description of exemplary embodiments, read with reference to the attached drawings.
Brief description of the drawings
The accompanying drawings, which are incorporated in and constitute a part of the specification, illustrate embodiments of the invention and, together with the description, serve to explain the principles of the invention. In the drawings, similar reference numerals denote similar elements.
Fig. 1 is a block diagram illustrating an exemplary hardware configuration of a computer system capable of implementing embodiments of the present invention.
Fig. 2 is a flowchart exemplarily illustrating an acoustic model merging method according to a first embodiment of the present invention.
Fig. 3 is a flowchart exemplarily illustrating an acoustic model merging method according to a second embodiment of the present invention.
Fig. 4 is a flowchart exemplarily illustrating an acoustic model merging method according to a third embodiment of the present invention.
Fig. 5 is a schematic block diagram showing an exemplary configuration of an acoustic model merging apparatus according to an embodiment of the present invention.
Fig. 6 is a schematic block diagram showing an exemplary configuration of the combining unit in the acoustic model merging apparatus according to an embodiment of the present invention.
Fig. 7 is a schematic block diagram showing another exemplary configuration of the combining unit in the acoustic model merging apparatus according to an embodiment of the present invention.
Fig. 8 is a schematic block diagram showing the configuration of a speech recognition system according to an exemplary embodiment of the present invention.
Detailed description of the embodiments
It should be noted that the following exemplary embodiments are not intended to limit the scope of the appended claims, and that not all combinations of the features described in the exemplary embodiments are necessarily required for the solution of the invention. Each of the exemplary embodiments of the invention described below can be implemented separately, or, where necessary or where it is beneficial, implemented by combining elements or features from multiple embodiments in a single embodiment.
Because similar reference numerals in the drawings denote similar elements, the description of such similar elements will not be repeated, and those of ordinary skill in the art will understand that these similar elements carry similar meanings.
In the present disclosure, the terms "first", "second", and so on are used only to distinguish between elements and are not intended to indicate temporal order, priority, importance, or the like.
Furthermore, in the present disclosure, the order of execution of the steps is not necessarily that shown in the flowcharts or mentioned in the embodiments, and can be adapted flexibly to actual conditions; that is, the present invention should not be limited by the order of execution of the steps shown in the flowcharts.
Also, the modeling units constituting an acoustic model can be, for example, words, characters, initials/finals, phonemes, and so on, without being limited to these. The modeling units may differ for different languages.
In addition, the "importance of a modeling unit" or "importance level of a modeling unit" in this disclosure covers at least the following cases. Taking phonemes as an example: a phoneme with a high frequency of occurrence in everyday speech is important; a phoneme whose position plays a decisive role in pronunciation is important; and a phoneme whose pronunciation is highly distinctive is important. The notion is not limited to the cases described above, however, and the important modeling units (such as phonemes) may differ from language to language.
Thus, one characteristic feature of the present invention is that the distribution information of a modeling unit can be used to reflect the importance, or importance level, of that modeling unit in the language to be recognized. Such distribution information, usable to reflect the "importance of a modeling unit" or "importance level of a modeling unit", can be obtained from experience, or obtained by counting statistics from the training corpus.
In addition, in the present invention, the parameters constituting a modeling unit (for example, means, variances, Gaussian mixtures, states, hidden Markov models, etc.) are referred to as model constituent elements. Unless otherwise specified, a reference to model constituent elements in this specification can denote all of the parameters constituting a modeling unit, or at least one of these parameters.
Hereinafter, exemplary embodiments of the present invention will be described in detail with reference to the accompanying drawings.
Fig. 1 is a block diagram illustrating the hardware configuration of a computer system 1 capable of implementing embodiments of the present invention.
As shown in Fig. 1, the computer system 1 includes a computer 1110. The computer 1110 includes, connected via a system bus 1121, a processing unit 1120, a system memory 1130, a fixed non-volatile memory interface 1140, a removable non-volatile memory interface 1150, a user input interface 1160, a network interface 1170, a video interface 1190, and a peripheral interface 1195.
The system memory 1130 includes a ROM (read-only memory) 1131 and a RAM (random access memory) 1132. A BIOS (basic input output system) 1133 resides in the ROM 1131. An operating system 1134, application programs 1135, other program modules 1136, and some program data 1137 reside in the RAM 1132.
A fixed non-volatile memory 1141 such as a hard disk is connected to the fixed non-volatile memory interface 1140. The fixed non-volatile memory 1141 can store, for example, an operating system 1144, application programs 1145, other program modules 1146, and some program data 1147.
Removable non-volatile memories such as a floppy drive 1151 and a CD-ROM drive 1155 are connected to the removable non-volatile memory interface 1150. For example, a diskette 1152 can be inserted into the floppy drive 1151, and a CD (compact disc) 1156 can be inserted into the CD-ROM drive 1155.
Input devices such as a microphone 1161 and a keyboard 1162 are connected to the user input interface 1160.
The computer 1110 can be connected to a remote computer 1180 via the network interface 1170. For example, the network interface 1170 can be connected to the remote computer 1180 via a local area network 1171. Alternatively, the network interface 1170 can be connected to a modem (modulator-demodulator) 1172, and the modem 1172 is connected to the remote computer 1180 via a wide area network 1173.
The remote computer 1180 can include a memory 1181 such as a hard disk, which stores remote application programs 1185.
The video interface 1190 is connected to a monitor 1191.
The peripheral interface 1195 is connected to a printer 1196 and a loudspeaker 1197.
The computer system shown in Fig. 1 can be incorporated into any embodiment, either as a stand-alone computer or as a processing system within a device; one or more unnecessary components can be removed, and one or more additional components can be added.
A user can use the computer system shown in Fig. 1 in any manner, and the present invention places no restriction on the manner in which the user uses the computer system.
Obviously, the computer system shown in Fig. 1 is exemplary and is in no way intended to limit the invention, its application, or its uses.
[first embodiment]
Hereinafter, the first embodiment of the present invention will be described in detail with reference to Fig. 2.
Fig. 2 is a flowchart exemplarily illustrating an acoustic model merging method according to an embodiment of the present invention.
In the present embodiment, a merging operation is performed on a first acoustic model and a second acoustic model. Here, the first acoustic model and the second acoustic model have been trained for different languages on the speech data of a training corpus, using a method such as maximum-likelihood (ML) training or discriminative training (DT). The speech data in the training corpus are typically provided by one or more native speakers. For example, the first acoustic model, serving as the main acoustic model (also called a universal acoustic model, UAM), can be configured to recognize speech input in multiple languages (for example, English, Chinese, etc.), while the second acoustic model, serving as the auxiliary acoustic model, can be configured to recognize speech input in, for example, a less common language or group of languages (for example, Dutch, Norwegian, Swedish, etc.). Of course, the second acoustic model may instead serve as the main acoustic model and the first acoustic model as the auxiliary acoustic model.
In the first embodiment, the following implementation will be described: the model constituent elements of the modeling units in the first acoustic model, serving as the main acoustic model, replace the corresponding model constituent elements in the second acoustic model, and the first and second acoustic models are then combined, so that the first acoustic model is not modified and at least retains its performance as the main acoustic model.
Specifically, in the present embodiment, the distances of pairs of model constituent elements are weighted using the distribution information of the modeling units, and the acoustic models are then merged according to the weighting result.
Here, the distribution information of a modeling unit can reflect the importance of that modeling unit in the language to be recognized. For example, the distribution information of a modeling unit can represent the frequency or duration with which the modeling unit occurs in the training corpus of the corresponding acoustic model.
As described above, the distribution information of a modeling unit can be obtained by counting statistics from the training corpus, and the model constituent elements making up a modeling unit include means, variances, Gaussian mixtures, states, hidden Markov models, and so on. Therefore, the state occupation probability (or state occupation count) can be used, for example, as the distribution information of a modeling unit, where the state occupation probability (or count) indicates how often a given state is used by the modeling units (such as phonemes) in the training database.
Accordingly, each exemplary step of the acoustic model merging method according to the first embodiment of the present invention will be described in detail below, taking the state occupation probability as an example.
First, in step S201, the distribution information of the modeling units is obtained. Specifically, as set forth above, the distribution information can be obtained from experience, or by counting statistics in the training corpus; for example, such counting/statistics can be performed while the acoustic model is being trained. More specifically, the state occupation probability, for example, is obtained as the distribution information of the modeling units.
It should be emphasized here that the distribution information of a modeling unit can be obtained not only by counting from the training corpus but also from the data of the modeling unit itself. For example, the phoneme alignment accuracy can represent the distribution information of a modeling unit: phoneme alignment is performed on the training database, and the recognition accuracy of each phoneme is then used as the distribution information of that modeling unit (phoneme) to reflect the importance or importance level of the phoneme. That is, a phoneme with higher recognition accuracy can be treated as more important, so that merging of phonemes with high recognition accuracy is avoided as far as possible.
In short, the present invention places no limitation on the manner in which the distribution information of a modeling unit is obtained, as long as the information can reflect the importance of the modeling unit in the language to be recognized.
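As an illustration of step S201, state occupation counts can be accumulated from aligned state sequences over the training corpus. This is a minimal sketch under the assumption that per-frame state alignments are available; the function name and data layout are not from the patent:

```python
from collections import Counter

def state_occupation_counts(aligned_state_sequences):
    """Count how many frames each state occupies across all alignments."""
    counts = Counter()
    for seq in aligned_state_sequences:
        counts.update(seq)  # one count per frame the state occupies
    return counts

# Toy example: two aligned utterances over hypothetical states i_314, i_320.
counts = state_occupation_counts([["i_314", "i_314", "i_320"], ["i_314"]])
# counts["i_314"] == 3, counts["i_320"] == 1
```

The resulting counts can serve directly as the distribution information used as weights in step S203.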
Then, in step S202, the distances of pairs of model constituent elements are calculated; that is, for each pair consisting of same-class model constituent elements of the first acoustic model and the second acoustic model, the distance of the pair is calculated. Specifically, the distance between a certain class of model constituent elements of the first acoustic model and the second acoustic model can be calculated. More specifically, for example, the distance of a state pair can be calculated, i.e., the distance between a state of the first acoustic model, serving as the main acoustic model, and a state of the second acoustic model, serving as the auxiliary acoustic model. Here, preferably, each state of the first acoustic model is paired with each state of the second acoustic model, and the distance of every such state pair is calculated.
Distance calculation methods that can be used to compute the distance of a pair of model constituent elements include, for example, the Euclidean distance (see http://en.wikipedia.org/wiki/Euclidean_distance), the K-L distance, i.e. K-L divergence (see http://en.wikipedia.org/wiki/Kullback%E2%80%93Leibler_divergence), the Mahalanobis distance (see http://en.wikipedia.org/wiki/Mahalanobis_distance), and the Bhattacharyya distance (see http://en.wikipedia.org/wiki/Bhattacharyya_distance). In fact, any distance measurement (calculation) method can be used in the present invention; that is, the present invention places no limitation on the distance calculation method used to compute the distance of a pair of model constituent elements.
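As one concrete instance of these distance measures, the K-L divergence between two diagonal-covariance Gaussians has a closed form. The sketch below is an illustrative assumption, not code from the patent; since K-L divergence is asymmetric, a symmetrized variant is often used as a state distance:

```python
import math

def kl_diag_gauss(m1, v1, m2, v2):
    """KL divergence D(N(m1,v1) || N(m2,v2)) for diagonal-covariance Gaussians."""
    return 0.5 * sum(
        math.log(b / a) + (a + (x - y) ** 2) / b - 1.0
        for x, a, y, b in zip(m1, v1, m2, v2)
    )

def symmetric_kl(m1, v1, m2, v2):
    """Symmetrized K-L distance, usable as a state-pair distance."""
    return kl_diag_gauss(m1, v1, m2, v2) + kl_diag_gauss(m2, v2, m1, v1)

# Identical Gaussians are at distance 0; distance grows as means separate.
d = symmetric_kl([0.0], [1.0], [1.0], [1.0])  # equals 1.0 for this pair
```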
In step S203, the distances of the pairs of model constituent elements obtained in step S202 are weighted using the distribution information of the modeling units obtained in step S201, to obtain weighted distances. Specifically, the distribution information obtained in step S201 serves as the weight for the distances of the pairs of model constituent elements obtained in step S202. More specifically, for example, the distribution information of the modeling unit, serving as the weight, can be multiplied by the distance of the pair of model constituent elements to obtain the weighted distance.
For example, suppose that the K-L distance of a state pair (i_320, i_314) of the first and second acoustic models, obtained in step S202, is 52249.47, where i_320 is a state in the first acoustic model and i_314 is a state in the second acoustic model, and that the state occupation probability of the state i_314 in the second acoustic model, obtained in step S201, is 45614.92. The state occupation probability of state i_314 is then multiplied by the distance of the state pair (i_320, i_314), giving a weighted distance of 45614.92 × 52249.47.
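The worked example above can be expressed as a small weighting function. The dict-based interface (pairs keyed by state names, occupancies keyed by auxiliary-model state) is a hypothetical layout chosen for illustration:

```python
def weight_distances(pair_distances, occupancy):
    """Weight each state-pair distance by the occupancy of the auxiliary state.

    pair_distances: {(main_state, aux_state): distance}
    occupancy:      {aux_state: state occupation probability or count}
    """
    return {pair: d * occupancy[pair[1]] for pair, d in pair_distances.items()}

# The numbers from the example in the text:
weighted = weight_distances({("i_320", "i_314"): 52249.47},
                            {"i_314": 45614.92})
# weighted[("i_320", "i_314")] == 45614.92 * 52249.47
```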
As described above, in the present embodiment, the distances of the pairs of model constituent elements are weighted using the distribution information of the modeling units of the second acoustic model.
In another embodiment, the distribution information of the modeling units of the first acoustic model can be used to weight the distances of the pairs of model constituent elements.
In other embodiments, the distribution information of the modeling units of both the first acoustic model and the second acoustic model can be used to weight the distances of the pairs of model constituent elements.
For example, the distribution information of the modeling units of the first acoustic model and that of the second acoustic model can be averaged, and the result (also called the average distribution information) can be used as the weight for the distances of the pairs of model constituent elements.
As another example, the distribution information of the modeling units of the first acoustic model and that of the second acoustic model can be weighted with different weights and then summed, and the resulting distribution information can be used as the weight applied to the distances of the pairs of model constituent elements. Specifically, for example, a weight of 0.6 is assigned to the first acoustic model and a weight of 0.4 to the second acoustic model; the distribution information of the modeling units of the two models is weighted by these two weights and summed, and the summed distribution information is then used as the weight for the distances of the pairs of model constituent elements. The weights assigned here to the first and second acoustic models are merely examples given to explain the present invention; in fact, the present invention places no limitation on the manner of assigning weights to the first and second acoustic models.
Thus, in the present invention, there is no limitation on the manner in which the distances of the pairs of model constituent elements are weighted with the distribution information of the modeling units.
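The weighted-sum variant just described, with the 0.6/0.4 weights from the example, might look as follows; the function and its dict-based interface are assumptions made for illustration:

```python
def blend_distribution(info_main, info_aux, w_main=0.6, w_aux=0.4):
    """Weighted sum of two models' per-unit distribution information."""
    units = set(info_main) | set(info_aux)
    return {u: w_main * info_main.get(u, 0.0) + w_aux * info_aux.get(u, 0.0)
            for u in units}

blended = blend_distribution({"a": 10.0}, {"a": 20.0})
# blended["a"] == 0.6 * 10.0 + 0.4 * 20.0 == 14.0
```

Setting w_main = w_aux = 0.5 recovers the simple averaging variant mentioned earlier.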
Then, in step S204, the right of each above-mentioned model inscape of being sorted according to the distance after weighting, with To the first acoustic model and the nearest model inscape of the second acoustic model.Can be carried out using ascending order or descending above-mentioned Sequence, the present invention do not make special limitation to this.
Specifically, for example, the nearest states from among the states of the first acoustic model and the second acoustic model may be determined from the minimum of the distances between the Gaussian mixtures of states taken respectively from the two acoustic models. Alternatively, the nearest states of the two acoustic models may be determined by directly comparing the Gaussian mixtures of two states taken respectively from the first and second acoustic models, or the hidden Markov models (HMMs) of two modeling units (such as phonemes). For example, if the state pair (i_320, i_314) is the nearest pair, then for the state i_314 of the second acoustic model this means that, compared with the other states of the first acoustic model, the state i_320 of the first acoustic model is the closest to state i_314. In other words, if the state pair (i_320, i_314) is the nearest pair, then for the state i_320 of the first acoustic model this means that, compared with the other states of the second acoustic model, the state i_314 of the second acoustic model is the closest to state i_320.
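The nearest-state determination described above can be sketched as follows; the dictionary mapping (first-state, second-state) pairs to weighted distances is a simplifying assumption for illustration.

```python
import math

def nearest_first_model_state(distances):
    """For each second-model state, find the closest first-model state.

    `distances` maps (first_state, second_state) pairs to their weighted
    distances, mirroring the (i_320, i_314) example above.
    """
    best = {}
    for (s1, s2), d in distances.items():
        if d < best.get(s2, (None, math.inf))[1]:
            best[s2] = (s1, d)
    return best
```

With distances {(i_320, i_314): 0.1, (i_321, i_314): 0.5}, the sketch reports i_320 as the nearest first-model state for i_314, as in the example above.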
Then, in step S205, the first acoustic model and the second acoustic model are merged according to the result of the sorting, to obtain the merged acoustic model.
Here, the second acoustic model may be merged into the first acoustic model without changing the first acoustic model, so as to ensure the speech recognition accuracy of the first acoustic model serving as the main acoustic model. Of course, the first acoustic model may instead be merged into the second acoustic model.

In addition, as described above, the first acoustic model may also serve as the auxiliary acoustic model and the second acoustic model as the main acoustic model. In that case, too, the first acoustic model may be merged into the second acoustic model, or the second acoustic model may be merged into the first acoustic model. Therefore, in the present invention, the acoustic models may be merged in an arbitrary manner, i.e., any merging manner is applicable to the present invention, and the present invention need not limit the manner in which the acoustic models are merged.
Preferably, in the present invention, according to the result of the sorting, a model constituent element in the second acoustic model may be replaced with a model constituent element in the first acoustic model, and the first and second acoustic models may then be combined, so as to obtain the merged acoustic model.

Of course, similarly to the above, a model constituent element in the first acoustic model may instead be replaced with a model constituent element in the second acoustic model before the first and second acoustic models are combined, so as to obtain the merged acoustic model.
Additionally, besides the preferred replacement manner, the acoustic model merging operation may also be realized in other ways. For example, the acoustic models may be merged by weighted-averaging the parameters of the first and second acoustic models (i.e., the model constituent elements mentioned above, such as means, variances, Gaussian mixtures, states, hidden Markov models, etc.). Specifically, in the weighted-average manner, weights are assigned to corresponding parameters of the first and second acoustic models, the weighted average of the parameters of the two models is computed, and the weighted average is used as the parameter of the new acoustic model (i.e., the merged acoustic model).
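The weighted-average alternative can be sketched as below; representing a parameter (e.g. a Gaussian mean or variance vector) as a plain list of floats, and using equal weights of 0.5, are assumptions for illustration only.

```python
def weighted_average_params(params1, params2, w1=0.5, w2=0.5):
    """Hypothetical weighted-average merge of corresponding parameter
    vectors (e.g. Gaussian means or variances) of the two acoustic models."""
    return [w1 * p1 + w2 * p2 for p1, p2 in zip(params1, params2)]
```

For example, averaging the means [0.0, 2.0] and [2.0, 0.0] with equal weights yields [1.0, 1.0] as the merged-model parameter.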
Below, the present invention will be further described taking only the replacement manner as an example, but those of ordinary skill in the art will understand that the present invention is not limited to this manner, which is merely one of the preferred embodiments.
For example, the nearest states in the first acoustic model obtained in step S204 may be used to replace the corresponding states in the second acoustic model (i.e., those states of the second acoustic model that are closest to states of the first acoustic model); then the first acoustic model and the replaced second acoustic model (i.e., the modified second acoustic model) are combined, to obtain the merged acoustic model (also referred to as the bundled acoustic model). Thus, at least one state of the second acoustic model is replaced by a state of the first acoustic model, and the replaced states in the second acoustic model can be deleted. The size of the second acoustic model can thereby be reduced, and thus the size of the bundled acoustic model can be reduced.
Here, the number of states replaced in the second acoustic model may depend on the practical application. Preferably, the number of states replaced in the second acoustic model is not greater than a preset threshold. If more states are replaced, the size of the bundled acoustic model decreases more, but the performance of the bundled acoustic model may also degrade. For example, if 950 states in the second acoustic model are replaced by states in the first acoustic model, the 950 corresponding original states in the second acoustic model will be deleted, and the size of the bundled acoustic model will be reduced by 950 states.

Therefore, an appropriate threshold for the number of states to be replaced in the second acoustic model can preferably be set according to the actual situation.
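The thresholded replacement just described can be sketched as follows; the data structures (a set of second-model state names and a nearest-state dictionary) are illustrative assumptions, not the patent's prescribed representation.

```python
def bundle_by_replacement(model2_states, nearest, threshold):
    """Replace at most `threshold` second-model states with their nearest
    first-model counterparts (taken in ascending weighted-distance order),
    deleting the replaced originals so the bundled model shrinks.

    `nearest` maps second-model state -> (first-model state, distance).
    """
    replaced = {}
    ranked = sorted(nearest.items(), key=lambda kv: kv[1][1])
    for s2, (s1, _d) in ranked[:threshold]:
        replaced[s2] = s1           # s2 is now served by s1 ...
        model2_states.discard(s2)   # ... and its own parameters are deleted
    return replaced
```

With a threshold of 1, only the closest pair is replaced, so the size reduction stays within the preset bound discussed above.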
By the above first embodiment of the present invention, acoustic models can be effectively merged so as to reduce the size of the acoustic model by an appropriate amount without significantly degrading its performance, thereby enabling accurate and efficient speech recognition using the resulting multilingual acoustic model. Specifically, the present invention is realized by selecting and replacing only the less important model constituent elements in the acoustic model, based on consideration of the importance of the modeling units.
[second embodiment]
In the first embodiment, the distribution information of the modeling units, such as the state occupation probability, is used as the weight to weight the distances of the pairs of model constituent elements formed from one class of model constituent elements of the acoustic models. The second embodiment differs from the first embodiment mainly in that the distribution information of the modeling units can be used as the weight to weight the distances of pairs of model constituent elements formed respectively from at least two classes of model constituent elements of the acoustic models.
Hereinafter, the second embodiment of the present invention will be described in detail with reference to Fig. 3.

Fig. 3 is a flowchart exemplarily illustrating the acoustic model merging method according to the second embodiment of the present invention.
First, in step S301, similarly to step S201 in the first embodiment, the distribution information of the modeling units is obtained. Specifically, the distribution information of the modeling units can be obtained based on experience or by counting in a training corpus.

As in the first embodiment, the distribution information of the modeling units can be obtained not only by counting in a training corpus but also from the data of the modeling units themselves. In short, the present invention places no limitation on the manner of obtaining the distribution information of the modeling units, as long as it can reflect the importance of the modeling units in the language to be recognized.
Then, in step S302, similarly to step S202 in the first embodiment, the distances of the pairs of model constituent elements of one class are calculated. Specifically, for example, the model constituent elements of the one class may be states, i.e., the distances of state pairs may be calculated: the distances of pairs formed from the states of the first acoustic model serving as the main acoustic model and the states of the second acoustic model serving as the auxiliary acoustic model. Similarly to the first embodiment, here each state of the first acoustic model is preferably paired with each state of the second acoustic model, and the distance of each state pair is then calculated.

Here, as described above, the present invention places no limitation on the distance calculation method used for calculating the distances of the pairs of model constituent elements.
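As one concrete possibility among the distance measures mentioned in this document (Euclidean distance, K-L divergence, Mahalanobis distance, Bhattacharyya distance), the K-L divergence between two diagonal-covariance Gaussians has a closed form and can be sketched as below; the diagonal-covariance assumption is made for simplicity.

```python
import math

def kl_divergence_diag(mean1, var1, mean2, var2):
    """KL(N1 || N2) between two diagonal-covariance Gaussians, using the
    standard closed form summed over dimensions."""
    return sum(
        0.5 * (math.log(v2 / v1) + (v1 + (m1 - m2) ** 2) / v2 - 1.0)
        for m1, v1, m2, v2 in zip(mean1, var1, mean2, var2)
    )
```

The divergence of a Gaussian with itself is 0, and shifting a unit-variance mean by 1 gives a divergence of 0.5, which matches the closed form.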
In step S303, similarly to step S203 in the first embodiment, the distances of the pairs of model constituent elements of the one class obtained in step S302 are weighted using the distribution information of the modeling units obtained in step S301, to obtain the weighted distances. For example, the distribution information of the modeling units serving as the weight may be multiplied by the distances of the pairs of model constituent elements of the one class to obtain the weighted distances.

As set forth above, the distances of the pairs of model constituent elements may be weighted using the distribution information of the modeling units of the second acoustic model, or using that of the first acoustic model, or using the distribution information of the modeling units of both the first and second acoustic models (for example, by averaging or by weighted-summing the distribution information of the modeling units of both). In short, in the present invention, there is no limitation on the manner in which the distances of the pairs of model constituent elements are weighted with the distribution information of the modeling units.
Then, in step S304, similarly to step S204 in the first embodiment, the pairs of model constituent elements of the one class are sorted according to the weighted distances, so as to obtain the nearest model constituent elements of the first acoustic model and the second acoustic model. The sorting may be performed in ascending or descending order; the present invention places no particular limitation on this.
Then, in step S305, the first acoustic model and the second acoustic model are merged, to obtain the merged acoustic model. Preferably, the second acoustic model is merged into the first acoustic model without changing the first acoustic model, so as to ensure the speech recognition accuracy of the first acoustic model serving as the main acoustic model.

The difference between the second embodiment and the first embodiment mainly lies in step S305 of the acoustic model merging. Step S305 will be described specifically below to show the difference.
In step S3051, according to the sorting result of the pairs of model constituent elements of the one class, the corresponding model constituent elements in the second acoustic model are replaced with the model constituent elements of the one class in the first acoustic model, to obtain a first-modified second acoustic model.

As described in the first embodiment, in the case where the model constituent elements of the one class are states, for example, the nearest states in the first acoustic model obtained in step S304 may be used to replace the corresponding states in the second acoustic model (i.e., those states of the second acoustic model that are closest to states of the first acoustic model), so as to obtain the replaced second acoustic model (i.e., the first-modified second acoustic model).

Here, preferably, the number of states replaced in the above second acoustic model is not greater than a preset threshold Th1.
Then, in step S3052, for the model constituent elements of other classes different from the one class, the distances of the pairs of model constituent elements of the other classes, formed from the model constituent elements of the other classes of the first acoustic model and those of the second acoustic model, are calculated respectively. Here, the other classes may include at least one class. For example, in the case where the model constituent elements of the one class are states, the model constituent elements of the other classes may be hidden Markov models, Gaussian mixtures, variances, and/or means, etc.
In step S3053, similarly to step S303, the distances of the pairs of model constituent elements of each corresponding other class are weighted using the distribution information of the modeling units obtained in step S301.

In step S3054, similarly to step S304, the pairs of model constituent elements of the other classes are sorted according to the distances weighted in step S3053.

In step S3055, similarly to step S3051, according to the sorting results of the pairs of model constituent elements of the other classes, the corresponding model constituent elements of the other classes in the second acoustic model are replaced with the model constituent elements with the minimum distance of the other classes in the first acoustic model, so as to obtain at least one second-modified second acoustic model.
In step S3056, the first-modified second acoustic model and the at least one second-modified second acoustic model are weighted with weights, and the weighted first-modified second acoustic model and the weighted at least one second-modified second acoustic model are then combined, to obtain a mixed second acoustic model.

Here, the weights assigned to the first-modified second acoustic model and the second-modified second acoustic models may, for example, lie between 0 and 1, but the present invention is not limited to this.

In step S3057, the first acoustic model and the mixed second acoustic model are combined, to obtain the merged acoustic model (i.e., the bundled acoustic model).
Here it should be noted that the execution order of the steps is not necessarily as shown in the flowchart and as mentioned above, but can be adapted flexibly to the actual situation; that is, the present invention is not limited by the execution order of the steps shown in the flowchart.

For example, step S3051 for obtaining the first-modified second acoustic model may be located before steps S3052~S3056 for obtaining the second-modified second acoustic models, as in Fig. 3 and as described above in this specification, or it may follow steps S3052~S3056; this does not affect the essence of the present invention.
Additionally, through steps S3052~S3056, one second-modified second acoustic model may be obtained, or multiple second-modified second acoustic models may be obtained; those of ordinary skill in the art will understand the variations therein. Moreover, for ease of description, Fig. 3 shows that performing steps S3052~S3056 once can yield at least one (i.e., one or more) second-modified second acoustic model; in fact, each execution of steps S3052~S3056 or similar steps may yield one second-modified second acoustic model.

Here it should be noted that each second-modified second acoustic model corresponds to a different class of model constituent elements.

Additionally, the number of weights used in step S3056 to weight the first-modified second acoustic model and the at least one second-modified second acoustic model corresponds to the number of second-modified second acoustic models. That is, in the case where one second-modified second acoustic model is obtained in step S3055, the number of weights used in step S3056 is 2 (i.e., the number of second-modified second acoustic models + 1). In the case where two second-modified second acoustic models are obtained in step S3055, the number of weights used in step S3056 is 3.
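The mixing of step S3056, with its weight-count relationship, can be sketched as below; representing each modified model simply as a parameter vector is an illustrative assumption, not the patent's prescribed representation.

```python
def mix_second_models(first_modified, second_modified_list, weights):
    """Weighted mixture of the first-modified second acoustic model and the
    second-modified second acoustic models. The number of weights equals
    the number of second-modified models plus one, as noted above."""
    assert len(weights) == len(second_modified_list) + 1
    models = [first_modified] + list(second_modified_list)
    dim = len(first_modified)
    return [sum(w * m[i] for w, m in zip(weights, models)) for i in range(dim)]
```

With one second-modified model, two weights are used; mixing [1.0, 1.0] and [3.0, 3.0] with weights 0.5 and 0.5 gives the mixed second model [2.0, 2.0].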
In the present embodiment, by using model constituent elements of different classes, a more accurate speech recognition result can be obtained compared with using only one class of model constituent elements as in the first embodiment.
[3rd embodiment]
In the first and second embodiments, the first acoustic model and the second acoustic model are merged to obtain the bundled acoustic model. The third embodiment differs from the first and second embodiments mainly in that more than two acoustic models can be merged to obtain the bundled acoustic model.
Hereinafter, the third embodiment of the present invention will be described in detail with reference to Fig. 4.

Fig. 4 is a flowchart exemplarily illustrating the acoustic model merging method according to the third embodiment of the present invention.
Steps S401~S405 in Fig. 4 may be similar to steps S201~S205 in the first embodiment, or to steps S301~S305 in the second embodiment.

In step S406, other acoustic models different from the first and second acoustic models can also be merged. The merging with the other acoustic models can adopt the methods described above in the first or second embodiment of the present invention, or other methods such as those conventional in the prior art; the present invention places no limitation on this.

In addition, the other acoustic models may include at least one acoustic model. In the case where the other acoustic models comprise multiple acoustic models, they can be merged one by one with the merged acoustic model obtained in step S405. The present invention places no limitation on this either.

Additionally, in the case where more than two acoustic models are merged, the manner of merging is similar to the case where two acoustic models are merged; that is, as described above, the present invention places no limitation on the manner of acoustic model merging, i.e., any merging manner is applicable to the present invention.
[fourth embodiment]
Hereinafter, an exemplary configuration of an acoustic model merging apparatus 1000 for merging multiple acoustic models including a first acoustic model and a second acoustic model according to an embodiment of the present invention will be described with reference to Figs. 5~7.

Fig. 5 is a schematic block diagram showing an exemplary configuration of the acoustic model merging apparatus according to an embodiment of the present invention. Fig. 6 is a schematic block diagram showing an exemplary configuration of the merging unit in the acoustic model merging apparatus according to an embodiment of the present invention. Fig. 7 is a schematic block diagram showing another exemplary configuration of the merging unit in the acoustic model merging apparatus according to an embodiment of the present invention.
The acoustic model merging apparatus 1000 according to an embodiment of the present invention may include: a distribution information obtaining unit 1001, configured to obtain the distribution information of the modeling units of at least the first and/or second acoustic model, wherein the distribution information can reflect the importance of the modeling units in the language to be recognized; a distance calculating unit 1002, configured to calculate respectively the distances of the pairs of model constituent elements of one class formed from the model constituent elements of the same class of the first acoustic model and the second acoustic model; a weighting unit 1003, configured to weight the distance of each corresponding pair of model constituent elements of the one class using the distribution information; a sorting unit 1004, configured to sort the pairs of model constituent elements of the one class according to the weighted distances; and a merging unit 1005, configured to merge the first acoustic model and the second acoustic model according to the result of the sorting, to obtain the merged acoustic model.
The weighting unit 1003 may perform the weighting in any one of the following ways:

multiplying the distribution information of the modeling units of the first acoustic model by the distance of the corresponding pair of model constituent elements of the one class;

multiplying the distribution information of the modeling units of the second acoustic model by the distance of the corresponding pair of model constituent elements of the one class;

averaging the distribution information of the modeling units of the first acoustic model and the second acoustic model, so as to obtain the average of the distribution information of the modeling units of the first and second acoustic models, and multiplying the average by the distance of the corresponding pair of model constituent elements of the one class; and

weighting and summing the distribution information of the modeling units of the first and second acoustic models with predetermined different weights, so as to obtain the weighted sum of the distribution information of the modeling units of the first and second acoustic models, and multiplying the weighted sum by the distance of the corresponding pair of model constituent elements of the one class.
In addition, as shown in Fig. 6, the merging unit 1005 according to an embodiment of the present invention may include: a replacing component 10051, configured to, according to the result of the sorting, replace the corresponding model constituent elements of the one class in the second acoustic model with the model constituent elements with the minimum distance of the one class in the first acoustic model, to obtain a first-modified second acoustic model; and a combining component 10052, configured to combine the first acoustic model and the first-modified second acoustic model, to obtain the merged acoustic model.
Alternatively, as shown in Fig. 7, the merging unit according to an embodiment of the present invention may include: a first replacing component 10051', which, according to the result of the sorting, replaces the corresponding model constituent elements of the one class in the second acoustic model with the model constituent elements with the minimum distance of the one class in the first acoustic model, to obtain a first-modified second acoustic model; a second distance calculating component 10052', which, for the model constituent elements of other classes different from the one class, calculates respectively the distances of the pairs of model constituent elements of the other classes formed from the model constituent elements of the other classes of the first acoustic model and the second acoustic model, wherein the other classes include at least one class; a second weighting component 10053', which weights the distance of each corresponding pair of model constituent elements of the other classes using the distribution information; a second sorting component 10054', which sorts the pairs of model constituent elements of the other classes according to the weighted distances; a second replacing component 10055', which, according to the sorting results of the pairs of model constituent elements of the other classes, replaces the corresponding model constituent elements of the other classes in the second acoustic model with the model constituent elements with the minimum distance of the other classes in the first acoustic model, so as to obtain at least one second-modified second acoustic model; a mixed weighting component 10056', which weights the first-modified second acoustic model and the at least one second-modified second acoustic model with weights, and then combines the weighted first-modified second acoustic model and the weighted at least one second-modified second acoustic model, to obtain a mixed second acoustic model; and a combining component 10057', which combines the first acoustic model and the mixed second acoustic model, to obtain the merged acoustic model.
In addition, the merging unit may be further configured to merge the merged acoustic model with acoustic models other than the first and second acoustic models.

By the above acoustic model merging apparatus according to an embodiment of the present invention, acoustic models can be effectively merged so as to reduce the size of the acoustic model by an appropriate amount without significantly degrading its performance, thereby enabling accurate and efficient speech recognition using the resulting merged acoustic model.
[the 5th embodiment]
Hereinafter, a speech recognition system 10 according to an embodiment of the present invention will be described with reference to Fig. 8.

Fig. 8 shows a schematic block diagram of the configuration of the speech recognition system according to an exemplary embodiment of the present invention.

The speech recognition system 10 of an embodiment of the present invention may include the acoustic model merging apparatus 1000 according to the present invention.

In addition, the speech recognition method according to an embodiment of the present invention may perform speech recognition using an acoustic model obtained by the acoustic model merging method according to the present invention.
By the above speech recognition system and speech recognition method according to embodiments of the present invention, acoustic models can be effectively merged so as to reduce the size of the acoustic model by an appropriate amount without significantly degrading its performance, thereby enabling accurate and efficient speech recognition using the resulting merged acoustic model.

In addition, the present invention can be applied to various electronic devices including a speech recognition system; such electronic devices include, but are not limited to, audio devices (MP3, MP4 players), video devices, tablet computers, computers, PDAs, mobile phones, and the like.
Further, it is noted that the acoustic model merging method and the acoustic model merging apparatus of the present invention can be implemented in numerous ways, for example by software, hardware, firmware, or any combination thereof. The order of the above method steps is merely exemplary, and the method steps of the present invention are not limited to the order specifically described above unless otherwise clearly stated. Additionally, in some embodiments, the present invention can also be implemented as a program recorded in a recording medium, including machine-readable instructions for realizing the method according to the present invention. Thus, the present invention also covers a recording medium storing the program for realizing the method according to the present invention.

Although some specific embodiments of the present invention have been illustrated in detail by examples, those of ordinary skill in the art should appreciate that the above examples are intended merely to be exemplary and not to limit the scope of the present invention. Those of ordinary skill in the art should understand that the above embodiments can be modified without departing from the scope and spirit of the present invention. The scope of the present invention is defined by the appended claims.

Claims (20)

1. An acoustic model merging method for merging multiple acoustic models including a first acoustic model and a second acoustic model, comprising:

a distribution information obtaining step of obtaining the distribution information of the modeling units of at least the first and/or second acoustic model, wherein the distribution information can reflect the importance of the modeling units in the language to be recognized;

a distance calculation step of calculating respectively the distances of the pairs of model constituent elements of one class formed from the model constituent elements of the same class of the first acoustic model and the second acoustic model;

characterized in that the acoustic model merging method further comprises:

a weighting step of weighting the distance of each corresponding pair of model constituent elements of the one class using the distribution information;

a sorting step of sorting the pairs of model constituent elements of the one class according to the weighted distances; and

a merging step of merging the first acoustic model and the second acoustic model according to the result of the sorting, to obtain the merged acoustic model.
2. The acoustic model merging method according to claim 1, wherein the model constituent element is at least one of a mean, a variance, a Gaussian mixture, a state, and a hidden Markov model.

3. The acoustic model merging method according to claim 1, wherein the distribution information of a modeling unit is the frequency or duration with which the modeling unit occurs in the training corpus of the corresponding acoustic model.

4. The acoustic model merging method according to claim 1, wherein the distance is one of a Euclidean distance, a K-L divergence, a Mahalanobis distance, and a Bhattacharyya distance.
5. The acoustic model merging method according to any one of claims 1~4, wherein the weighting can be performed in any one of the following ways:

multiplying the distribution information of the modeling units of the first acoustic model by the distance of the corresponding pair of model constituent elements of the one class;

multiplying the distribution information of the modeling units of the second acoustic model by the distance of the corresponding pair of model constituent elements of the one class;

averaging the distribution information of the modeling units of the first acoustic model and the second acoustic model, so as to obtain the average of the distribution information of the modeling units of the first and second acoustic models, and multiplying the average by the distance of the corresponding pair of model constituent elements of the one class; and

weighting and summing the distribution information of the modeling units of the first and second acoustic models with predetermined different weights, so as to obtain the weighted sum of the distribution information of the modeling units of the first and second acoustic models, and multiplying the weighted sum by the distance of the corresponding pair of model constituent elements of the one class.
6. The acoustic model merging method according to claim 1, wherein the merging step comprises:

a replacement step of, according to the result of the sorting, replacing the corresponding model constituent elements of the one class in the second acoustic model with the model constituent elements with the minimum distance of the one class in the first acoustic model, to obtain a first-modified second acoustic model; and

a combination step of combining the first acoustic model and the first-modified second acoustic model, to obtain the merged acoustic model.
7. The acoustic model merging method according to claim 1, wherein the merging step comprises:
a first replacement step of, according to the sorting result, replacing the corresponding model constituent element among the same-class model constituent elements of the second acoustic model with the model constituent element having the minimum distance among the same-class model constituent elements of the first acoustic model, to obtain a first modified second acoustic model;
a second distance calculating step of, for model constituent elements of other classes different from the same-class model constituent elements, respectively calculating distances of other-class model constituent element pairs each composed of the model constituent elements of said other classes of the first acoustic model and the second acoustic model, wherein said other classes comprise at least one class;
a second weighting step of weighting the distance of each corresponding other-class model constituent element pair using the distribution information;
a second sorting step of sorting the other-class model constituent element pairs according to the weighted distances;
a second replacement step of, according to the sorting result of the other-class model constituent element pairs, replacing the corresponding model constituent elements of said other classes in the second acoustic model with the model constituent elements having the minimum distance among said other classes of the first acoustic model, so as to obtain at least one second modified second acoustic model;
a mixture weighting step of weighting the first modified second acoustic model and the at least one second modified second acoustic model with weights, and then combining the weighted first modified second acoustic model with the weighted at least one second modified second acoustic model to obtain a mixed second acoustic model; and
a combination step of combining the first acoustic model with the mixed second acoustic model to obtain the merged acoustic model.
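The mixture weighting step of claim 7 blends several modified second acoustic models, each produced by replacing a different class of constituent elements. A minimal hypothetical sketch of that blending (all names are illustrative; the patent does not specify how models are represented or how the weights are chosen):

```python
def mix_modified_models(modified_models, mix_weights):
    """Illustrative sketch of the mixture weighting step of claim 7:
    each entry of modified_models is a second acoustic model in which
    one class of constituent elements (e.g. means, variances) has been
    replaced; their parameters are blended with normalized weights.

    modified_models: list of dicts, modeling-unit name -> parameter
    mix_weights: list of floats, one weight per modified model
    """
    total = sum(mix_weights)
    mixed = {}
    for unit in modified_models[0]:
        # Weighted combination of the corresponding parameter from
        # every modified model, with weights normalized to sum to 1.
        mixed[unit] = sum(
            (w / total) * model[unit]
            for w, model in zip(mix_weights, modified_models)
        )
    return mixed
```
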
8. The acoustic model merging method according to claim 6 or 7, further comprising: merging the merged acoustic model with an acoustic model other than the first and second acoustic models.
9. The acoustic model merging method according to claim 6 or 7, wherein the replacement of model constituent elements proceeds until the number of replaced model constituent elements reaches a corresponding predetermined threshold.
10. A speech recognition method, characterized by comprising:
performing speech recognition using an acoustic model obtained by the acoustic model merging method according to any one of claims 1 to 9.
11. An acoustic model merging device for merging a plurality of acoustic models comprising a first acoustic model and a second acoustic model, comprising:
a distribution information obtaining unit configured to obtain distribution information of modeling units of at least the first and/or the second acoustic model, wherein the distribution information reflects the degree of importance of the modeling units in the language to be recognized;
a distance calculating unit configured to respectively calculate distances of same-class model constituent element pairs each composed of same-class model constituent elements of the first acoustic model and the second acoustic model;
characterized in that the acoustic model merging device further comprises:
a weighting unit configured to weight the distance of each corresponding same-class model constituent element pair using the distribution information;
a sorting unit configured to sort the same-class model constituent element pairs according to the weighted distances; and
a merging unit configured to combine the first acoustic model and the second acoustic model according to the sorting result, to obtain a merged acoustic model.
12. The acoustic model merging device according to claim 11, wherein the model constituent element is at least one of a mean, a variance, a Gaussian mixture, a state, and a hidden Markov model.
13. The acoustic model merging device according to claim 11, wherein the distribution information of a modeling unit is the frequency or duration with which the modeling unit occurs in the training corpus of the corresponding acoustic model.
14. The acoustic model merging device according to claim 11, wherein the distance is one of a Euclidean distance, a K-L divergence, a Mahalanobis distance, and a Bhattacharyya distance.
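For Gaussian constituent elements (claim 12), the four distances named in claim 14 have closed forms. The following sketch computes them for two diagonal-covariance Gaussians; it is illustrative only, since the patent does not restrict the covariance structure or the choice of distance:

```python
import numpy as np

def gaussian_distances(mu1, var1, mu2, var2):
    """Euclidean, K-L divergence, Mahalanobis and Bhattacharyya
    distances between two diagonal-covariance Gaussians, given as
    mean vectors (mu) and per-dimension variance vectors (var)."""
    # Euclidean distance between the mean vectors.
    euclidean = np.linalg.norm(mu1 - mu2)

    # K-L divergence D(N1 || N2), diagonal-covariance closed form.
    kl = 0.5 * np.sum(
        np.log(var2 / var1) + (var1 + (mu1 - mu2) ** 2) / var2 - 1.0
    )

    # Mahalanobis distance of mu1 from N2 (diagonal covariance).
    mahalanobis = np.sqrt(np.sum((mu1 - mu2) ** 2 / var2))

    # Bhattacharyya distance, diagonal-covariance closed form.
    avg_var = 0.5 * (var1 + var2)
    bhattacharyya = (
        0.125 * np.sum((mu1 - mu2) ** 2 / avg_var)
        + 0.5 * np.sum(np.log(avg_var / np.sqrt(var1 * var2)))
    )
    return euclidean, kl, mahalanobis, bhattacharyya
```

All four measures are zero for identical Gaussians, which is consistent with their use for ranking "closest" constituent element pairs.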
15. The acoustic model merging device according to any one of claims 11 to 14, wherein the weighting unit is further configured to perform the weighting in any one of the following ways:
multiplying the distribution information of the modeling unit of the first acoustic model by the distance of the corresponding same-class model constituent element pair;
multiplying the distribution information of the modeling unit of the second acoustic model by the distance of the corresponding same-class model constituent element pair;
averaging the distribution information of the modeling units of the first acoustic model and the second acoustic model, so as to obtain an average value of the distribution information of the modeling units of the first acoustic model and the second acoustic model, and multiplying the average value by the distance of the corresponding same-class model constituent element pair; and
weighting and summing, with predetermined different weights, the distribution information of the modeling units of the first acoustic model and the second acoustic model, so as to obtain a weighted sum of the distribution information of the modeling units of the first acoustic model and the second acoustic model, and multiplying the weighted sum by the distance of the corresponding same-class model constituent element pair.
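The four weighting options of claim 15 (and the corresponding method claim) reduce to a single scalar formula per option. A compact sketch, with hypothetical names and the mode selection as an assumption of this illustration:

```python
def weight_distance(d, info_a, info_b, mode="avg", w_a=0.5, w_b=0.5):
    """Weight a constituent-element-pair distance d by the distribution
    information of the corresponding modeling unit, in one of the four
    ways listed in claim 15.

    info_a, info_b: distribution information of the modeling unit in
                    the first and second acoustic model, respectively
    w_a, w_b:       predetermined weights for the fourth option
    """
    if mode == "first":          # first model's distribution info only
        return info_a * d
    if mode == "second":         # second model's distribution info only
        return info_b * d
    if mode == "avg":            # average of both models' info
        return 0.5 * (info_a + info_b) * d
    if mode == "weighted_sum":   # predetermined different weights
        return (w_a * info_a + w_b * info_b) * d
    raise ValueError(f"unknown weighting mode: {mode}")
```
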
16. The acoustic model merging device according to claim 11, wherein the merging unit comprises:
a replacement component configured to, according to the sorting result, replace the corresponding model constituent element among the same-class model constituent elements of the second acoustic model with the model constituent element having the minimum distance among the same-class model constituent elements of the first acoustic model, to obtain a first modified second acoustic model; and
a combination component configured to combine the first acoustic model with the first modified second acoustic model to obtain the merged acoustic model.
17. The acoustic model merging device according to claim 11, wherein the merging unit comprises:
a first replacement component configured to, according to the sorting result, replace the corresponding model constituent element among the same-class model constituent elements of the second acoustic model with the model constituent element having the minimum distance among the same-class model constituent elements of the first acoustic model, to obtain a first modified second acoustic model;
a second distance calculating component configured to, for model constituent elements of other classes different from the same-class model constituent elements, respectively calculate distances of other-class model constituent element pairs each composed of the model constituent elements of said other classes of the first acoustic model and the second acoustic model, wherein said other classes comprise at least one class;
a second weighting component configured to weight the distance of each corresponding other-class model constituent element pair using the distribution information;
a second sorting component configured to sort the other-class model constituent element pairs according to the weighted distances;
a second replacement component configured to, according to the sorting result of the other-class model constituent element pairs, replace the corresponding model constituent elements of said other classes in the second acoustic model with the model constituent elements having the minimum distance among said other classes of the first acoustic model, so as to obtain at least one second modified second acoustic model;
a mixture weighting component configured to weight the first modified second acoustic model and the at least one second modified second acoustic model with weights, and then combine the weighted first modified second acoustic model with the weighted at least one second modified second acoustic model to obtain a mixed second acoustic model; and
a combination component configured to combine the first acoustic model with the mixed second acoustic model to obtain the merged acoustic model.
18. The acoustic model merging device according to claim 16 or 17, wherein the merging unit is further configured to merge the merged acoustic model with an acoustic model other than the first and second acoustic models.
19. The acoustic model merging device according to claim 16 or 17, wherein the replacement of model constituent elements proceeds until the number of replaced model constituent elements reaches a corresponding predetermined threshold.
20. A speech recognition system, characterized by comprising the acoustic model merging device according to any one of claims 11 to 19.
CN201310182399.5A 2013-05-17 2013-05-17 Acoustic model merging method and equipment and audio recognition method and system Active CN104167206B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201310182399.5A CN104167206B (en) 2013-05-17 2013-05-17 Acoustic model merging method and equipment and audio recognition method and system


Publications (2)

Publication Number Publication Date
CN104167206A CN104167206A (en) 2014-11-26
CN104167206B true CN104167206B (en) 2017-05-31

Family

ID=51910987

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201310182399.5A Active CN104167206B (en) 2013-05-17 2013-05-17 Acoustic model merging method and equipment and audio recognition method and system

Country Status (1)

Country Link
CN (1) CN104167206B (en)

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106971735B (en) * 2016-01-14 2019-12-03 芋头科技(杭州)有限公司 A kind of method and system regularly updating the Application on Voiceprint Recognition of training sentence in caching
CN108305619B (en) * 2017-03-10 2020-08-04 腾讯科技(深圳)有限公司 Voice data set training method and device
CN110517679B (en) * 2018-11-15 2022-03-08 腾讯科技(深圳)有限公司 Artificial intelligence audio data processing method and device and storage medium
WO2020109427A2 (en) * 2018-11-29 2020-06-04 Bp Exploration Operating Company Limited Event detection using das features with machine learning
CN109559749B (en) * 2018-12-24 2021-06-18 思必驰科技股份有限公司 Joint decoding method and system for voice recognition system

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6167377A (en) * 1997-03-28 2000-12-26 Dragon Systems, Inc. Speech recognition language models
US6868381B1 (en) * 1999-12-21 2005-03-15 Nortel Networks Limited Method and apparatus providing hypothesis driven speech modelling for use in speech recognition
CN101114449A (en) * 2006-07-26 2008-01-30 大连三曦智能科技有限公司 Model training method for unspecified person alone word, recognition system and recognition method
CN101727901A (en) * 2009-12-10 2010-06-09 清华大学 Method for recognizing Chinese-English bilingual voice of embedded system

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
AU2003283742A1 (en) * 2002-12-16 2004-07-09 Koninklijke Philips Electronics N.V. Method of creating an acoustic model for a speech recognition system




Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant