CN108389577B - Method, system, device and storage medium for optimizing a speech recognition acoustic model - Google Patents
- Publication number
- CN108389577B (application CN201810146221.8A)
- Authority
- CN
- China
- Prior art keywords
- text
- mark
- error
- information
- sample voice
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/06—Creation of reference templates; Training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice
- G10L15/063—Training
- G10L2015/0635—Training updating or merging of old and new templates; Mean values; Weighting
- G10L2015/0636—Threshold criteria for the updating
Abstract
The embodiments of the invention disclose a method, system, device, and storage medium for optimizing a speech recognition acoustic model. The method comprises: obtaining the annotation text of a sample utterance, and obtaining the recognition text produced for the sample utterance by the current acoustic model; comparing the annotation text with the recognition text and, when the comparison result is a mismatch, determining annotation error information of the annotation text relative to the recognition text; updating the annotation text of the sample utterance according to the text update decision condition corresponding to the annotation error information; and retraining, based on a set quantity of sample utterances and their currently corresponding annotation texts, to optimize the current acoustic model. The method effectively improves the annotation quality of the annotation text of each sample utterance and thereby optimizes the acoustic model.
Description
Technical field
The present invention relates to the field of computer technology, and in particular to a method, system, device, and storage medium for optimizing a speech recognition acoustic model.
Background
As the range of applications for speech recognition continues to expand, speech recognition technology has become an emerging high-tech industry and has drawn the attention of a growing number of engineers. The acoustic model is currently one of the key components of a speech recognition system, and its quality largely determines the quality of the recognition results, so the speech recognition acoustic model needs to be optimized continually.
In general, training an acoustic model requires a large amount of sample data, which usually comprises speech data and the annotation text corresponding to that speech data (the words the speech data contains). Annotation text is typically produced by large-scale manual annotation or obtained from a third-party recognition system, but annotation text obtained in either way often contains some errors, which degrades annotation quality.
For a speech recognition acoustic model, improving the quality of the annotation text is one means of optimizing the acoustic model, but no technical solution has yet been found that optimizes the acoustic model by improving annotation text quality.
Summary of the invention
The embodiments of the present invention provide a method, system, device, and storage medium for optimizing a speech recognition acoustic model, which can improve the annotation quality of the annotation text and thereby optimize the acoustic model.
In a first aspect, an embodiment of the present invention provides a method for optimizing a speech recognition acoustic model, comprising:
obtaining the annotation text of a sample utterance, and obtaining the recognition text produced for the sample utterance by the current acoustic model;
comparing the annotation text with the recognition text and, when the comparison result is a mismatch, determining annotation error information of the annotation text relative to the recognition text;
updating the annotation text of the sample utterance according to the text update decision condition corresponding to the annotation error information;
retraining, based on a set quantity of sample utterances and their currently corresponding annotation texts, to optimize the current acoustic model.
In a second aspect, an embodiment of the present invention provides a device for optimizing a speech recognition acoustic model, comprising:
a text acquisition module, configured to obtain the annotation text of a sample utterance and to obtain the recognition text produced for the sample utterance by the current acoustic model;
an annotation error determination module, configured to compare the annotation text with the recognition text and, when the comparison result is a mismatch, to determine annotation error information of the annotation text relative to the recognition text;
an annotation text update module, configured to update the annotation text of the sample utterance according to the text update decision condition corresponding to the annotation error information;
an acoustic model optimization module, configured to retrain, based on a set quantity of sample utterances and their currently corresponding annotation texts, to optimize the current acoustic model.
In a third aspect, an embodiment of the present invention provides a computer device, comprising:
one or more processors;
a storage device for storing one or more programs;
the one or more programs being executed by the one or more processors, so that the one or more processors implement the method for optimizing a speech recognition acoustic model provided by the embodiment of the first aspect.
In a fourth aspect, an embodiment of the present invention provides a computer-readable storage medium storing a computer program which, when executed by a processor, implements the method for optimizing a speech recognition acoustic model provided by the embodiment of the first aspect.
In the above method, system, device, and storage medium for optimizing a speech recognition acoustic model, the annotation text of a sample utterance is obtained first, together with the recognition text produced for the sample utterance by the current acoustic model. The annotation text is then compared with the recognition text and, when the comparison result is a mismatch, the annotation error information of the annotation text relative to the recognition text is determined. Next, the annotation text of the sample utterance is updated according to the annotation error information and the pronunciation probabilities of the sample utterance under the annotation text and under the recognition text. Finally, the current acoustic model is retrained and optimized on a set quantity of sample utterances and their currently corresponding annotation texts. The method effectively improves the annotation quality of the annotation text of each sample utterance, which raises the quality of the training data the acoustic model needs, optimizes the acoustic model, and improves speech recognition accuracy to some extent.
Brief description of the drawings
Fig. 1 is a flowchart of a method for optimizing a speech recognition acoustic model provided by Embodiment one of the present invention;
Fig. 2 is a flowchart of a method for optimizing a speech recognition acoustic model provided by Embodiment two of the present invention;
Fig. 3 is a structural block diagram of a device for optimizing a speech recognition acoustic model provided by Embodiment three of the present invention;
Fig. 4 is a hardware structure diagram of a computer device provided by Embodiment four of the present invention.
Detailed description of the embodiments
The present invention is described in further detail below with reference to the drawings and embodiments. It should be understood that the specific embodiments described here serve only to explain the invention and do not limit it. It should also be noted that, for ease of description, the drawings show only the parts relevant to the invention rather than the entire structure.
Embodiment one
Fig. 1 is a flowchart of a method for optimizing a speech recognition acoustic model provided by Embodiment one of the present invention. The method is applicable to optimizing and improving an acoustic model used for speech recognition. It can be executed by a device for optimizing a speech recognition acoustic model, which may be implemented in hardware and/or software and is typically integrated into a computer device with speech recognition capability.
As shown in Fig. 1, the method for optimizing a speech recognition acoustic model provided by Embodiment one of the present invention includes the following operations:
S101: Obtain the annotation text of a sample utterance, and obtain the recognition text produced for the sample utterance by the current acoustic model.
It can be understood that the sample utterance is one item of speech data in the speech data set needed for acoustic model training, and that during acoustic model training each sample utterance has one corresponding annotation text. The current acoustic model can be understood as the acoustic model trained on the sample utterances in the speech data set and their currently corresponding annotation texts.
This step obtains the annotation text of one sample utterance in the speech data set, and obtains the recognition text that a speech recognition system outputs for that sample utterance. The current acoustic model is considered part of this speech recognition system, and recognition is carried out specifically through the current acoustic model.
S102: Compare the annotation text with the recognition text and, when the comparison result is a mismatch, determine the annotation error information of the annotation text relative to the recognition text.
In this embodiment, once the recognition text of the sample utterance has been obtained, the annotation text can be compared with the recognition text to determine whether the characters in the two texts match one by one. If they all match one by one, the comparison result is a match; if they cannot be matched one by one, the comparison result is a mismatch. For an annotation text and a recognition text that do not match, the annotation text contains characters that do not match the recognition text. In that case the annotation text may contain wrongly annotated characters, the recognition text may contain wrongly recognized characters, or both texts may contain errors. Note that this step does not consider which of these situations caused the mismatch. It directly determines the characters in the annotation text that do not match the recognition text, and can further determine mismatch information for each such character (for example, its position in the annotation text and its mismatch type). Finally, the mismatch information of all mismatched characters is aggregated to form the annotation error information of the annotation text relative to the recognition text.
S103: Update the annotation text of the sample utterance according to the text update decision condition corresponding to the annotation error information.
In this embodiment, the text update decision condition can be understood as the decision rule that determines how the annotation text is to be updated. Mismatches between the annotation text and the recognition text take many forms: the total number of mismatched characters varies, and so do their mismatch types. The annotation error information of different sample utterances' annotation texts relative to their recognition texts therefore varies widely in content.
This embodiment can set, in advance, a text update decision condition for each form of annotation error information. This step determines the applicable text update decision condition from the content of the annotation error information, then updates the annotation text of the sample utterance according to the update rule of that condition. Note that, in this embodiment, updating the annotation text means either selecting the currently obtained recognition text as the new annotation text, or retaining the original annotation text as the new annotation text, depending on the determined text update decision condition.
S104: Retrain, based on a set quantity of sample utterances and their currently corresponding annotation texts, to optimize the current acoustic model.
The operations of S101 to S103 improve the quality of the annotation text in the sample data. It can be understood that, before this step is carried out, the above steps can be applied to every sample utterance in the training data set to update its annotation text. Specifically, the above steps can be run in parallel to determine the recognition text of every sample utterance, after which the sample utterances whose recognition text does not match the corresponding annotation text are selected for annotation text updating. Alternatively, the above steps can be applied serially to update the annotation texts one after another. This embodiment does not limit the specific form of implementation.
After the currently corresponding annotation text of each sample utterance has been obtained through the above steps (it can be regarded as the annotation text after quality improvement), the acoustic model can be trained again on the sample utterances and their currently corresponding annotation texts, yielding a new, optimized current acoustic model. The set quantity in this step can be understood as the total number of sample utterances in the training data set. It can be understood that the method of this embodiment is effectively iterative: after the operations have been carried out in sequence, execution can return to S101 to start the next round, and the termination condition can be a manually set number of iterations. This embodiment holds that the current acoustic model obtained after optimization and retraining can improve the recognition accuracy of the speech recognition system to some extent.
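The iterative structure of S101 to S104 can be sketched as follows. This is a minimal illustration rather than the patented implementation: the callables `recognize`, `relabel`, and `train`, and the `rounds` parameter, are hypothetical stand-ins for the speech recognition system, the text update decision, the acoustic model training procedure, and the manually set number of iterations.

```python
def optimize_acoustic_model(model, samples, recognize, relabel, train, rounds):
    """Run `rounds` passes of relabel-then-retrain.

    samples: list of (utterance, annotation_text) pairs.
    recognize(model, utterance) -> recognition text            (hypothetical)
    relabel(model, utterance, annotation, recognized) -> text  (hypothetical)
    train(samples) -> new model                                (hypothetical)
    """
    for _ in range(rounds):
        updated = []
        for utterance, annotation in samples:
            recognized = recognize(model, utterance)        # S101
            if recognized != annotation:                    # S102: mismatch
                # S103: keep the annotation or adopt the recognition text
                annotation = relabel(model, utterance, annotation, recognized)
            updated.append((utterance, annotation))
        samples = updated
        model = train(samples)                              # S104: retrain
    return model, samples
```

Passing the three stages in as callables keeps the loop independent of any particular recognizer or trainer; a parallel variant would compute all recognition texts first and then relabel only the mismatched utterances.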
In the method for optimizing a speech recognition acoustic model provided by Embodiment one of the present invention, the annotation text of a sample utterance is obtained first, together with the recognition text produced for the sample utterance by the current acoustic model. The annotation text is then compared with the recognition text and, when the comparison result is a mismatch, the annotation error information of the annotation text relative to the recognition text is determined. Next, the annotation text of the sample utterance is updated according to the annotation error information and the pronunciation probabilities of the sample utterance under the annotation text and under the recognition text. Finally, the current acoustic model is retrained and optimized on a set quantity of sample utterances and their currently corresponding annotation texts. The method effectively improves the annotation quality of the annotation text of each sample utterance, which raises the quality of the training data the acoustic model needs, optimizes the acoustic model, and improves speech recognition accuracy to some extent.
Embodiment two
Fig. 2 is a flowchart of a method for optimizing a speech recognition acoustic model provided by Embodiment two of the present invention.
This embodiment is a refinement of the above embodiment. In this embodiment, comparing the annotation text with the recognition text and determining, when the comparison result is a mismatch, the annotation error information of the annotation text relative to the recognition text is further embodied as: comparing the annotation text with the recognition text to obtain the edit distance between them, and determining that the comparison result is a mismatch when the edit distance is non-zero; when the comparison result is a mismatch, determining from the edit distance the total number of annotation errors of the annotation text relative to the recognition text, the position of each annotation error, and the error type of each annotation error; and recording the total number of annotation errors together with the position and error type of each annotation error as the annotation error information.
Likewise, updating the annotation text of the sample utterance according to the text update decision condition corresponding to the annotation error information is embodied as: looking up, based on the annotation error information, the text update decision condition for the sample utterance in a preset multi-level information relation table, where the text update decision condition is a comparison between the pronunciation probability information of the sample utterance under the annotation text and its pronunciation probability information under the recognition text; determining first pronunciation probability information after aligning the sample utterance with the annotation text, and second pronunciation probability information after aligning it with the recognition text; when the first and second pronunciation probability information show that the text update decision condition holds, taking the recognition text as the new annotation text of the sample utterance; otherwise, retaining the annotation text as the new annotation text of the sample utterance.
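The final selection between the two texts can be sketched as below. The decision condition is passed in as a predicate over the first and second pronunciation probability information. The function names are hypothetical, and `toy_condition` is purely illustrative: it is not one of the patent's decision conditions, which are looked up from the multi-level information relation table.

```python
from typing import Callable, Sequence

# A decision condition compares the per-frame pronunciation probabilities
# obtained under the annotation text (first) and the recognition text (second).
DecisionCondition = Callable[[Sequence[float], Sequence[float]], bool]


def choose_annotation(
    annotation_text: str,
    recognition_text: str,
    first_probs: Sequence[float],
    second_probs: Sequence[float],
    condition: DecisionCondition,
) -> str:
    """Return the recognition text if the condition holds, else keep the annotation."""
    if condition(first_probs, second_probs):
        return recognition_text
    return annotation_text


def toy_condition(first_probs: Sequence[float], second_probs: Sequence[float]) -> bool:
    # Illustrative assumption only: prefer the text with the higher mean probability.
    mean_first = sum(first_probs) / len(first_probs)
    mean_second = sum(second_probs) / len(second_probs)
    return mean_second > mean_first
```

Keeping the condition as a parameter mirrors the text: the same selection step is reused whatever condition the table lookup returns.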
Specifically, the method for optimizing a speech recognition acoustic model provided by Embodiment two of the present invention includes the following operations:
S201: Obtain the annotation text of a sample utterance, and obtain the recognition text produced for the sample utterance by the current acoustic model.
For example, the annotation text corresponding to a sample utterance can be read directly from the speech data set. In addition, the sample utterance can be decoded by the speech recognition system containing the current acoustic model: speech features are extracted from the sample utterance, recognition is performed on the extracted features, and the recognition text of the sample utterance is obtained.
S202: Compare the annotation text with the recognition text to obtain the edit distance between them, and determine that the comparison result is a mismatch when the edit distance is non-zero.
For example, S202 to S204 of this embodiment give the specific operations for text comparison and for determining the annotation error information. First, this step uses an edit distance algorithm to compute the edit distance between the two strings, the annotation text and the recognition text, and thereby compares the two texts. The edit distance between two strings is the minimum number of edit operations needed to convert one into the other, where the permitted edit operations are substituting one character for another, inserting a character, and deleting a character.
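The edit distance defined above is commonly computed with the standard Levenshtein dynamic program. The sketch below is one such implementation; the function name `edit_distance` is illustrative and is not taken from the patent, which does not specify a particular algorithm.

```python
def edit_distance(source: str, target: str) -> int:
    """Minimum number of substitutions, insertions and deletions
    needed to turn `source` into `target`."""
    # previous[j] holds the distance between source[:i-1] and target[:j].
    previous = list(range(len(target) + 1))
    for i, src_char in enumerate(source, start=1):
        current = [i]  # turning source[:i] into "" takes i deletions
        for j, tgt_char in enumerate(target, start=1):
            cost = 0 if src_char == tgt_char else 1
            current.append(min(
                previous[j] + 1,         # delete src_char
                current[j - 1] + 1,      # insert tgt_char
                previous[j - 1] + cost,  # substitute, or keep if equal
            ))
        previous = current
    return previous[-1]
```

A result of 0 means the annotation text and the recognition text match; any non-zero result is treated as a mismatch.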
When the two texts are compared on the basis of edit distance in this step, the specific operation may be to determine the minimum number of edit operations (the edit distance) needed to convert the annotation text into the recognition text. When that number is 0, the edit distance between the two texts is 0 and the characters in the two texts match. When it is 1, the edit distance is 1 and the two texts contain one mismatched character. When it is 2, the edit distance is 2 and the two texts contain two mismatched characters. Similarly, when it is greater than 2, the two texts contain multiple mismatched characters.
Based on the above, when the edit distance between the two texts is non-zero, the two texts contain mismatched characters, and the comparison result of the two texts is determined to be a mismatch.
S203: When the comparison result is a mismatch, determine from the edit distance the total number of annotation errors of the annotation text relative to the recognition text, the position of each annotation error, and the error type of each annotation error.
In this embodiment, for two texts whose comparison result is a mismatch, the annotation text differs from the recognition text, that is, the annotation text is considered to contain wrongly annotated characters relative to the recognition text. While determining the edit distance, this step can determine from it how many wrongly annotated characters the annotation text contains relative to the recognition text, the position of each wrongly annotated character in the annotation text, and the error type of each. Specifically, the edit distance value can be taken directly as the total number of annotation errors in the annotation text. While the annotation text is being converted into the recognition text, if a substitution is performed on a character, the position of that character can be determined (and regarded as the position of the annotation error), and the conversion type of that character is determined to be character substitution (so the error type of the annotation error is character substitution).
The other error types of an annotation error are character insertion (the annotation text lacks a character at a position compared with the recognition text) and character deletion (the annotation text has an extra character at a position compared with the recognition text). The error type is determined by the conversion operation actually performed during conversion. For example, when the annotation text lacks a character at a position compared with the recognition text, the conversion operation is a character insertion into the annotation text at that position; when the annotation text has an extra character at a position compared with the recognition text, the conversion operation is a character deletion from the annotation text at that position.
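One way to recover the error count, positions, and types described above is to fill the full edit distance table and trace an optimal path back through it. The sketch below assumes this approach; the `AnnotationError` record, the function name, and the tie-breaking order (substitution before insertion before deletion) are illustrative choices, and other optimal alignments may exist for the same pair of texts.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AnnotationError:
    position: int    # index in the annotation text where the edit applies
    error_type: str  # "substitution", "insertion" or "deletion"


def annotation_errors(annotation: str, recognized: str) -> list[AnnotationError]:
    """List the edits that convert `annotation` into `recognized`."""
    n, m = len(annotation), len(recognized)
    # dist[i][j]: edit distance between annotation[:i] and recognized[:j]
    dist = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        dist[i][0] = i
    for j in range(m + 1):
        dist[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = 0 if annotation[i - 1] == recognized[j - 1] else 1
            dist[i][j] = min(dist[i - 1][j] + 1,
                             dist[i][j - 1] + 1,
                             dist[i - 1][j - 1] + cost)

    errors = []
    i, j = n, m
    while i > 0 or j > 0:  # walk back from the bottom-right corner
        if i > 0 and j > 0 and annotation[i - 1] == recognized[j - 1]:
            i, j = i - 1, j - 1  # characters match, no error
        elif i > 0 and j > 0 and dist[i][j] == dist[i - 1][j - 1] + 1:
            errors.append(AnnotationError(i - 1, "substitution"))
            i, j = i - 1, j - 1
        elif j > 0 and dist[i][j] == dist[i][j - 1] + 1:
            # the annotation lacks recognized[j - 1] at index i
            errors.append(AnnotationError(i, "insertion"))
            j -= 1
        else:
            # the annotation has an extra character at index i - 1
            errors.append(AnnotationError(i - 1, "deletion"))
            i -= 1
    errors.reverse()
    return errors
```

The length of the returned list equals the edit distance, so it doubles as the total number of annotation errors.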
S204: Record the total number of annotation errors together with the position and error type of each annotation error as the annotation error information.
S205: Based on the annotation error information, look up the text update decision condition for the sample utterance in the preset multi-level information relation table.
This embodiment can select different text update decision conditions for different annotation error information. Specifically, the text update decision condition that fits the annotation error information can be looked up directly in the preset multi-level information relation table and used as the text update decision condition for the current sample utterance.
Further, looking up the text update decision condition for the sample utterance in the preset multi-level information relation table based on the annotation error information comprises:
obtaining the total number of annotation errors and the error types of the annotation errors from the annotation error information; in the multi-level information relation table, using the total number of annotation errors as an index, looking up the set error type that matches the error types of the annotation errors; and taking the update decision condition corresponding to that set error type as the text update decision condition of the sample utterance.
Specifically, when determining the text update decision condition, the total number of annotation errors in the annotation error information is first used as an index to locate the subordinate information corresponding to that total; the set error type that matches the error types of the annotation errors is then looked up within that subordinate information, which yields the update decision condition corresponding to the set error type. This embodiment takes that update decision condition as the text update decision condition of the sample utterance.
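The two-stage lookup can be sketched with a nested dictionary. The table layout, the key names, and the `lookup_condition` function are assumptions made for illustration; the stored values are plain labels standing in for the actual decision conditions, and the example table is deliberately incomplete.

```python
# Order in which error types are listed in the second-level keys (assumed).
TYPE_ORDER = {"substitution": 0, "insertion": 1, "deletion": 2}


def lookup_condition(table: dict, error_types: list[str]) -> str:
    """Index by error count first, then by the error types themselves."""
    count = len(error_types)
    if count == 0:
        raise ValueError("texts match; no update decision is needed")
    if count == 1:
        first_level = "1-character error"
    elif count == 2:
        first_level = "2-character error"
    else:
        first_level = "multi-character error"
    subordinate = table[first_level]
    if first_level == "multi-character error":
        return subordinate[None]  # second-level cell is empty
    key = tuple(sorted(error_types, key=TYPE_ORDER.__getitem__))
    return subordinate[key]


# Hypothetical, partial table used only to exercise the lookup.
EXAMPLE_TABLE = {
    "1-character error": {
        ("substitution",): "update decision condition 1_1",
        ("insertion",): "update decision condition 1_2",
        ("deletion",): "update decision condition 1_3",
    },
    "2-character error": {
        ("substitution", "insertion"): "update decision condition 2_2",
    },
    "multi-character error": {None: "standard update decision condition"},
}
```

Sorting the error types before the lookup means that a substitution followed by an insertion and an insertion followed by a substitution resolve to the same entry.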
Note that the text update decision condition is in effect a comparison between the pronunciation probability information of the sample utterance under the annotation text and its pronunciation probability information under the recognition text. The pronunciation probability information is obtained by dividing the sample utterance into a certain number of speech signal frames, determining the pronunciation unit aligned with each speech signal frame, and then taking, for each speech signal frame, the pronunciation probability that it belongs to its pronunciation unit. Pronunciation probability information can be formed from the pronunciation units of the annotation text, and likewise from the pronunciation units of the recognition text. The text update decision condition in this embodiment is in effect a comparison of these two forms of pronunciation probability information.
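As a rough illustration of how pronunciation probability information might be assembled, the sketch below assumes two inputs that the text does not specify in detail: `frame_posteriors`, a per-frame mapping from pronunciation unit to probability (for instance, acoustic model outputs), and `aligned_units`, the pronunciation unit aligned with each frame. Both names and the data layout are hypothetical.

```python
def pronunciation_probabilities(frame_posteriors, aligned_units):
    """Per-frame probability that each frame belongs to its aligned unit.

    frame_posteriors: list with one dict per frame, unit -> probability.
    aligned_units:    list with the unit aligned to each frame.
    """
    if len(frame_posteriors) != len(aligned_units):
        raise ValueError("need exactly one aligned unit per frame")
    return [posteriors[unit]
            for posteriors, unit in zip(frame_posteriors, aligned_units)]


# The same frames are scored twice: once against the units of the
# annotation text and once against the units of the recognition text.
frames = [{"t": 0.9, "d": 0.1}, {"a": 0.6, "e": 0.4}, {"a": 0.3, "e": 0.7}]
first_probs = pronunciation_probabilities(frames, ["t", "a", "a"])
second_probs = pronunciation_probabilities(frames, ["t", "a", "e"])
```

The two resulting sequences play the role of the first and second pronunciation probability information that the decision condition compares.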
Further, the multi-level information relation table is constructed by the following steps:
Initialize a multi-level information relation table comprising a first-level information column, a second-level information column, and a third-level information column. Store the set total numbers of annotation errors in the first-level information column; these comprise 1-character error, 2-character error, and multi-character error. Store in the second-level information column the set error types corresponding to 1-character error and to 2-character error, and set the second-level information cell corresponding to multi-character error to empty. Store in the third-level information column the update decision condition corresponding to each set error type, and store the set standard update decision condition in the third-level information cell corresponding to multi-character error.
It can be understood that determining the text update decision condition as described above relies mainly on the preset multi-level information relation table, so how that table is determined is also key. Specifically, the construction steps above produce a multi-level information relation table of the form shown in Table 1 below.
As shown in Table 1, the first-level information column holds the set total number of annotation errors, divided into three cases: 1-character error, 2-character error, and multi-character error. The second-level information column holds the set error types. From the way edit distance is determined, each conversion operation can be one of three conversion types: character substitution, character insertion, and character deletion. It follows that a 1-character error has three error types and a 2-character error has six. With a multi-character error there are more error types still, which this embodiment does not consider individually. The third-level information column holds the update decision condition corresponding to each set error type. Because the error types are not considered individually for a multi-character error, this embodiment sets a standard update decision condition for the multi-character error case.
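Table 1: Multi-level information relation table
First-level information | Second-level information | Third-level information
1-character error | character substitution | update decision condition 1_1
1-character error | character insertion | update decision condition 1_2
1-character error | character deletion | update decision condition 1_3
2-character error | character substitution, character substitution | update decision condition 2_1
2-character error | character substitution, character insertion | update decision condition 2_2
2-character error | character substitution, character deletion | update decision condition 2_3
2-character error | character insertion, character insertion | update decision condition 2_4
2-character error | character insertion, character deletion | update decision condition 2_5
2-character error | character deletion, character deletion | update decision condition 2_6
multi-character error | (empty) | standard update decision condition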
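The counts of three error types for a 1-character error and six for a 2-character error are the unordered selections, with repetition, of one or two items from the three conversion types. A minimal sketch of building the table this way follows; the dictionary layout and the string labels are illustrative assumptions, and the decision conditions themselves are represented only by name.

```python
from itertools import combinations_with_replacement

ERROR_TYPES = ("substitution", "insertion", "deletion")


def build_relation_table() -> dict:
    """First level: error count; second level: error types; third level: condition."""
    table = {}
    for count in (1, 2):
        rows = {}
        # Unordered selections with repetition: 3 for one error, 6 for two.
        selections = combinations_with_replacement(ERROR_TYPES, count)
        for index, types in enumerate(selections, start=1):
            rows[types] = f"update decision condition {count}_{index}"
        table[f"{count}-character error"] = rows
    # The second-level cell is empty for the multi-character case.
    table["multi-character error"] = {None: "standard update decision condition"}
    return table
```

With the type order substitution, insertion, deletion, the generated numbering matches the order in which the conditions 2_1 to 2_6 are listed in the text.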
For example, this embodiment gives the update decision conditions corresponding to the set error types above. Under a 1-character error: when the error type is character substitution, [formula] and [formula] can be taken as the specific content of update decision condition 1_1; when the error type is character insertion, [formula] and [formula] can be taken as the specific content of update decision condition 1_2; when the error type is character deletion, [formula] and [formula] can be taken as the specific content of update decision condition 1_3.
Note that in the formulas above, $p_1(q1_t / o_t)$ denotes, after the sample utterance has been divided into M speech signal frames, the pronunciation probability that the speech signal frame $o_t$ at frame t belongs to the pronunciation unit $q1_t$ at frame t of the annotation text; $p_2(q2_t / o_t)$ denotes the pronunciation probability that the speech signal frame $o_t$ at frame t belongs to the pronunciation unit $q2_t$ at frame t of the recognition text. Here t ranges from frame 1 to frame M, that is, $t \in [1, M]$. $t_1 \in [x1, x2]$ denotes the start and end frame range corresponding to the pronunciation units of the wrongly annotated character in the annotation text; $t_1 \in [y1, y2]$ denotes the start and end frame range corresponding to the pronunciation units of the calibration character in the recognition text, where the calibration character is the character in the recognition text that corresponds to the wrongly annotated character in the annotation text. In addition, $T_I$ is a preset insertion threshold and $T_D$ is a preset deletion threshold; both values can be set manually based on past experience.
Meanwhile in the case where 2 character error,
1) it when the type of error of 2 character errors is respectively text replacement and text replacement, can incite somebody to action:
WithAs the specific of update decision condition 2_1
Content;
2) it when the type of error of 2 words mistake is respectively text replacement and text insertion, can incite somebody to action:
WithAs the particular content for updating decision condition 2_2;
3) it when the type of error of 2 character errors is respectively text replacement and text is deleted, can incite somebody to action:
WithUpdate the particular content of decision condition 2_3;
4) it when the type of error of 2 character errors is respectively text insertion and text insertion, can incite somebody to action:
AndAs the particular content for updating decision condition 2_4;
5) it when the type of error of 2 character errors is respectively text insertion and text is deleted, can incite somebody to action:
AndRegard the particular content for updating decision condition 2_5 as;
6) it when the type of error of 2 character errors is respectively that text is deleted with text deletion, can incite somebody to action:
AndRegard the particular content for updating decision condition 2_6 as.
It should be noted that, in the above formulas, $p_1(q1_t/o_t)$ and $p_2(q2_t/o_t)$ have the same meaning as described above. [formula] denotes the start-stop frame number range covered by the pronunciation units of the 1st error label text in the mark text; [formula] denotes the start-stop frame number range covered by the pronunciation units of the 2nd error label text in the mark text; [formula] denotes the start-stop frame number range covered by the pronunciation units of the 1st calibration text in the identification text; [formula] denotes the start-stop frame number range covered by the pronunciation units of the 2nd calibration text in the identification text, where the 1st and 2nd calibration texts are the texts in the identification text that correspond to the 1st and 2nd error label texts in the mark text, respectively. In addition, $T_I$ and $T_D$ have the same meaning as described above.
In addition, for a multi-character error, the present embodiment sets [formula] and [formula] as the standard update decision condition, where $p_1(q1_t/o_t)$ and $p_2(q2_t/o_t)$ again have the same meaning as described above, and $k$ denotes the $k$-th character present in the identification text, with $k$ greater than 2. In addition, [formula] denotes the start-stop frame number range covered by the pronunciation units of the $k$-th character in the identification text; $T_M$ is a preset multi-character detection threshold whose specific value can be set manually. Generally, to prevent erroneous updates of the mark text, the present embodiment requires the value of $T_M$ to be determined through a series of tests.
Based on the above multi-level information relation table and the determined error label information, this step can accurately find the corresponding text update decision condition.
S206, determining first pronunciation probability information after the sample voice is aligned with the mark text, and second pronunciation probability information after the sample voice is aligned with the identification text.
It can be understood that the text update decision condition determined above amounts to a comparison between the pronunciation probability information of the sample voice under the mark text and its pronunciation probability information under the identification text. Therefore, to judge whether the determined text update decision condition holds, this step further determines the first pronunciation probability information after the sample voice is aligned with the mark text, and the second pronunciation probability information after the sample voice is aligned with the identification text.
Further, the first pronunciation probability information is determined based on the speech signal frames formed by dividing the sample voice frame by frame and on a first pronunciation unit sequence formed by modeling the mark text; the second pronunciation probability information is determined based on each speech signal frame and on a second pronunciation unit sequence formed by modeling the identification text.
Specifically, the operations of determining the first pronunciation probability information and the second pronunciation probability information can be described as follows:
1) taking the frame as the unit and considering the actual pronunciation duration of the sample voice, divide the sample voice into a set number of speech signal frames, and determine the speech feature of each speech signal frame;
2) based on the set pronunciation modeling rule, obtain the first pronunciation unit sequence corresponding to the mark text and the second pronunciation unit sequence corresponding to the identification text, where the two sequences contain the pronunciation units that make up the mark text and the identification text, respectively;
3) using a dynamic programming algorithm, determine from the first pronunciation unit sequence the first pronunciation unit aligned with each speech feature, and likewise determine from the second pronunciation unit sequence the second pronunciation unit aligned with each speech feature; once the aligned pronunciation units are determined, obtain the first pronunciation probability that each speech feature belongs to its first pronunciation unit and the second pronunciation probability that each speech feature belongs to its second pronunciation unit;
4) then determine the first pronunciation unit combination that constitutes each error label text in the mark text, and obtain the first start frame number and first end frame number corresponding to that combination (equivalent to the start-stop frame number range of the error label text);
5) likewise determine the second pronunciation unit combination that constitutes each calibration text in the identification text, and obtain the second start frame number and second end frame number corresponding to that combination (equivalent to the start-stop frame number range of the calibration text), where each calibration text is determined mainly from the position of the corresponding error label text in the mark text;
6) finally, take the first pronunciation probabilities, the first pronunciation unit combinations and their first start and end frame numbers as the first pronunciation probability information, and take the second pronunciation probabilities, the second pronunciation unit combinations and their second start and end frame numbers as the second pronunciation probability information.
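The alignment in step 3) can be sketched as a Viterbi-style monotonic alignment. This is only an illustrative sketch: the source says "dynamic programming algorithm" without naming one, so the specific recursion, the function name `force_align` and the input layout (a frames-by-units matrix of log pronunciation probabilities, assumed to come from the current acoustic model) are all assumptions.

```python
import math

def force_align(log_probs):
    """Align T frames to U pronunciation units, monotonically and without skips.

    log_probs[t][u] is the (assumed) log probability that frame t belongs to
    unit u of the pronunciation unit sequence. Returns, for every frame, the
    index of the unit it is aligned with.
    """
    num_frames, num_units = len(log_probs), len(log_probs[0])
    if num_frames < num_units:
        raise ValueError("need at least one frame per pronunciation unit")
    neg_inf = -math.inf
    score = [[neg_inf] * num_units for _ in range(num_frames)]
    back = [[0] * num_units for _ in range(num_frames)]
    score[0][0] = log_probs[0][0]  # the alignment must start at the first unit
    for t in range(1, num_frames):
        for u in range(min(t + 1, num_units)):
            stay = score[t - 1][u]                         # remain in the same unit
            move = score[t - 1][u - 1] if u > 0 else neg_inf  # advance by one unit
            if move > stay:
                score[t][u], back[t][u] = move + log_probs[t][u], u - 1
            else:
                score[t][u], back[t][u] = stay + log_probs[t][u], u
    # Backtrace from the last unit at the last frame.
    path = [num_units - 1]
    for t in range(num_frames - 1, 0, -1):
        path.append(back[t][path[-1]])
    path.reverse()
    return path

# Toy example: 5 frames, 2 pronunciation units (made-up probabilities).
example_probs = [[0.9, 0.1], [0.8, 0.2], [0.3, 0.7], [0.2, 0.8], [0.1, 0.9]]
example_log_probs = [[math.log(p) for p in row] for row in example_probs]
frame_to_unit = force_align(example_log_probs)
# Pronunciation probability of each frame for its aligned unit.
frame_probs = [example_probs[t][u] for t, u in enumerate(frame_to_unit)]
```

Running the same routine once with the unit sequence of the mark text and once with that of the identification text would give the per-frame first and second pronunciation probabilities, respectively.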
It should be noted that the above pronunciation unit combination can be understood as follows: when pronunciation units are determined by initials and finals, for the character "中", the pronunciation units that form it are known to be "zh" and "ong", so the pronunciation unit combination corresponding to "中" is "zh" and "ong". In actual pronunciation, however, "zh" and "ong" may each occupy several frames, so the pronunciation unit combination has a corresponding start frame number and end frame number.
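As a small illustration of how a pronunciation unit combination maps to a start-stop frame number range, the sketch below collapses a frame-level alignment into per-unit ranges. The helper names and the toy alignment (three frames of "zh" followed by four frames of "ong") are hypothetical; frame numbers are 1-based, matching $t \in [1, M]$ in the text.

```python
def unit_frame_ranges(frame_to_unit):
    """Map each pronunciation-unit index to its (start_frame, end_frame), 1-based inclusive."""
    ranges = {}
    for frame_no, unit_idx in enumerate(frame_to_unit, start=1):
        start, _ = ranges.get(unit_idx, (frame_no, frame_no))
        ranges[unit_idx] = (start, frame_no)
    return ranges

def combination_range(ranges, unit_indices):
    """Start-stop frame range of a character made of consecutive pronunciation units."""
    return ranges[unit_indices[0]][0], ranges[unit_indices[-1]][1]

units = ["zh", "ong"]                    # pronunciation units of one character
frame_to_unit = [0, 0, 0, 1, 1, 1, 1]    # assumed alignment result, one entry per frame
ranges = unit_frame_ranges(frame_to_unit)
char_range = combination_range(ranges, [0, 1])
```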
In addition, the calibration text in the identification text can be understood as follows: suppose that, when the two texts are compared, the x-th character of the mark text does not match the x-th character of the identification text, so that a text replacement operation is needed on the x-th character of the mark text. That character is then the error label text in the mark text, and the character at the same position in the identification text is regarded as the calibration text.
S207, when it is determined, based on the first pronunciation probability information and the second pronunciation probability information, that the text update decision condition holds, determining the identification text as the new mark text of the sample voice; otherwise, continuing to use the mark text as the new mark text of the sample voice.
It can be understood that the text update decision condition is formed mainly from pronunciation unit information. Therefore, once the actual pronunciation unit information of the sample voice under the mark text and under the identification text has been determined, it can be substituted into the formula of the selected text update decision condition to decide whether the condition holds. If it holds, the recognized identification text is determined as the new mark text of the sample voice; if it does not hold, the original mark text continues to serve as the new mark text of the sample voice.
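The control flow of S207 can be sketched as below. The function name and the shape of the two information arguments are hypothetical, and the decision condition is passed in as a callable because the concrete formulas are not reproduced here; the two stub conditions exist only to exercise both branches and are not the conditions of the source.

```python
def select_new_mark_text(mark_text, identification_text, condition,
                         first_info, second_info):
    """Return the identification text if the update decision condition holds,
    otherwise keep the original mark text."""
    if condition(first_info, second_info):
        return identification_text
    return mark_text

# Stub conditions (assumptions for illustration only).
def condition_holds(first_info, second_info):
    return True

def condition_fails(first_info, second_info):
    return False

updated = select_new_mark_text("mark", "identified", condition_holds, {}, {})
kept = select_new_mark_text("mark", "identified", condition_fails, {}, {})
```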
S208, retraining and optimizing the current acoustic model based on a set amount of sample voices and their current corresponding mark texts.
Illustratively, after the mark quality of the sample voices in the training data set has been improved by the above operations, the current acoustic model can be retrained, and thereby improved, using each sample voice and its current corresponding mark text.
The method for optimizing a voice recognition acoustic model provided by Embodiment two of the present invention specifies the operation of determining the error label information and the operation of deciding whether to update the mark text. With this method, the mark quality of the mark text corresponding to the sample voice can be effectively improved, which improves the quality of the training data needed by the acoustic model, thereby achieving the purpose of optimizing the acoustic model and improving the accuracy of speech recognition.
Embodiment three
Fig. 3 is a structural block diagram of a device for optimizing a voice recognition acoustic model provided by Embodiment three of the present invention. The device is suitable for optimizing and improving an acoustic model used for speech recognition; it can be implemented in hardware and/or software and is typically integrated in computer equipment that has a speech recognition function. As shown in Fig. 3, the device includes: a text obtaining module 31, an error label determining module 32, a mark text update module 33 and an acoustic model optimization module 34.
The text obtaining module 31 is configured to obtain the mark text of a sample voice, and to obtain the identification text that the sample voice yields based on the current acoustic model;
the error label determining module 32 is configured to compare the mark text with the identification text, and to determine, when the comparison result is a mismatch, the error label information of the mark text relative to the identification text;
the mark text update module 33 is configured to update the mark text of the sample voice according to the text update decision condition corresponding to the error label information;
the acoustic model optimization module 34 is configured to retrain and optimize the current acoustic model based on a set amount of sample voices and their current corresponding mark texts.
In the present embodiment, the device first obtains, through the text obtaining module 31, the mark text of the sample voice and the identification text that the sample voice yields based on the current acoustic model. The error label determining module 32 then compares the mark text with the identification text and, when the comparison result is a mismatch, determines the error label information of the mark text relative to the identification text. Next, the mark text update module 33 updates the mark text of the sample voice according to the text update decision condition corresponding to the error label information. Finally, the acoustic model optimization module 34 retrains and optimizes the current acoustic model based on a set amount of sample voices and their current corresponding mark texts.
The device for optimizing a voice recognition acoustic model provided by Embodiment three of the present invention can effectively improve the mark quality of the mark text corresponding to the sample voice, which improves the quality of the training data needed by the acoustic model, thereby achieving the purpose of optimizing the acoustic model and improving the accuracy of speech recognition.
Further, the error label determining module 32 is specifically configured to:
compare the mark text with the identification text to obtain the edit distance between the mark text and the identification text, and determine that the comparison result is a mismatch when the edit distance is non-zero; when the comparison result is a mismatch, determine, according to the edit distance, the error label sum of the mark text relative to the identification text, the position of each error label and the error type of each error label; and record the error label sum together with the position and error type of each error label as the error label information.
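One way to obtain the error label sum, positions and error types from an edit distance is a character-level Levenshtein table with a backtrace, sketched below. The function name, the returned dictionary layout and the convention that operations describe how to turn the mark text into the identification text are assumptions made for illustration; the source does not fix these details.

```python
def error_label_info(mark_text, identification_text):
    """Edit distance plus backtrace between two character sequences.

    Returns the total number of errors and a list of (position, error_type),
    where position indexes the mark text and error_type is one of
    "replace", "insert", "delete" (operations applied to the mark text).
    """
    n, m = len(mark_text), len(identification_text)
    dist = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        dist[i][0] = i
    for j in range(m + 1):
        dist[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = 0 if mark_text[i - 1] == identification_text[j - 1] else 1
            dist[i][j] = min(dist[i - 1][j - 1] + cost,  # match or replace
                             dist[i - 1][j] + 1,         # delete from mark text
                             dist[i][j - 1] + 1)         # insert into mark text
    errors = []
    i, j = n, m
    while i > 0 or j > 0:
        if (i > 0 and j > 0 and mark_text[i - 1] == identification_text[j - 1]
                and dist[i][j] == dist[i - 1][j - 1]):
            i, j = i - 1, j - 1
        elif i > 0 and j > 0 and dist[i][j] == dist[i - 1][j - 1] + 1:
            errors.append((i - 1, "replace"))
            i, j = i - 1, j - 1
        elif i > 0 and dist[i][j] == dist[i - 1][j] + 1:
            errors.append((i - 1, "delete"))
            i -= 1
        else:
            errors.append((i, "insert"))
            j -= 1
    errors.reverse()
    return {"total": dist[n][m], "errors": errors}

info_replace = error_label_info("abcd", "abxd")
info_insert = error_label_info("abc", "abxc")
info_delete = error_label_info("abxc", "abc")
```

A non-zero `total` corresponds to a mismatch; a `total` of zero means the two texts are identical and no error label information is needed.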
Further, the mark text update module 33 includes:
a decision condition determination unit, configured to look up, based on the error label information, the text update decision condition corresponding to the sample voice in a preset multi-level information relation table, where the text update decision condition is a comparison between the pronunciation probability information of the sample voice under the mark text and the pronunciation probability information of the sample voice under the identification text;
a probability information determination unit, configured to determine first pronunciation probability information after the sample voice is aligned with the mark text, and second pronunciation probability information after the sample voice is aligned with the identification text;
a new text determination unit, configured to determine the identification text as the new mark text of the sample voice when it is determined, based on the first pronunciation probability information and the second pronunciation probability information, that the text update decision condition holds, and otherwise to continue using the mark text as the new mark text of the sample voice.
On the basis of the above optimization, the decision condition determination unit is specifically configured to:
obtain the error label sum in the error label information and the error type of each error label;
in the multi-level information relation table, use the error label sum as the index and look up the set error type that matches the error types of the error labels;
determine the update decision condition corresponding to the set error type as the text update decision condition of the sample voice.
Further, the multi-level information relation table is constructed by the following steps:
initializing a multi-level information relation table that contains a primary information column, a second-level information column and a third-level information column;
storing the set error label sums in the primary information column, the set error label sums including 1-character error, 2-character error and multi-character error;
storing, in the second-level information column, the set error types corresponding to the 1-character error and the 2-character error respectively, and setting the second-level information cell corresponding to the multi-character error to empty;
storing, in the third-level information column, the update decision condition corresponding to each set error type, and storing the set standard update decision condition in the third-level information cell corresponding to the multi-character error.
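The construction steps and the lookup by error label sum and error type can be sketched with a nested dictionary. This is a minimal sketch under assumptions: the dictionary layout, the function names and the use of condition identifiers (such as "1_1" or "2_3") in place of the actual formulas are illustrative choices, and a missing second-level entry is represented by the key `None`.

```python
from itertools import combinations_with_replacement

ERROR_TYPES = ("replace", "insert", "delete")  # text replacement, insertion, deletion

def build_relation_table():
    """Primary level: error label sum; second level: error types; third level: condition id."""
    table = {}
    for level, count in (("1-character error", 1), ("2-character error", 2)):
        table[level] = {}
        combos = combinations_with_replacement(ERROR_TYPES, count)
        for idx, combo in enumerate(combos, start=1):
            table[level][combo] = f"{count}_{idx}"
    # Multi-character error: empty second-level cell, one standard condition.
    table["multi-character error"] = {None: "standard"}
    return table

def lookup_condition(table, error_types):
    """Index by the number of errors, then match the combination of error types."""
    count = len(error_types)
    if count == 0:
        raise ValueError("no error labels: the texts match")
    if count > 2:
        return table["multi-character error"][None]
    level = "1-character error" if count == 1 else "2-character error"
    order = {name: i for i, name in enumerate(ERROR_TYPES)}
    key = tuple(sorted(error_types, key=order.__getitem__))
    return table[level][key]

table = build_relation_table()
cond_one = lookup_condition(table, ["insert"])
cond_two = lookup_condition(table, ["delete", "replace"])
cond_multi = lookup_condition(table, ["replace", "insert", "delete"])
```

Sorting the error types into a canonical order before the lookup means that "replacement then deletion" and "deletion then replacement" resolve to the same entry, consistent with the six 2-character cases listed in the description.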
Further, the first pronunciation probability information is determined based on the speech signal frames formed by dividing the sample voice frame by frame and on a first pronunciation unit sequence formed by modeling the mark text; the second pronunciation probability information is determined based on each speech signal frame and on a second pronunciation unit sequence formed by modeling the identification text.
Embodiment four
Fig. 4 is a schematic diagram of the hardware structure of computer equipment provided by Embodiment four of the present invention. As shown in Fig. 4, the computer equipment provided by Embodiment four includes a processor 41 and a storage device 42. The computer equipment may have one or more processors; Fig. 4 takes one processor 41 as an example. The processor 41 and the storage device 42 in the computer equipment may be connected by a bus or in other ways; Fig. 4 takes connection by a bus as an example.
The storage device 42 in the computer equipment, as a computer readable storage medium, can be used to store one or more programs. The programs may be software programs, computer executable programs and modules, such as the program instructions/modules corresponding to the method for optimizing a voice recognition acoustic model provided by Embodiment one or two of the present invention (for example, the modules of the device for optimizing a voice recognition acoustic model shown in Fig. 3: the text obtaining module 31, the error label determining module 32, the mark text update module 33 and the acoustic model optimization module 34). By running the software programs, instructions and modules stored in the storage device 42, the processor 41 executes the various functional applications and data processing of the computer equipment, i.e. implements the method for optimizing a voice recognition acoustic model in the above method embodiments.
The storage device 42 may include a program storage area and a data storage area, where the program storage area can store an operating system and the application programs needed by at least one function, and the data storage area can store data created through use of the equipment. In addition, the storage device 42 may include high-speed random access memory and may also include non-volatile memory, for example at least one magnetic disk storage device, flash memory device or other non-volatile solid-state memory device. In some examples, the storage device 42 may further include memory located remotely from the processor 41, and such remote memory can be connected to the equipment through a network. Examples of such networks include, but are not limited to, the Internet, intranets, local area networks, mobile communication networks and combinations thereof.
Moreover, when the one or more programs included in the computer equipment are executed by the one or more processors 41, the programs perform the following operations:
obtaining the mark text of a sample voice, and obtaining the identification text that the sample voice yields based on the current acoustic model; comparing the mark text with the identification text, and determining, when the comparison result is a mismatch, the error label information of the mark text relative to the identification text; updating the mark text of the sample voice according to the text update decision condition corresponding to the error label information; and retraining and optimizing the current acoustic model based on a set amount of sample voices and their current corresponding mark texts.
In addition, an embodiment of the present invention also provides a computer readable storage medium on which a computer program is stored. When executed by a processor, the program implements the method for optimizing a voice recognition acoustic model provided by Embodiment one or Embodiment two of the present invention. The method includes: obtaining the mark text of a sample voice, and obtaining the identification text that the sample voice yields based on the current acoustic model; comparing the mark text with the identification text, and determining, when the comparison result is a mismatch, the error label information of the mark text relative to the identification text; updating the mark text of the sample voice according to the text update decision condition corresponding to the error label information; and retraining and optimizing the current acoustic model based on a set amount of sample voices and their current corresponding mark texts.
From the above description of the embodiments, those skilled in the art can clearly understand that the present invention can be implemented by software together with the necessary general-purpose hardware, and of course also by hardware alone, although in many cases the former is the better implementation. Based on this understanding, the technical solution of the present invention, in essence or in the part that contributes to the prior art, can be embodied in the form of a software product. The computer software product can be stored in a computer readable storage medium, such as a computer floppy disk, read-only memory (Read-Only Memory, ROM), random access memory (Random Access Memory, RAM), flash memory (FLASH), hard disk or optical disc, and includes instructions that cause computer equipment (which may be a personal computer, a server, network equipment or the like) to execute the methods described in the embodiments of the present invention.
Note that the above describes only preferred embodiments of the present invention and the technical principles applied. Those skilled in the art will understand that the present invention is not limited to the specific embodiments described herein, and that various obvious changes, readjustments and substitutions can be made without departing from the protection scope of the present invention. Therefore, although the present invention has been described in some detail through the above embodiments, it is not limited to them; it may include further equivalent embodiments without departing from the inventive concept, and its scope is determined by the scope of the appended claims.
Claims (8)
1. A method for optimizing a voice recognition acoustic model, characterized by comprising:
obtaining the mark text of a sample voice, and obtaining the identification text that the sample voice yields based on the current acoustic model;
comparing the mark text with the identification text, and determining, when the comparison result is a mismatch, the error label information of the mark text relative to the identification text;
updating the mark text of the sample voice according to the text update decision condition corresponding to the error label information;
retraining and optimizing the current acoustic model based on a set amount of sample voices and their current corresponding mark texts;
wherein comparing the mark text with the identification text, and determining, when the comparison result is a mismatch, the error label information of the mark text relative to the identification text, comprises:
comparing the mark text with the identification text to obtain the edit distance between the mark text and the identification text, and determining that the comparison result is a mismatch when the edit distance is non-zero;
when the comparison result is a mismatch, determining, according to the edit distance, the error label sum of the mark text relative to the identification text, the position of each error label and the error type of each error label;
recording the error label sum together with the position and error type of each error label as the error label information.
2. The method according to claim 1, characterized in that updating the mark text of the sample voice according to the text update decision condition corresponding to the error label information comprises:
looking up, based on the error label information, the text update decision condition corresponding to the sample voice in a preset multi-level information relation table, wherein the text update decision condition is a comparison between the pronunciation probability information of the sample voice under the mark text and the pronunciation probability information of the sample voice under the identification text;
determining first pronunciation probability information after the sample voice is aligned with the mark text, and second pronunciation probability information after the sample voice is aligned with the identification text;
when it is determined, based on the first pronunciation probability information and the second pronunciation probability information, that the text update decision condition holds, determining the identification text as the new mark text of the sample voice; otherwise, continuing to use the mark text as the new mark text of the sample voice.
3. The method according to claim 2, characterized in that looking up, based on the error label information, the text update decision condition corresponding to the sample voice in the preset multi-level information relation table comprises:
obtaining the error label sum in the error label information and the error type of each error label;
in the multi-level information relation table, using the error label sum as the index and looking up the set error type that matches the error types of the error labels;
determining the update decision condition corresponding to the set error type as the text update decision condition of the sample voice.
4. The method according to claim 2 or 3, characterized in that the multi-level information relation table is constructed by the following steps:
initializing a multi-level information relation table that contains a primary information column, a second-level information column and a third-level information column;
storing the set error label sums in the primary information column, the set error label sums including 1-character error, 2-character error and multi-character error;
storing, in the second-level information column, the set error types corresponding to the 1-character error and the 2-character error respectively, and setting the second-level information cell corresponding to the multi-character error to empty;
storing, in the third-level information column, the update decision condition corresponding to each set error type, and storing the set standard update decision condition in the third-level information cell corresponding to the multi-character error.
5. The method according to claim 2, characterized in that the first pronunciation probability information is determined based on the speech signal frames formed by dividing the sample voice frame by frame and on a first pronunciation unit sequence formed by modeling the mark text;
the second pronunciation probability information is determined based on each speech signal frame and on a second pronunciation unit sequence formed by modeling the identification text.
6. A device for optimizing a voice recognition acoustic model, characterized by comprising:
a text obtaining module, configured to obtain the mark text of a sample voice, and to obtain the identification text that the sample voice yields based on the current acoustic model;
an error label determining module, configured to compare the mark text with the identification text, and to determine, when the comparison result is a mismatch, the error label information of the mark text relative to the identification text;
a mark text update module, configured to update the mark text of the sample voice according to the text update decision condition corresponding to the error label information;
an acoustic model optimization module, configured to retrain and optimize the current acoustic model based on a set amount of sample voices and their current corresponding mark texts;
wherein the error label determining module is specifically configured to:
compare the mark text with the identification text to obtain the edit distance between the mark text and the identification text, and determine that the comparison result is a mismatch when the edit distance is non-zero;
when the comparison result is a mismatch, determine, according to the edit distance, the error label sum of the mark text relative to the identification text, the position of each error label and the error type of each error label;
record the error label sum together with the position and error type of each error label as the error label information.
7. Computer equipment, characterized by comprising:
one or more processors;
a storage device, configured to store one or more programs;
wherein the one or more programs are executed by the one or more processors, so that the one or more processors implement the method for optimizing a voice recognition acoustic model according to any one of claims 1 to 5.
8. A computer readable storage medium on which a computer program is stored, characterized in that the program, when executed by a processor, implements the method for optimizing a voice recognition acoustic model according to any one of claims 1 to 5.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810146221.8A CN108389577B (en) | 2018-02-12 | 2018-02-12 | Optimize method, system, equipment and the storage medium of voice recognition acoustic model |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810146221.8A CN108389577B (en) | 2018-02-12 | 2018-02-12 | Optimize method, system, equipment and the storage medium of voice recognition acoustic model |
Publications (2)
Publication Number | Publication Date |
---|---|
CN108389577A CN108389577A (en) | 2018-08-10 |
CN108389577B true CN108389577B (en) | 2019-05-31 |
Family
ID=63068887
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201810146221.8A Active CN108389577B (en) | 2018-02-12 | 2018-02-12 | Optimize method, system, equipment and the storage medium of voice recognition acoustic model |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN108389577B (en) |
Families Citing this family (18)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111161718A (en) * | 2018-11-07 | 2020-05-15 | 珠海格力电器股份有限公司 | Voice recognition method, device, equipment, storage medium and air conditioner |
CN109817205B (en) * | 2018-12-10 | 2024-03-22 | 平安科技(深圳)有限公司 | Text confirmation method and device based on semantic analysis and terminal equipment |
CN109545186B (en) * | 2018-12-16 | 2022-05-27 | 魔门塔(苏州)科技有限公司 | Speech recognition training system and method |
CN110070854A (en) * | 2019-04-17 | 2019-07-30 | 北京爱数智慧科技有限公司 | Voice annotation quality determination method, device, equipment and computer-readable medium |
CN110210294A (en) * | 2019-04-23 | 2019-09-06 | 平安科技(深圳)有限公司 | Evaluation method, device, storage medium and the computer equipment of Optimized model |
CN110265001B (en) * | 2019-05-06 | 2023-06-23 | 平安科技(深圳)有限公司 | Corpus screening method and device for speech recognition training and computer equipment |
CN110263322B (en) * | 2019-05-06 | 2023-09-05 | 平安科技(深圳)有限公司 | Audio corpus screening method and device for speech recognition and computer equipment |
CN110310643B (en) * | 2019-05-18 | 2021-04-30 | 江苏网进科技股份有限公司 | License plate voice recognition system and method thereof |
CN110472054B (en) * | 2019-08-15 | 2023-05-23 | 北京爱数智慧科技有限公司 | Data processing method and device |
CN110610698B (en) * | 2019-09-12 | 2022-09-27 | 上海依图信息技术有限公司 | Voice labeling method and device |
CN110853628A (en) * | 2019-11-18 | 2020-02-28 | 苏州思必驰信息科技有限公司 | Model training method and device, electronic equipment and storage medium |
CN111402865B (en) * | 2020-03-20 | 2023-08-08 | 北京达佳互联信息技术有限公司 | Method for generating voice recognition training data and method for training voice recognition model |
CN111681642B (en) * | 2020-06-03 | 2022-04-15 | 北京字节跳动网络技术有限公司 | Speech recognition evaluation method, device, storage medium and equipment |
CN112037769A (en) * | 2020-07-28 | 2020-12-04 | 出门问问信息科技有限公司 | Training data generation method and device and computer readable storage medium |
CN111881297A (en) * | 2020-07-31 | 2020-11-03 | 龙马智芯(珠海横琴)科技有限公司 | Method and device for correcting voice recognition text |
CN113539241B (en) * | 2021-07-28 | 2023-04-25 | 广州华多网络科技有限公司 | Speech recognition correction method and corresponding device, equipment and medium thereof |
CN113793604B (en) * | 2021-09-14 | 2024-01-05 | 思必驰科技股份有限公司 | Speech recognition system optimization method and device |
CN114974221B (en) * | 2022-04-29 | 2024-01-19 | 中移互联网有限公司 | Speech recognition model training method and device and computer readable storage medium |
Family Cites Families (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102651217A (en) * | 2011-02-25 | 2012-08-29 | 株式会社东芝 | Method and equipment for voice synthesis and method for training acoustic model used in voice synthesis |
CN102682763B (en) * | 2011-03-10 | 2014-07-16 | 北京三星通信技术研究有限公司 | Method, device and terminal for correcting named entity vocabularies in voice input text |
CN103165129B (en) * | 2011-12-13 | 2015-07-01 | 北京百度网讯科技有限公司 | Method and system for optimizing voice recognition acoustic model |
CN103632667B (en) * | 2013-11-25 | 2017-08-04 | 华为技术有限公司 | acoustic model optimization method, device and voice awakening method, device and terminal |
WO2015102127A1 (en) * | 2013-12-31 | 2015-07-09 | 엘지전자 주식회사 | System and method for voice recognition |
US9728185B2 (en) * | 2014-05-22 | 2017-08-08 | Google Inc. | Recognizing speech using neural networks |
KR20170046291A (en) * | 2015-10-21 | 2017-05-02 | 삼성전자주식회사 | Electronic device, acoustic model adaptation method of thereof and speech recognition system |
- 2018-02-12: CN application CN201810146221.8A (patent CN108389577B), status: Active
Also Published As
Publication number | Publication date |
---|---|
CN108389577A (en) | 2018-08-10 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN108389577B (en) | Optimize method, system, equipment and the storage medium of voice recognition acoustic model | |
CN107305768B (en) | Error-prone character calibration method in voice interaction | |
Siivola et al. | Unlimited vocabulary speech recognition based on morphs discovered in an unsupervised manner | |
CN103903619B (en) | A kind of method and system improving speech recognition accuracy | |
CN106297800B (en) | Self-adaptive voice recognition method and equipment | |
CN109145276A (en) | A kind of text correction method after speech-to-text based on phonetic | |
CN108804428A (en) | Correcting method, system and the relevant apparatus of term mistranslation in a kind of translation | |
CN105404621B (en) | A kind of method and system that Chinese character is read for blind person | |
CN106503231B (en) | Search method and device based on artificial intelligence | |
CN107451115B (en) | Method and system for constructing end-to-end Chinese prosody hierarchical structure prediction model | |
CN106598939A (en) | Method and device for text error correction, server and storage medium | |
CN1781102B (en) | Low memory decision tree | |
CN101441527B (en) | Method and apparatus for prompting right pronunciation in phonetic input | |
CN109800298A (en) | A kind of training method of Chinese word segmentation model neural network based | |
CN105261358A (en) | N-gram grammar model constructing method for voice identification and voice identification system | |
CN104021784A (en) | Voice synthesis method and device based on large corpus | |
CN104485107A (en) | Name voice recognition method, name voice recognition system and name voice recognition equipment | |
CN101515456A (en) | Speech recognition interface unit and speed recognition method thereof | |
CN112966496A (en) | Chinese error correction method and system based on pinyin characteristic representation | |
CN102478968B (en) | Chinese phonetic input method and Chinese pinyin input system | |
CN115293139A (en) | Training method of voice transcription text error correction model and computer equipment | |
CN109460558B (en) | Effect judging method of voice translation system | |
CN102184172A (en) | Chinese character reading system and method for blind people | |
CN102955770A (en) | Method and system for automatic recognition of pinyin | |
US7831549B2 (en) | Optimization of text-based training set selection for language processing modules |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||
GR01 | Patent grant |