CN104517605B - Speech segment splicing system and method for speech synthesis - Google Patents
Abstract
The present invention relates to a speech segment splicing system and method for speech synthesis. First, two speech segments to be spliced are extracted from a speech corpus as a first speech segment and a second speech segment, and optimal sampling points are selected from the first and second speech segments. Then, first-order smoothing is applied at the optimal sampling points to generate a splice point. The first-order smoothing method is: compute the slopes ka and kb at the optimal sampling points U1 and U2, and the difference ΔU between the values at U1 and U2; then predict from ka, kb, and ΔU to generate the splice point. Finally, the splice point is inserted between the first and second speech segments to generate a third speech segment. The invention solves the spectral-jump problem of direct splicing in the prior art and avoids the excessive computation of prior smoothing methods based on autocorrelation search followed by accumulation; first-order smoothing gives the spectrum at the junction good continuity and improves the user's listening experience.
Description
Technical field
The present invention relates to the field of speech synthesis, and in particular to a speech segment splicing system and method for speech synthesis.
Background art
Existing speech synthesis methods fall into two categories: parameter-based synthesis and waveform-concatenation synthesis. Compared with parameter-based methods, waveform concatenation produces synthesized speech of higher quality that sounds more natural and closer to the timbre of the original speaker. Mainstream online speech synthesis therefore currently favors waveform-concatenation schemes.
The principle of waveform-concatenation speech synthesis is as follows: suitable speech units are first selected, as segments to be spliced, from a corpus that has been recorded and annotated; the final synthesized speech is then obtained by splicing the segments together. With this approach, if the junction between spliced segments is handled poorly, a jump appears in the spectrum and the result sounds unnatural to the user. A key technical problem is therefore: what splicing method produces spliced speech segments that can be output smoothly?
Existing splicing methods first align the speech segments and then smooth them by accumulation. The smoothing effect of such methods is mediocre, and spectral jumps between segments remain. Moreover, in some cases these methods cannot find a smooth alignment point at all; the user then hears a high-frequency popping sound, which degrades the listening experience. A splicing method that can output smooth speech segments is therefore needed.
Summary of the invention
The technical problem to be solved by the present invention is to provide a speech segment splicing method capable of outputting smooth speech segments.
The technical scheme by which the present invention solves the above problem is as follows: a speech segment splicing system for speech synthesis, comprising a speech corpus, a sampling point selection module, a splice point generation module, and a splicing module;
the speech corpus is used to store recorded and annotated speech segments;
the sampling point selection module is used to extract two speech segments to be spliced from the corpus as a first speech segment and a second speech segment, and to select optimal sampling points from the first and second speech segments;
the splice point generation module is used to apply first-order smoothing at the optimal sampling points, generating a splice point;
the splicing module is used to insert the splice point between the first and second speech segments, generating a third speech segment.
The beneficial effects of the invention are as follows: it solves the spectral jumps produced in the prior art by period-search-and-shift accumulation smoothing, and first-order smoothing gives the spectrum of the speech at the junction good continuity, improving the user's listening experience. In addition, when searching for candidate sampling points at the splice position, first-order smoothing alignment does not need to compute the autocorrelation of the speech signal, so an accurate splice position is found more simply, greatly reducing the amount of computation and increasing running speed.
On the basis of the above technical scheme, the present invention further provides the following improvements.
Further, the sampling point selection module includes a search unit and a screening unit;
the search unit is used to search the first and second speech segments to obtain at least two candidate sampling points;
the screening unit is used to screen out, from the at least two candidate sampling points, the optimal sampling point U1 of the first speech segment and the optimal sampling point U2 of the second speech segment.
Further, the splice point generation module includes a computation unit and a prediction unit;
the computation unit is used to compute the slope ka at the optimal sampling point U1, the slope kb at the optimal sampling point U2, and the difference ΔU between the value at U1 and the value at U2;
the prediction unit is used to predict from ka, kb, and ΔU, generating the splice point.
Further, the search unit searches the first and second speech segments bidirectionally: the first speech segment is searched from back to front, and the second speech segment from front to back.
Further, a candidate sampling point found by the bidirectional search must satisfy the following conditions:
Condition 1: the absolute difference between the slopes of the first and second speech segments at their candidate sampling points is less than a set threshold Tk, i.e. abs(ka - kb) < Tk;
Condition 2: the absolute difference between the sample values of the first and second speech segments at their candidate sampling points is less than the product of an adjustable parameter ratio and the absolute slope of the first speech segment at its candidate point, i.e. abs(Sa - Sb) < ratio*abs(ka).
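As a concrete illustration, the two conditions can be checked with a short sketch. This is a hypothetical helper, not the patent's implementation: the patent does not fix how slopes are estimated, so a first difference is assumed, and the `Tk` and `ratio` values are illustrative.

```python
def slope(seg, i):
    # First-difference slope estimate at sample i (an assumption; the
    # patent does not fix the slope estimator).
    return seg[i] - seg[i - 1]

def is_candidate_pair(seg_a, i, seg_b, j, Tk=0.05, ratio=0.5):
    """Check Condition 1 and Condition 2 for candidate points i (in the
    first segment) and j (in the second segment)."""
    ka, kb = slope(seg_a, i), slope(seg_b, j)
    Sa, Sb = seg_a[i], seg_b[j]
    cond1 = abs(ka - kb) < Tk               # Condition 1: slopes agree
    cond2 = abs(Sa - Sb) < ratio * abs(ka)  # Condition 2: values agree
    return cond1 and cond2
```

Note that Condition 2 scales the allowed value gap by the local slope, so a steep region tolerates a larger amplitude mismatch than a flat one.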
Further, the optimal sampling points are screened by a minimal-error-cost criterion. The minimal error cost is the weighted sum of a slope-difference cost and a value-difference cost, i.e. U* = argmin(w1*Dratio + w2*Dval), where w1 is the weight of the slope-difference cost at the optimal sampling point U*, w2 is the weight of the value-difference cost at U*, Dratio is the slope-difference cost function at U*, and Dval is the value-difference cost function at U*.
To solve the above technical problem, the present invention also provides a speech segment splicing method for speech synthesis, comprising the following steps:
Step 1: extract two speech segments to be spliced from the speech corpus as a first speech segment and a second speech segment, and select optimal sampling points from the first and second speech segments;
Step 2: apply first-order smoothing at the optimal sampling points to generate a splice point;
Step 3: insert the splice point between the first and second speech segments, generating a third speech segment.
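The three steps can be sketched end to end as follows. This is a minimal illustration rather than the patent's implementation: the first-difference slope estimator, the thresholds `Tk` and `ratio`, the equal cost weights, and the mean-based correction for the splice point E are all assumptions.

```python
def slope(seg, i):
    # Assumed first-difference slope estimate at sample i.
    return seg[i] - seg[i - 1]

def splice(seg_a, seg_b, Tk=0.05, ratio=0.5):
    """End-to-end sketch of steps 1-3: search candidate sampling points,
    screen the best pair by minimal error cost, then insert a
    first-order-smoothed splice point."""
    # Step 1: bidirectional candidate search over both segments.
    pairs = [(i, j)
             for i in range(len(seg_a) - 1, 0, -1)  # back to front
             for j in range(1, len(seg_b))          # front to back
             if abs(slope(seg_a, i) - slope(seg_b, j)) < Tk
             and abs(seg_a[i] - seg_b[j]) < ratio * abs(slope(seg_a, i))]
    if not pairs:
        raise ValueError("no smooth splice point found")
    # Screening: minimal-error-cost criterion (equal weights assumed).
    i, j = min(
        pairs,
        key=lambda p: 0.5 * abs(slope(seg_a, p[0]) - slope(seg_b, p[1]))
        + 0.5 * abs(seg_a[p[0]] - seg_b[p[1]]),
    )
    # Steps 2-3: first-order smoothing and insertion of splice point E.
    ka, kb = slope(seg_a, i), slope(seg_b, j)
    E = ((seg_a[i] + ka) + (seg_b[j] - kb)) / 2.0  # assumed correction
    return seg_a[: i + 1] + [E] + seg_b[j:]
```

For two segments that almost meet, e.g. `splice([0.0, 0.1, 0.2, 0.3, 0.4], [0.31, 0.41, 0.51])`, the sketch returns the first segment, one interpolated splice point, and the tail of the second segment.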
Further, step 1 specifically comprises:
101: extract two speech segments to be spliced from the speech corpus as the first speech segment and the second speech segment;
102: search the first and second speech segments to obtain at least two candidate sampling points;
103: screen out, from the at least two candidate sampling points, the optimal sampling point U1 of the first speech segment and the optimal sampling point U2 of the second speech segment.
Further, step 2 specifically comprises:
201: compute the slope ka at the optimal sampling point U1, the slope kb at the optimal sampling point U2, and the difference ΔU between the value at U1 and the value at U2;
202: predict from ka, kb, and ΔU, generating the splice point.
Further, in step 102 the first and second speech segments are searched bidirectionally: the first speech segment is searched from back to front, and the second from front to back. A candidate sampling point found by the bidirectional search must satisfy:
Condition 1: the absolute difference between the slopes of the two segments at their candidate sampling points is less than the set threshold, i.e. abs(ka - kb) < Tk;
Condition 2: the absolute difference between the sample values at the candidate sampling points is less than the product of the adjustable parameter ratio and the absolute slope of the first segment at its candidate point, i.e. abs(Sa - Sb) < ratio*abs(ka).
Brief description of the drawings
Fig. 1 is a schematic diagram of the module structure of a speech segment splicing system for speech synthesis according to the present invention;
Fig. 2 is a schematic diagram of the bidirectional search directions applied to the speech segments by the system;
Fig. 3 is a flow chart of the steps of a speech segment splicing method for speech synthesis according to the present invention.
In the drawings, the parts represented by the reference numerals are as follows:
1. speech corpus; 2. sampling point selection module; 3. splice point generation module; 4. splicing module; 21. search unit; 22. screening unit; 31. computation unit; 32. prediction unit.
Embodiments
The principles and features of the present invention are described below with reference to the drawings. The examples given serve only to explain the invention and are not intended to limit its scope.
Fig. 1 is a schematic diagram of the module structure of a speech segment splicing system for speech synthesis according to the present invention. As shown in Fig. 1, the system includes a speech corpus 1, a sampling point selection module 2, a splice point generation module 3, and a splicing module 4. The corpus 1 stores recorded and annotated speech segments and contains at least two of them. The sampling point selection module extracts two segments to be spliced from corpus 1 as the first and second speech segments and selects optimal sampling points from them. The splice point generation module applies first-order smoothing at the optimal sampling points to generate a splice point; the splicing module inserts the splice point between the first and second segments to generate the third speech segment.
The sampling point selection module 2 includes a search unit 21 and a screening unit 22; the splice point generation module 3 includes a computation unit 31 and a prediction unit 32.
The search unit 21 searches the first and second speech segments for at least two candidate sampling points. Of the two segments to be spliced, the earlier segment is called the first speech segment and the later segment the second speech segment.
As shown in Fig. 2, the search over the first and second speech segments is bidirectional: the first segment is searched from back to front, and the second from front to back. A candidate sampling point found by the bidirectional search must satisfy two conditions:
abs(ka - kb) < Tk    (Condition 1)
abs(Sa - Sb) < ratio*abs(ka)    (Condition 2)
Condition 1: the absolute difference between the slopes of the two segments at their candidate sampling points is below the set threshold Tk, where ka is the slope of the first segment at its candidate point and kb is the slope of the second segment at its candidate point.
Condition 2: the absolute difference between the sample values at the candidate points is below the product of the adjustable parameter ratio and the absolute slope of the first segment at its candidate point, where Sa is the sample value of the first segment at its candidate point, Sb is the sample value of the second segment at its candidate point, and the adjustable parameter ratio controls how large the value difference may be.
A point satisfying both conditions simultaneously is a candidate sampling point for splicing. With the candidate point of the first segment held fixed, the search moves backward through the second segment. When one round of search finishes, the candidate point of the first segment moves forward and the next round begins. The search terminates when alternative candidate splice points have been found and both segments have been traversed up to their limits. When the search ends, multiple (at least two) candidate sampling points have been obtained; their number is even, since candidates are collected in pairs from the first and second segments.
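The search procedure just described — fix a candidate in the first segment, scan the second segment, then advance and repeat — can be sketched as follows. Here `max_shift` is a hypothetical window bound standing in for the traversal limit, and the slope estimator and thresholds are assumptions.

```python
def find_candidates(seg_a, seg_b, Tk=0.05, ratio=0.5, max_shift=50):
    """Bidirectional search: seg_a is scanned back to front, seg_b front
    to back. Returns candidate pairs (i, j); an empty list means no
    smooth splice point was found."""
    def slope(seg, i):
        return seg[i] - seg[i - 1]  # assumed first-difference slope
    pairs = []
    # i walks backward from the tail of the first segment ...
    for i in range(len(seg_a) - 1, max(0, len(seg_a) - 1 - max_shift), -1):
        # ... and, with i fixed, j walks forward from the head of the second.
        for j in range(1, min(len(seg_b), 1 + max_shift)):
            ka, kb = slope(seg_a, i), slope(seg_b, j)
            if abs(ka - kb) < Tk and abs(seg_a[i] - seg_b[j]) < ratio * abs(ka):
                pairs.append((i, j))
    return pairs
```

Each returned pair names one candidate point in each segment, which matches the text's observation that the total number of candidate points is even.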
After the candidate sampling points are obtained, the screening unit 22 screens out the optimal sampling point U1 of the first segment and the optimal sampling point U2 of the second segment from the at least two candidates.
The optimal sampling points U* (i.e. U1, U2, U3, U4, ...) are selected from the candidate sampling points by the minimal-error-cost criterion, and their positions are used for the subsequent smoothing interpolation. The minimal error cost is the weighted sum of the slope-difference cost and the value-difference cost at U*:
U* = argmin(w1*Dratio + w2*Dval)
where w1 is the weight of the slope-difference cost at U*, w2 is the weight of the value-difference cost, Dratio is the slope-difference cost function at U*, and Dval is the value-difference cost function at U*. The optimal sampling points U1 and U2 are finally obtained according to this criterion.
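Under the minimal-error-cost criterion, the screening step can be sketched as follows. The concrete cost functions Dratio and Dval are not spelled out in the text, so plain absolute differences and equal weights are assumed here.

```python
def screen_best(pairs, seg_a, seg_b, w1=0.5, w2=0.5):
    """Pick the candidate pair minimizing w1*Dratio + w2*Dval, with
    Dratio/Dval taken as absolute slope and value differences (an
    assumption)."""
    def slope(seg, i):
        return seg[i] - seg[i - 1]
    def cost(pair):
        i, j = pair
        d_ratio = abs(slope(seg_a, i) - slope(seg_b, j))  # slope-difference cost
        d_val = abs(seg_a[i] - seg_b[j])                  # value-difference cost
        return w1 * d_ratio + w2 * d_val
    return min(pairs, key=cost)  # argmin over the candidate pairs
```

The weights w1 and w2 trade off how much slope agreement matters relative to amplitude agreement when ranking candidates.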
The computation unit 31 computes the slope ka at the optimal sampling point U1, the slope kb at the optimal sampling point U2, and the difference ΔU between the value at U1 and the value at U2.
The prediction unit 32 predicts from ka, kb, and ΔU to generate the splice point. The prediction process is as follows.
Slope prediction: let the optimal splice point U1 of the first segment be the sample at time T, with amplitude S; then the amplitude of the sample at time T-1 is S(T-1) = S - ka, where ka is the slope at U1, and the predicted amplitude of the first segment at time T+1 is S(T+1) = S + ka. Likewise, let the optimal splice point U2 of the second segment be the sample at time N, with amplitude V; then the amplitude of the sample at time N+1 is V(N+1) = V + kb, where kb is the slope at U2, and the predicted amplitude of the second segment at time N-1 is V(N-1) = V - kb.
The slope predictions show that, at the junction of their optimal splice points, the two segments disagree by the prediction difference (S + ka) - (V - kb). This difference means the two segments cannot be spliced together directly, so the sample value must be corrected: the corrected value E is taken between the two predictions, e.g. as their mean, E = ((S + ka) + (V - kb))/2.
The final spliced sequence is
... , S - ka, S, E, V, V + kb, ...
Because this smoothing of the optimal sampling points of the first and second segments uses slope information (first-order information), it is called first-order smoothing.
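A minimal sketch of the first-order smoothing splice: the first segment predicts one sample ahead (S + ka), the second predicts one sample back (V - kb), and a single corrected splice point E is inserted between them. Taking E as the mean of the two predictions is an assumption, since the text does not spell out the exact correction formula.

```python
def first_order_splice(seg_a, seg_b, ka, kb):
    """Join seg_a (ending at its optimal sampling point, amplitude S) and
    seg_b (starting at its optimal sampling point, amplitude V) with one
    first-order-smoothed splice point E."""
    S, V = seg_a[-1], seg_b[0]
    pred_a = S + ka  # first segment's slope prediction one step ahead
    pred_b = V - kb  # second segment's slope prediction one step back
    E = (pred_a + pred_b) / 2.0  # assumed correction: mean of predictions
    # Resulting sequence: ..., S - ka, S, E, V, V + kb, ...
    return seg_a + [E] + seg_b
```

When the two predictions already agree, E coincides with both and the junction is seamless; otherwise E splits their disagreement.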
Fig. 3 is a flow chart of the steps of a speech segment splicing method for speech synthesis according to the present invention. As shown in Fig. 3, the method comprises the following steps:
Step 1: extract two speech segments to be spliced from the speech corpus as a first speech segment and a second speech segment, and select optimal sampling points from the first and second speech segments;
Step 2: apply first-order smoothing at the optimal sampling points to generate a splice point;
Step 3: insert the splice point between the first and second speech segments, generating a third speech segment.
Step 1 specifically comprises:
101: extract two speech segments to be spliced from the speech corpus as the first and second speech segments;
102: search the first and second speech segments to obtain at least two candidate sampling points;
103: screen out, from the at least two candidate sampling points, the optimal sampling point U1 of the first segment and the optimal sampling point U2 of the second segment.
In step 102, the first and second speech segments are searched bidirectionally: the first segment from back to front, the second from front to back. A candidate sampling point found by the bidirectional search must satisfy:
Condition 1: the absolute difference between the slopes of the two segments at their candidate sampling points is less than the set threshold, i.e. abs(ka - kb) < Tk;
Condition 2: the absolute difference between the sample values at the candidate points is less than the product of the adjustable parameter ratio and the absolute slope of the first segment at its candidate point, i.e. abs(Sa - Sb) < ratio*abs(ka).
A point satisfying both conditions simultaneously is a candidate sampling point for splicing. With the candidate point of the first segment held fixed, the search moves backward through the second segment. When one round of search finishes, the candidate point of the first segment moves forward and the next round begins. The search terminates when alternative candidate splice points have been found and both segments have been traversed up to their limits. When the search ends, multiple (at least two) candidate sampling points have been obtained; their number is even, since candidates are collected in pairs from the first and second segments.
In step 103, the optimal sampling points U* (i.e. U1, U2, U3, U4, ...) are selected from the candidate sampling points by the minimal-error-cost criterion, and their positions are used for the subsequent smoothing interpolation. The minimal error cost is the weighted sum of the slope-difference cost and the value-difference cost at U*:
U* = argmin(w1*Dratio + w2*Dval)
where w1 is the weight of the slope-difference cost at U*, w2 is the weight of the value-difference cost, Dratio is the slope-difference cost function at U*, and Dval is the value-difference cost function at U*. The optimal sampling points U1 and U2 are finally obtained according to this criterion.
Step 2 specifically comprises:
201: compute the slope ka at the optimal sampling point U1, the slope kb at the optimal sampling point U2, and the difference ΔU between the value at U1 and the value at U2;
202: predict from ka, kb, and ΔU, generating the splice point.
In step 202, the prediction process is as follows.
Slope prediction: let the optimal splice point U1 of the first segment be the sample at time T, with amplitude S; then the amplitude of the sample at time T-1 is S(T-1) = S - ka, where ka is the slope at U1, and the predicted amplitude of the first segment at time T+1 is S(T+1) = S + ka. Likewise, let the optimal splice point U2 of the second segment be the sample at time N, with amplitude V; then the amplitude at time N+1 is V(N+1) = V + kb, where kb is the slope at U2, and the predicted amplitude of the second segment at time N-1 is V(N-1) = V - kb.
The slope predictions show that, at the junction of their optimal splice points, the two segments disagree by the prediction difference (S + ka) - (V - kb). This difference means the two segments cannot be spliced together directly, so the sample value must be corrected: the corrected value E is taken between the two predictions, e.g. as their mean, E = ((S + ka) + (V - kb))/2.
The final spliced sequence is
... , S - ka, S, E, V, V + kb, ...
The present invention solves the spectral jumps produced in the prior art by period-search-and-shift accumulation smoothing; first-order smoothing gives the spectrum of the speech at the junction good continuity and improves the user's listening experience. In addition, when searching for candidate sampling points at the splice position, first-order smoothing alignment does not need to compute the autocorrelation of the speech signal, so an accurate splice position is found more simply, greatly reducing the amount of computation and increasing the running speed.
The foregoing describes only preferred embodiments of the present invention and is not intended to limit it; any modification, equivalent substitution, or improvement made within the spirit and principles of the present invention shall be included within its scope of protection.
Claims (6)
1. A speech segment splicing system for speech synthesis, characterized by including a speech corpus, a sampling point selection module, a splice point generation module, and a splicing module;
the speech corpus is used to store recorded and annotated speech segments;
the sampling point selection module includes a search unit and a screening unit, wherein the search unit is used to search the first speech segment and the second speech segment to obtain at least two candidate sampling points;
the screening unit is used to screen out, from the at least two candidate sampling points, the optimal sampling point U1 of the first speech segment and the optimal sampling point U2 of the second speech segment;
the splice point generation module includes a computation unit and a prediction unit, wherein the computation unit is used to compute the slope ka at the optimal sampling point U1, the slope kb at the optimal sampling point U2, and the difference ΔU between the value at U1 and the value at U2;
the prediction unit is used to predict from ka, kb, and ΔU, generating a splice point;
the splicing module is used to insert the splice point between the first and second speech segments, generating a third speech segment.
2. The speech segment splicing system for speech synthesis according to claim 1, characterized in that the search unit searches the first and second speech segments bidirectionally: the first speech segment is searched from back to front, and the second speech segment from front to back.
3. The speech segment splicing system for speech synthesis according to claim 2, characterized in that a candidate sampling point found by the bidirectional search satisfies:
Condition 1: the absolute difference between the slopes of the first and second speech segments at their candidate sampling points is less than a set threshold Tk, i.e. abs(ka - kb) < Tk;
Condition 2: the absolute difference between the sample values of the first and second speech segments at their candidate sampling points is less than the product of an adjustable parameter ratio and the absolute slope of the first speech segment at its candidate point, i.e. abs(Sa - Sb) < ratio*abs(ka).
4. The speech segment splicing system for speech synthesis according to claim 1, characterized in that the optimal sampling points are screened by a minimal-error-cost criterion; the minimal error cost is the weighted sum of the slope-difference cost and the value-difference cost at the sampling point U*, U* = argmin(w1*Dratio + w2*Dval), where w1 is the weight of the slope-difference cost at the optimal sampling point U*, w2 is the weight of the value-difference cost at U*, Dratio is the slope-difference cost function at U*, and Dval is the value-difference cost function at U*.
5. A speech segment splicing method for speech synthesis, characterized by comprising the following steps:
Step 1: extract two speech segments to be spliced from a speech corpus as a first speech segment and a second speech segment, search the first and second speech segments to obtain at least two candidate sampling points, and screen out from the at least two candidate sampling points the optimal sampling point U1 of the first speech segment and the optimal sampling point U2 of the second speech segment;
Step 2: compute the slope ka at the optimal sampling point U1, the slope kb at the optimal sampling point U2, and the difference ΔU between the value at U1 and the value at U2, and predict from ka, kb, and ΔU to generate a splice point;
Step 3: insert the splice point between the first and second speech segments, generating a third speech segment.
6. The speech segment splicing method for speech synthesis according to claim 5, characterized in that in step 1 the first and second speech segments are searched bidirectionally: the first speech segment from back to front, the second from front to back; and a candidate sampling point found by the bidirectional search satisfies:
Condition 1: the absolute difference between the slopes of the two segments at their candidate sampling points is less than a set threshold, i.e. abs(ka - kb) < Tk;
Condition 2: the absolute difference between the sample values at the candidate sampling points is less than the product of the adjustable parameter ratio and the absolute slope of the first segment at its candidate point, i.e. abs(Sa - Sb) < ratio*abs(ka).
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201410734257.XA CN104517605B (en) | 2014-12-04 | 2014-12-04 | A kind of sound bite splicing system and method for phonetic synthesis |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201410734257.XA CN104517605B (en) | 2014-12-04 | 2014-12-04 | A kind of sound bite splicing system and method for phonetic synthesis |
Publications (2)
Publication Number | Publication Date |
---|---|
CN104517605A CN104517605A (en) | 2015-04-15 |
CN104517605B true CN104517605B (en) | 2017-11-28 |
Family
ID=52792811
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201410734257.XA Active CN104517605B (en) | 2014-12-04 | 2014-12-04 | A kind of sound bite splicing system and method for phonetic synthesis |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN104517605B (en) |
Families Citing this family (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN105679306B (en) * | 2016-02-19 | 2019-07-09 | 云知声(上海)智能科技有限公司 | The method and system of fundamental frequency frame are predicted in speech synthesis |
CN108831424B (en) * | 2018-06-15 | 2021-01-08 | 广州酷狗计算机科技有限公司 | Audio splicing method and device and storage medium |
CN109389969B (en) * | 2018-10-29 | 2020-05-26 | 百度在线网络技术(北京)有限公司 | Corpus optimization method and apparatus |
CN109979440B (en) * | 2019-03-13 | 2021-05-11 | 广州市网星信息技术有限公司 | Keyword sample determination method, voice recognition method, device, equipment and medium |
CN112562635B (en) * | 2020-12-03 | 2024-04-09 | 云知声智能科技股份有限公司 | Method, device and system for solving generation of pulse signals at splicing position in speech synthesis |
CN112863530A (en) * | 2021-01-07 | 2021-05-28 | 广州欢城文化传媒有限公司 | Method and device for generating sound works |
CN112971778A (en) * | 2021-02-09 | 2021-06-18 | 北京师范大学 | Brain function imaging signal obtaining method and device and electronic equipment |
CN113421547B (en) * | 2021-06-03 | 2023-03-17 | 华为技术有限公司 | Voice processing method and related equipment |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN1333501A (en) * | 2001-07-20 | 2002-01-30 | Beijing Jietong Huasheng Speech Technology Co., Ltd. | Dynamic Chinese speech synthesizing method |
CN1540624A (en) * | 2003-04-25 | 2004-10-27 | Alcatel | Method of generating speech according to text |
CN1731510A (en) * | 2004-08-05 | 2006-02-08 | Motorola Inc. | Text-speech conversion for amalgamated language |
JP2008191334A (en) * | 2007-02-02 | 2008-08-21 | Oki Electric Ind Co Ltd | Speech synthesis method, speech synthesis program, speech synthesis device and speech synthesis system |
JP2008299266A (en) * | 2007-06-04 | 2008-12-11 | Mitsubishi Electric Corp | Speech synthesis device and method |
2014-12-04: Application CN201410734257.XA filed in China; granted as patent CN104517605B, legal status Active.
Non-Patent Citations (1)
Title |
---|
"Chinese Speech Synthesis Based on Prosody Matching Cost and Prosody Concatenation Cost"; Zhang Peng et al.; Journal of Harbin Institute of Technology; 2006-11-30; Vol. 38, No. 11; Section 1, Paragraph 1; Section 4, Paragraph 2 * |
Also Published As
Publication number | Publication date |
---|---|
CN104517605A (en) | 2015-04-15 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN104517605B (en) | A kind of sound bite splicing system and method for phonetic synthesis | |
CN102779508B (en) | Speech library generation apparatus and method, and speech synthesis system and method thereof | |
CN104780388B (en) | The cutting method and device of a kind of video data | |
US20180349495A1 (en) | Audio data processing method and apparatus, and computer storage medium | |
US8890869B2 (en) | Colorization of audio segments | |
CN101178896B (en) | Unit selection voice synthetic method based on acoustics statistical model | |
CN110213670A (en) | Method for processing video frequency, device, electronic equipment and storage medium | |
CN109147758A (en) | A kind of speaker's sound converting method and device | |
CN106157951B (en) | Carry out the automatic method for splitting and system of audio punctuate | |
JP4220449B2 (en) | Indexing device, indexing method, and indexing program | |
CN106021496A (en) | Video search method and video search device | |
CN103700370A (en) | Broadcast television voice recognition method and system | |
CN102723078A (en) | Emotion speech recognition method based on natural language comprehension | |
CN101930747A (en) | Method and device for converting voice into mouth shape image | |
CN110096966A (en) | A kind of audio recognition method merging the multi-modal corpus of depth information Chinese | |
CN109979428B (en) | Audio generation method and device, storage medium and electronic equipment | |
CN106302987A (en) | A kind of audio frequency recommends method and apparatus | |
CN106297765B (en) | Phoneme synthesizing method and system | |
CN108172211B (en) | Adjustable waveform splicing system and method | |
CN103915093A (en) | Method and device for realizing voice singing | |
CN101867742A (en) | Television system based on sound control | |
CN110277087A (en) | A kind of broadcast singal anticipation preprocess method | |
US9666211B2 (en) | Information processing apparatus, information processing method, display control apparatus, and display control method | |
CN107507627B (en) | Voice data heat analysis method and system | |
Felipe et al. | Acoustic scene classification using spectrograms |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||
CP01 | Change in the name or title of a patent holder |
Address after: Floor 5, Block A, Peony Technology Building, No. 2 Huayuan Road, Haidian District, Beijing 100191
Patentee after: Yunzhisheng Intelligent Technology Co., Ltd.
Address before: Floor 5, Block A, Peony Technology Building, No. 2 Huayuan Road, Haidian District, Beijing 100191
Patentee before: Beijing Yunzhisheng Information Technology Co., Ltd.