CN104517605B - Speech fragment splicing system and method for speech synthesis - Google Patents

Speech fragment splicing system and method for speech synthesis

Info

Publication number
CN104517605B
CN104517605B (Application CN201410734257.XA)
Authority
CN
China
Prior art keywords
speech fragment
speech
point
sampling point
slope
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201410734257.XA
Other languages
Chinese (zh)
Other versions
CN104517605A (en)
Inventor
刘青松 (Liu Qingsong)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Unisound Intelligent Technology Co Ltd
Original Assignee
Beijing Yunzhisheng Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Yunzhisheng Information Technology Co Ltd filed Critical Beijing Yunzhisheng Information Technology Co Ltd
Priority to CN201410734257.XA priority Critical patent/CN104517605B/en
Publication of CN104517605A publication Critical patent/CN104517605A/en
Application granted granted Critical
Publication of CN104517605B publication Critical patent/CN104517605B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The present invention relates to a speech fragment splicing system and method for speech synthesis. First, two speech fragments to be spliced are extracted from a speech corpus as a first speech fragment and a second speech fragment, and optimum sampling points are selected from the first and second speech fragments. Then, first-order smoothing is applied to the optimum sampling points to generate a splice point. The first-order smoothing method is: calculate the slopes k_a and k_b at the optimum sampling points U1 and U2, together with the difference Δ_U between the values at U1 and U2; perform a prediction based on k_a, k_b, and Δ_U to generate the splice point. Finally, the splice point is inserted between the first and second speech fragments to generate a third speech fragment. The present invention solves the spectral discontinuity produced by direct splicing in the prior art, as well as the excessive computation of autocorrelation-search-plus-accumulation smoothing; the first-order smoothing method gives the spectrum at the splice good continuity and improves the listener's auditory perception.

Description

Speech fragment splicing system and method for speech synthesis
Technical field
The present invention relates to the field of speech synthesis, and more particularly to a speech fragment splicing system and method for speech synthesis.
Background technology
Existing speech synthesis methods fall into two classes: synthesis based on speech feature parameters and synthesis based on waveform concatenation. Compared with parameter-based methods, waveform-concatenation synthesis produces higher-quality synthesized speech that sounds more natural and closer to the timbre of the original speaker. Mainstream online speech synthesis therefore favours waveform-concatenation schemes.
The principle of waveform-concatenation synthesis is as follows: suitable speech units are first selected from a corpus that has been recorded and annotated, and these units, taken as speech fragments to be spliced, are concatenated to obtain the final synthesized speech. With this approach, if the joint between fragments is handled poorly, a discontinuity appears in the spectrum and the result sounds unnatural to the user. A key technical problem is therefore: what splicing method produces speech fragments that join smoothly?
The existing splicing approach first aligns the speech fragments and then applies cumulative smoothing. Its smoothing effect is mediocre, leaving spectral discontinuities between fragments, and in some cases it cannot find a smooth alignment point at all. To the listener this manifests as a high-frequency popping sound that degrades the listening experience. A splicing method capable of outputting smoothly joined speech fragments is therefore needed.
Summary of the invention
The technical problem to be solved by the present invention is to provide a speech fragment splicing method capable of outputting smoothly joined speech fragments.
The technical solution by which the present invention solves the above technical problem is as follows: a speech fragment splicing system for speech synthesis, comprising a speech corpus, a sampling point selection module, a splice point generation module, and a splicing module;
the speech corpus is used to store speech fragments that have been recorded and annotated;
the sampling point selection module is used to extract two speech fragments to be spliced from the corpus as a first speech fragment and a second speech fragment, and to select optimum sampling points from the first and second speech fragments;
the splice point generation module is used to apply first-order smoothing to the optimum sampling points and generate a splice point;
the splicing module is used to insert the splice point between the first and second speech fragments and generate a third speech fragment.
The beneficial effects of the invention are: it solves the spectral discontinuities produced in the prior art by periodic-search-and-shift cumulative smoothing, and the first-order smoothing method gives the spectrum at the splice good continuity, improving the listener's auditory perception. In addition, when searching for candidate splice sampling points, first-order smooth alignment does not need to compute the autocorrelation of the speech signal, so an accurate splice position is found more simply, greatly reducing computation and increasing running speed.
On the basis of the above technical solution, the present invention further provides the following improvements.
Further, the sampling point selection module comprises a search unit and a screening unit;
the search unit is used to search the first and second speech fragments to obtain at least two candidate sampling points;
the screening unit is used to select, from the at least two candidate sampling points, the optimum sampling point U1 of the first speech fragment and the optimum sampling point U2 of the second speech fragment.
Further, the splice point generation module comprises a calculation unit and a prediction unit;
the calculation unit is used to calculate the slope k_a at the optimum sampling point U1, the slope k_b at the optimum sampling point U2, and the difference Δ_U between the value at U1 and the value at U2;
the prediction unit is used to perform a prediction based on k_a, k_b, and Δ_U, generating the splice point.
Further, the search unit searches the first and second speech fragments bidirectionally: the first speech fragment is searched from back to front and the second speech fragment from front to back.
Further, a candidate sampling point produced by the bidirectional search satisfies the following conditions:
Condition 1: the absolute value of the difference between the slopes of the first and second speech fragments at the candidate sampling points is smaller than a set threshold T_k, i.e. abs(k_a - k_b) < T_k;
Condition 2: the absolute value of the difference between the values of the first and second speech fragments at the candidate sampling points is smaller than an adjustable parameter ratio multiplied by the absolute value of the first fragment's slope at its candidate point, i.e. abs(S_a - S_b) < ratio*abs(k_a).
Further, the optimum sampling points are screened under a minimal-error-cost criterion, the minimal error cost being the weighted sum of a slope-difference cost and a value-difference cost: U* = argmin(w1*D_ratio + w2*D_val), where w1 is the weight of the slope-difference cost at the optimum sampling point U*, w2 the weight of the value-difference cost at U*, D_ratio the slope-difference function at U*, and D_val the value-difference function at U*.
To solve the above technical problem, the present invention also provides a speech fragment splicing method for speech synthesis, comprising the following steps:
Step 1: extract two speech fragments to be spliced from the corpus as a first speech fragment and a second speech fragment, and select optimum sampling points from the first and second speech fragments;
Step 2: apply first-order smoothing to the optimum sampling points to generate a splice point;
Step 3: insert the splice point between the first and second speech fragments to generate a third speech fragment.
Further, step 1 specifically comprises:
101: extracting two speech fragments to be spliced from the corpus as a first speech fragment and a second speech fragment;
102: searching the first and second speech fragments to obtain at least two candidate sampling points;
103: selecting, from the at least two candidate sampling points, the optimum sampling point U1 of the first speech fragment and the optimum sampling point U2 of the second speech fragment.
Further, step 2 specifically comprises:
201: calculating the slope k_a at the optimum sampling point U1, the slope k_b at the optimum sampling point U2, and the difference Δ_U between the value at U1 and the value at U2;
202: performing a prediction based on k_a, k_b, and Δ_U, generating the splice point.
Further, in step 102 the first and second speech fragments are searched bidirectionally (the first speech fragment from back to front, the second speech fragment from front to back), and a candidate sampling point produced by the bidirectional search satisfies:
Condition 1: the absolute value of the difference between the slopes of the first and second speech fragments at the candidate sampling points is smaller than a set threshold, i.e. abs(k_a - k_b) < T_k;
Condition 2: the absolute value of the difference between the values of the first and second speech fragments at the candidate sampling points is smaller than the adjustable parameter ratio multiplied by the absolute value of the first fragment's slope at its candidate point, i.e. abs(S_a - S_b) < ratio*abs(k_a).
Brief description of the drawings
Fig. 1 is a schematic diagram of the module structure of the speech fragment splicing system for speech synthesis of the present invention;
Fig. 2 is a schematic diagram of the bidirectional search directions applied to the speech fragments by the system of the present invention;
Fig. 3 is a flow chart of the steps of the speech fragment splicing method for speech synthesis of the present invention.
In the drawings, the parts represented by the reference numerals are as follows:
1, speech corpus; 2, sampling point selection module; 3, splice point generation module;
4, splicing module; 21, search unit; 22, screening unit;
31, calculation unit; 32, prediction unit.
Detailed description of the embodiments
The principles and features of the present invention are described below with reference to the drawings. The examples given serve only to explain the invention and are not intended to limit its scope.
Fig. 1 is a schematic diagram of the module structure of the speech fragment splicing system for speech synthesis of the present invention. As shown in Fig. 1, the system comprises a speech corpus 1, a sampling point selection module 2, a splice point generation module 3, and a splicing module 4. The corpus 1 stores speech fragments that have been recorded and annotated, and holds at least two fragments. The sampling point selection module extracts two fragments to be spliced from the corpus 1 as a first speech fragment and a second speech fragment, and selects optimum sampling points from the first and second speech fragments. The splice point generation module applies first-order smoothing to the optimum sampling points to generate a splice point. The splicing module inserts the splice point between the first and second speech fragments to generate a third speech fragment.
The sampling point selection module 2 comprises a search unit 21 and a screening unit 22; the splice point generation module 3 comprises a calculation unit 31 and a prediction unit 32.
The search unit 21 searches the first and second speech fragments to obtain at least two candidate sampling points. Of the two fragments to be spliced, the earlier fragment is called the first speech fragment and the later fragment the second speech fragment.
As shown in Fig. 2 the way of search for scanning for using to the first sound bite and the second sound bite is searched to be two-way Rope, the first sound bite is using way of search from back to front, and the second sound bite is using way of search from front to back.Carry out Candidate's sampled point that bidirectional research is drawn needs to meet two conditions:
abs(k_a - k_b) < T_k (Condition 1)
abs(S_a - S_b) < ratio*abs(k_a) (Condition 2)
Condition 1: the absolute value of the difference between the fragments' slopes at the candidate sampling points is smaller than a set threshold T_k, where k_a is the slope of the first speech fragment at its candidate point and k_b the slope of the second speech fragment at its candidate point.
Condition 2: the absolute value of the difference between the fragments' values at the candidate sampling points is smaller than the adjustable parameter ratio multiplied by the absolute value of k_a, where S_a is the value of the first fragment's candidate sample, S_b the value of the second fragment's candidate sample, and ratio controls how large the allowed difference may be.
Speech samples that satisfy both conditions simultaneously become candidate splice sampling points. With the first fragment's candidate fixed, the second fragment is searched by moving backwards; when one round of search finishes, the first fragment's candidate is moved forward and the next round begins. The search terminates when alternative candidate splice points have been found and both fragments have been traversed up to their limits. When the search ends, multiple (at least two) candidate sampling points have been obtained, and their number is even, since candidates are collected in pairs from the first and second fragments.
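The bidirectional search above can be sketched as follows. The patent does not fix how a slope is computed at a sample, so this minimal sketch assumes one-sample slopes (a backward difference in the first fragment, a forward difference in the second) and illustrative default values for T_k and ratio:

```python
def find_candidates(a, b, T_k=0.1, ratio=0.5):
    """Bidirectional search for candidate splice sampling points.

    Fragment `a` is scanned back to front, fragment `b` front to back;
    an (i, j) index pair is kept when both splice conditions hold.
    The slope-as-adjacent-difference and the default thresholds are
    assumptions, not taken from the patent text.
    """
    candidates = []
    for i in range(len(a) - 1, 0, -1):        # first fragment: back to front
        k_a = a[i] - a[i - 1]                 # backward slope at a[i]
        for j in range(len(b) - 1):           # second fragment: front to back
            k_b = b[j + 1] - b[j]             # forward slope at b[j]
            cond1 = abs(k_a - k_b) < T_k                 # slopes agree
            cond2 = abs(a[i] - b[j]) < ratio * abs(k_a)  # values agree
            if cond1 and cond2:
                candidates.append((i, j))
    return candidates
```

Representing a candidate as an index pair keeps the "even number of candidates" property trivially, since each pair contributes one point from each fragment.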
Once the candidates are obtained, the screening unit 22 selects from them the optimum sampling point U1 of the first speech fragment and the optimum sampling point U2 of the second speech fragment.
The optimum sampling points U* (i.e. U1, U2, U3, U4, ...) are screened from the candidate sampling points under a minimal-error-cost criterion and used as the positions for the subsequent smoothing interpolation. The minimal error cost is the weighted sum of the slope-difference cost and the value-difference cost at U*:
U* = argmin(w1*D_ratio + w2*D_val)
where w1 is the weight of the slope-difference cost at U*, w2 the weight of the value-difference cost at U*, D_ratio the slope-difference cost function at U*, and D_val the value-difference cost function at U*. The optimum sampling points U1 and U2 are finally obtained under this minimal-error-cost criterion.
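The screening criterion can be sketched as below. The text does not spell out the exact forms of D_ratio and D_val, so this sketch assumes they are the absolute slope difference and the absolute value difference between the paired candidate points:

```python
def select_optimum(a, b, candidates, w1=1.0, w2=1.0):
    """Screen candidate pairs with the minimal-error-cost criterion
    U* = argmin(w1*D_ratio + w2*D_val).

    D_ratio and D_val are assumed to be absolute slope and value
    differences; the patent leaves their exact form open.
    """
    def cost(pair):
        i, j = pair
        k_a = a[i] - a[i - 1]          # backward slope in the first fragment
        k_b = b[j + 1] - b[j]          # forward slope in the second fragment
        d_ratio = abs(k_a - k_b)       # slope-difference cost
        d_val = abs(a[i] - b[j])       # value-difference cost
        return w1 * d_ratio + w2 * d_val
    return min(candidates, key=cost)
```

The weights w1 and w2 let the screening trade off slope agreement against value agreement; with both at 1.0 the two costs contribute equally.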
The calculation unit 31 calculates the slope k_a at the optimum sampling point U1, the slope k_b at the optimum sampling point U2, and the difference Δ_U between the value at U1 and the value at U2.
The prediction unit 32 performs a prediction based on k_a, k_b, and Δ_U to generate the splice point. The prediction proceeds as follows:
Slope prediction: let the optimal splice point U1 of the first fragment be the sample at time T with amplitude S; then the sample at time T-1 has amplitude S_{T-1} = S - k_a, where k_a is the slope at U1, so the first fragment's predicted amplitude at time T+1 is S_{T+1} = S + k_a. Let the optimal splice point U2 of the second fragment be the sample at time N with amplitude V; then the sample at time N+1 has amplitude V_{N+1} = V + k_b, where k_b is the slope at U2, so the second fragment's predicted amplitude at time N-1 is V_{N-1} = V - k_b.
From the slope prediction, the first and second fragments' predictions for the sample at the junction of their optimal splice points, S + k_a and V - k_b respectively, generally differ.
This difference means the two fragments cannot be spliced together directly; the sample value at the junction must therefore be corrected, yielding a corrected splice value E.
The final spliced sequence is: ..., S - k_a, S, E, V, V + k_b, ...
Because the smoothing applied above to the optimum sampling points of the first and second fragments makes use of slope (first-order) information, this smoothing is a first-order smoothing method.
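The prediction and correction step can be sketched as follows. The text leaves the correction formula for E implicit, so taking E as the mean of the two first-order predictions S + k_a and V - k_b is an assumption, chosen to be consistent with the splice sequence ..., S - k_a, S, E, V, V + k_b, ...:

```python
def first_order_joint(a, b, i, j):
    """Generate the splice point E between a[:i+1] and b[j:].

    The first fragment predicts its next sample as S + k_a and the second
    predicts its previous sample as V - k_b; E averages the two predictions
    (an assumed reconstruction of the correction step).
    """
    k_a = a[i] - a[i - 1]              # slope at optimum point U1
    k_b = b[j + 1] - b[j]              # slope at optimum point U2
    return 0.5 * ((a[i] + k_a) + (b[j] - k_b))

def splice(a, b, i, j):
    """Third fragment: ..., S - k_a, S, E, V, V + k_b, ..."""
    return a[:i + 1] + [first_order_joint(a, b, i, j)] + b[j:]
```

Inserting the single corrected sample E between the two fragments is what keeps the waveform (and hence the spectrum) continuous across the joint.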
Fig. 3 is a flow chart of the steps of the speech fragment splicing method for speech synthesis of the present invention. As shown in Fig. 3, the method comprises the following steps:
Step 1: extract two speech fragments to be spliced from the corpus as a first speech fragment and a second speech fragment, and select optimum sampling points from the first and second speech fragments;
Step 2: apply first-order smoothing to the optimum sampling points to generate a splice point;
Step 3: insert the splice point between the first and second speech fragments to generate a third speech fragment.
Step 1 specifically comprises:
101: extracting two speech fragments to be spliced from the corpus as a first speech fragment and a second speech fragment;
102: searching the first and second speech fragments to obtain at least two candidate sampling points;
103: selecting, from the at least two candidate sampling points, the optimum sampling point U1 of the first speech fragment and the optimum sampling point U2 of the second speech fragment.
In step 102, the search applied to the first and second speech fragments is bidirectional: the first speech fragment is searched from back to front and the second speech fragment from front to back. A candidate sampling point produced by the bidirectional search satisfies:
Condition 1: the absolute value of the difference between the slopes of the first and second speech fragments at the candidate sampling points is smaller than a set threshold, i.e. abs(k_a - k_b) < T_k;
Condition 2: the absolute value of the difference between the values of the first and second speech fragments at the candidate sampling points is smaller than the adjustable parameter ratio multiplied by the absolute value of the first fragment's slope at its candidate point, i.e. abs(S_a - S_b) < ratio*abs(k_a).
Speech samples that satisfy both conditions simultaneously become candidate splice sampling points. With the first fragment's candidate fixed, the second fragment is searched by moving backwards; when one round of search finishes, the first fragment's candidate is moved forward and the next round begins. The search terminates when alternative candidate splice points have been found and both fragments have been traversed up to their limits. When the search ends, multiple (at least two) candidate sampling points have been obtained, and their number is even, since candidates are collected in pairs from the first and second fragments.
In step 103, the optimum sampling points U* (i.e. U1, U2, U3, U4, ...) are screened from the candidate sampling points under the minimal-error-cost criterion and used as the positions for the subsequent smoothing interpolation. The minimal error cost is the weighted sum of the slope-difference cost and the value-difference cost at U*:
U* = argmin(w1*D_ratio + w2*D_val)
where w1 is the weight of the slope-difference cost at U*, w2 the weight of the value-difference cost at U*, D_ratio the slope-difference cost function at U*, and D_val the value-difference cost function at U*. The optimum sampling points U1 and U2 are finally obtained under this minimal-error-cost criterion.
Step 2 specifically comprises:
201: calculating the slope k_a at the optimum sampling point U1, the slope k_b at the optimum sampling point U2, and the difference Δ_U between the value at U1 and the value at U2;
202: performing a prediction based on k_a, k_b, and Δ_U, generating the splice point.
In step 202, the prediction proceeds as follows:
Slope prediction: let the optimal splice point U1 of the first fragment be the sample at time T with amplitude S; then the sample at time T-1 has amplitude S_{T-1} = S - k_a, where k_a is the slope at U1, so the first fragment's predicted amplitude at time T+1 is S_{T+1} = S + k_a. Let the optimal splice point U2 of the second fragment be the sample at time N with amplitude V; then the sample at time N+1 has amplitude V_{N+1} = V + k_b, where k_b is the slope at U2, so the second fragment's predicted amplitude at time N-1 is V_{N-1} = V - k_b.
From the slope prediction, the first and second fragments' predictions for the sample at the junction of their optimal splice points, S + k_a and V - k_b respectively, generally differ.
This difference means the two fragments cannot be spliced together directly; the sample value at the junction must therefore be corrected, yielding a corrected splice value E.
The final spliced sequence is: ..., S - k_a, S, E, V, V + k_b, ...
The present invention solves the spectral discontinuities produced in the prior art by periodic-search-and-shift cumulative smoothing; the first-order smoothing method gives the spectrum at the splice good continuity and improves the listener's auditory perception. Moreover, when searching for candidate splice sampling points, first-order smooth alignment does not need to compute the autocorrelation of the speech signal, so an accurate splice position is found more simply, greatly reducing computation and increasing running speed.
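Putting the three stages together (bidirectional search, minimal-error-cost screening, first-order smoothing), the full flow described above can be sketched end to end. The one-sample slopes, the absolute-difference cost functions, the default thresholds, and the averaging form of the corrected value E are all assumptions, not taken verbatim from the text:

```python
def splice_fragments(a, b, T_k=0.1, ratio=0.5, w1=1.0, w2=1.0):
    """End-to-end sketch: bidirectional search, minimal-error-cost
    screening, first-order smoothing, insertion of the splice point E."""
    # Stage 1: bidirectional candidate search under conditions 1 and 2.
    cands = []
    for i in range(len(a) - 1, 0, -1):
        k_a = a[i] - a[i - 1]
        for j in range(len(b) - 1):
            k_b = b[j + 1] - b[j]
            if abs(k_a - k_b) < T_k and abs(a[i] - b[j]) < ratio * abs(k_a):
                cands.append((i, j, k_a, k_b))
    if not cands:
        return a + b               # no smooth joint found: plain concatenation
    # Stage 2: minimal-error-cost screening (absolute-difference costs assumed).
    i, j, k_a, k_b = min(
        cands, key=lambda c: w1 * abs(c[2] - c[3]) + w2 * abs(a[c[0]] - b[c[1]])
    )
    # Stage 3: first-order smoothing; E averages the two predictions.
    e = 0.5 * ((a[i] + k_a) + (b[j] - k_b))
    return a[:i + 1] + [e] + b[j:]
```

The fallback to plain concatenation when no candidate is found is also an assumption; the patent only states that the search runs until both fragments reach their limits.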
The foregoing describes only preferred embodiments of the present invention and is not intended to limit it; any modification, equivalent replacement, or improvement made within the spirit and principles of the present invention shall be included within its scope of protection.

Claims (6)

1. A speech fragment splicing system for speech synthesis, characterized by comprising a speech corpus, a sampling point selection module, a splice point generation module, and a splicing module;
the speech corpus is used to store speech fragments that have been recorded and annotated;
the sampling point selection module comprises a search unit and a screening unit, wherein the search unit is used to search the first speech fragment and the second speech fragment to obtain at least two candidate sampling points;
the screening unit is used to select, from the at least two candidate sampling points, the optimum sampling point U1 of the first speech fragment and the optimum sampling point U2 of the second speech fragment;
the splice point generation module comprises a calculation unit and a prediction unit, wherein the calculation unit is used to calculate the slope k_a at the optimum sampling point U1, the slope k_b at the optimum sampling point U2, and the difference Δ_U between the value at U1 and the value at U2;
the prediction unit is used to perform a prediction based on k_a, k_b, and Δ_U, generating a splice point;
the splicing module is used to insert the splice point between the first and second speech fragments, generating a third speech fragment.
2. The speech fragment splicing system for speech synthesis according to claim 1, characterized in that the search unit searches the first and second speech fragments bidirectionally, the first speech fragment from back to front and the second speech fragment from front to back.
3. The speech fragment splicing system for speech synthesis according to claim 2, characterized in that a candidate sampling point produced by the bidirectional search satisfies:
Condition 1: the absolute value of the difference between the slopes of the first and second speech fragments at the candidate sampling points is smaller than a set threshold T_k, i.e. abs(k_a - k_b) < T_k;
Condition 2: the absolute value of the difference between the values of the first and second speech fragments at the candidate sampling points is smaller than the adjustable parameter ratio multiplied by the absolute value of the first fragment's slope at its candidate point, i.e. abs(S_a - S_b) < ratio*abs(k_a).
4. The speech fragment splicing system for speech synthesis according to claim 1, characterized in that the optimum sampling points are screened under a minimal-error-cost criterion, the minimal error cost being the weighted sum of the slope-difference cost and the value-difference cost at U*: U* = argmin(w1*D_ratio + w2*D_val), where w1 is the weight of the slope-difference cost at U*, w2 the weight of the value-difference cost at U*, D_ratio the slope-difference function at U*, and D_val the value-difference function at U*.
5. A speech fragment splicing method for speech synthesis, characterized by comprising the following steps:
Step 1: extract two speech fragments to be spliced from a speech corpus as a first speech fragment and a second speech fragment, search the first and second speech fragments to obtain at least two candidate sampling points, and select from the at least two candidate sampling points the optimum sampling point U1 of the first speech fragment and the optimum sampling point U2 of the second speech fragment;
Step 2: calculate the slope k_a at the optimum sampling point U1, the slope k_b at the optimum sampling point U2, and the difference Δ_U between the value at U1 and the value at U2, then perform a prediction based on k_a, k_b, and Δ_U, generating a splice point;
Step 3: insert the splice point between the first and second speech fragments, generating a third speech fragment.
6. The speech fragment splicing method for speech synthesis according to claim 5, characterized in that in step 1 the first and second speech fragments are searched bidirectionally, the first speech fragment from back to front and the second speech fragment from front to back, and a candidate sampling point produced by the bidirectional search satisfies:
Condition 1: the absolute value of the difference between the slopes of the first and second speech fragments at the candidate sampling points is smaller than a set threshold, i.e. abs(k_a - k_b) < T_k;
Condition 2: the absolute value of the difference between the values of the first and second speech fragments at the candidate sampling points is smaller than the adjustable parameter ratio multiplied by the absolute value of the first fragment's slope at its candidate point, i.e. abs(S_a - S_b) < ratio*abs(k_a).
CN201410734257.XA 2014-12-04 2014-12-04 Speech fragment splicing system and method for speech synthesis Active CN104517605B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201410734257.XA CN104517605B (en) 2014-12-04 2014-12-04 Speech fragment splicing system and method for speech synthesis

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201410734257.XA CN104517605B (en) 2014-12-04 2014-12-04 Speech fragment splicing system and method for speech synthesis

Publications (2)

Publication Number Publication Date
CN104517605A CN104517605A (en) 2015-04-15
CN104517605B true CN104517605B (en) 2017-11-28

Family

ID=52792811

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201410734257.XA Active CN104517605B (en) 2014-12-04 2014-12-04 Speech fragment splicing system and method for speech synthesis

Country Status (1)

Country Link
CN (1) CN104517605B (en)

Families Citing this family (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105679306B (en) * 2016-02-19 2019-07-09 云知声(上海)智能科技有限公司 The method and system of fundamental frequency frame are predicted in speech synthesis
CN108831424B (en) * 2018-06-15 2021-01-08 广州酷狗计算机科技有限公司 Audio splicing method and device and storage medium
CN109389969B (en) * 2018-10-29 2020-05-26 百度在线网络技术(北京)有限公司 Corpus optimization method and apparatus
CN109979440B (en) * 2019-03-13 2021-05-11 广州市网星信息技术有限公司 Keyword sample determination method, voice recognition method, device, equipment and medium
CN112562635B (en) * 2020-12-03 2024-04-09 云知声智能科技股份有限公司 Method, device and system for solving generation of pulse signals at splicing position in speech synthesis
CN112863530A (en) * 2021-01-07 2021-05-28 广州欢城文化传媒有限公司 Method and device for generating sound works
CN112971778A (en) * 2021-02-09 2021-06-18 北京师范大学 Brain function imaging signal obtaining method and device and electronic equipment
CN113421547B (en) * 2021-06-03 2023-03-17 华为技术有限公司 Voice processing method and related equipment

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1333501A (en) * 2001-07-20 2002-01-30 北京捷通华声语音技术有限公司 Dynamic Chinese speech synthesizing method
CN1540624A (en) * 2003-04-25 2004-10-27 阿尔卡特公司 Method of generating speech according to text
CN1731510A (en) * 2004-08-05 2006-02-08 摩托罗拉公司 Text-speech conversion for amalgamated language
JP2008191334A (en) * 2007-02-02 2008-08-21 Oki Electric Ind Co Ltd Speech synthesis method, speech synthesis program, speech synthesis device and speech synthesis system
JP2008299266A (en) * 2007-06-04 2008-12-11 Mitsubishi Electric Corp Speech synthesis device and method

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
"Chinese Speech Synthesis Based on Prosody Matching Cost and Prosody Concatenation Cost"; Zhang Peng et al.; Journal of Harbin Institute of Technology; 2006-11-30; Vol. 38, No. 11; Section 1, Paragraph 1 and Section 4, Paragraph 2 *

Similar Documents

Publication Publication Date Title
CN104517605B (en) A kind of sound bite splicing system and method for phonetic synthesis
CN102779508B (en) Sound bank generation apparatus and method therefor, speech synthesis system and method thereof
CN104780388B (en) The cutting method and device of a kind of video data
US20180349495A1 (en) Audio data processing method and apparatus, and computer storage medium
US8890869B2 (en) Colorization of audio segments
CN101178896B (en) Unit selection voice synthetic method based on acoustics statistical model
CN110213670A (en) Method for processing video frequency, device, electronic equipment and storage medium
CN109147758A (en) A kind of speaker's sound converting method and device
CN106157951B (en) Carry out the automatic method for splitting and system of audio punctuate
JP4220449B2 (en) Indexing device, indexing method, and indexing program
CN106021496A (en) Video search method and video search device
CN103700370A (en) Broadcast television voice recognition method and system
CN102723078A (en) Emotion speech recognition method based on natural language comprehension
CN101930747A (en) Method and device for converting voice into mouth shape image
CN110096966A (en) A kind of audio recognition method merging the multi-modal corpus of depth information Chinese
CN109979428B (en) Audio generation method and device, storage medium and electronic equipment
CN106302987A (en) A kind of audio frequency recommends method and apparatus
CN106297765B (en) Speech synthesis method and system
CN108172211B (en) Adjustable waveform splicing system and method
CN103915093A (en) Method and device for realizing voice singing
CN101867742A (en) Television system based on sound control
CN110277087A (en) A kind of broadcast signal anticipation preprocess method
US9666211B2 (en) Information processing apparatus, information processing method, display control apparatus, and display control method
CN107507627B (en) Voice data heat analysis method and system
Felipe et al. Acoustic scene classification using spectrograms

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
CP01 Change in the name or title of a patent holder

Address after: 100191, Beijing, Huayuan Road, Haidian District No. 2 peony technology building, block A, 5

Patentee after: Yunzhisheng Intelligent Technology Co., Ltd.

Address before: 100191, Beijing, Huayuan Road, Haidian District No. 2 peony technology building, block A, 5

Patentee before: Beijing Yunzhisheng Information Technology Co., Ltd.
