CN103474075B - Voice signal sending method and system, receiving method and system - Google Patents
Voice signal sending method and system, receiving method and system
- Publication number
- CN103474075B CN103474075B CN201310362024.7A CN201310362024A CN103474075B CN 103474075 B CN103474075 B CN 103474075B CN 201310362024 A CN201310362024 A CN 201310362024A CN 103474075 B CN103474075 B CN 103474075B
- Authority
- CN
- China
- Prior art keywords
- unit
- model
- synthesis unit
- synthesis
- sequence
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Landscapes
- Telephonic Communication Services (AREA)
- Machine Translation (AREA)
Abstract
The invention discloses a voice signal sending method and system. The sending method comprises: determining the text content corresponding to a continuous voice signal to be sent; determining the speech synthesis parameter model of each synthesis unit according to the text content; splicing the speech synthesis parameter models of the synthesis units to obtain a speech synthesis parameter model sequence; determining the serial number string corresponding to the speech synthesis parameter model sequence; and sending the serial number string to a receiving end, so that the receiving end recovers the continuous voice signal according to the serial number string. The invention also discloses a voice signal receiving method and system. With the invention, signal transmission at an extremely low bit rate can be achieved while keeping the loss in recovered sound quality to a minimum.
Description
Technical field
The present invention relates to the field of signal transmission technology, and in particular to a voice signal sending method and system, and a voice signal receiving method and system.
Background technology
With the spread of the Internet and the popularity of portable devices, chat software based on handheld devices has emerged in large numbers. The naturalness of voice interaction is unmatched by other interaction means, particularly on small-screen handheld devices that are ill-suited to handwriting or keypad input. Many such products therefore support voice interaction, i.e. transmitting a voice signal received at one terminal to a destination terminal; an example is the voice message transmission function of Tencent's WeChat product. However, the data volume of a directly transmitted voice signal is often very large, and over channels billed by traffic, such as the Internet or mobile communication networks, it imposes a considerable financial burden on the user. Compressing the transmitted data volume as far as possible without affecting voice quality is therefore a precondition for improving the practical value of voice signal transmission.
To address the problem of voice signal transmission, researchers have tried a variety of speech coding methods that digitally quantize and compress the voice signal, reducing the coding bit rate and improving transmission efficiency while maintaining the quality of the recovered speech. The most common speech signal compression methods include waveform coding and parameter coding. Among them:

Waveform coding samples, quantizes, and encodes the time-domain analogue signal waveform to form a digital signal. This scheme is highly adaptable and yields high speech quality. However, because it must preserve the waveform shape of the original voice signal for recovery, it demands a high bit rate, above 16 kb/s, to obtain good sound quality.
Parameter coding extracts parameters characterizing the pronunciation features from the original voice signal and encodes those feature parameters. The aim of this scheme is to keep the semantics of the original speech and guarantee intelligibility. Its advantage is a relatively low bit rate, but the recovered sound quality suffers more.
In the traditional voice communication era, billing was usually by time, and coding methods were chiefly concerned with algorithmic delay and call quality. In the mobile Internet era, voice, as one kind of data signal, is generally billed by traffic, so the bit rate of coded speech directly affects the user's cost. In addition, traditional telephone-channel speech uses only an 8 kHz sampling rate; it is narrowband speech whose sound quality is impaired and has a ceiling. Clearly, continuing to process wideband or ultra-wideband speech with traditional coding schemes would require a higher bit rate, multiplying traffic consumption.
Summary of the invention
On one hand, embodiments of the present invention provide a voice signal sending method and system that achieve signal transmission at an extremely low bit rate while keeping the loss in recovered sound quality to a minimum.

On the other hand, embodiments of the present invention provide a voice signal receiving method and system that reduce the loss in recovered sound quality.

To this end, the present invention provides the following technical scheme:
A voice signal sending method, comprising:

determining the text content corresponding to a continuous voice signal to be sent;

determining the speech synthesis parameter model of each synthesis unit according to the text content;

splicing the speech synthesis parameter models of the synthesis units to obtain a speech synthesis parameter model sequence;

determining the serial number string corresponding to the speech synthesis parameter model sequence;

sending the serial number string to a receiving end, so that the receiving end recovers the continuous voice signal according to the serial number string.
A voice signal sending system, comprising:

a text acquisition module, for determining the text content corresponding to a continuous voice signal to be sent;

a parameter model determination module, for determining the speech synthesis parameter model of each synthesis unit according to the text content;

a splicing module, for splicing the speech synthesis parameter models of the synthesis units to obtain a speech synthesis parameter model sequence;

a serial number string determination module, for determining the serial number string corresponding to the speech synthesis parameter model sequence;

a sending module, for sending the serial number string to a receiving end, so that the receiving end recovers the continuous voice signal according to the serial number string.
The voice signal sending method and system provided by the embodiments of the present invention use statistical parametric model coding, whose processing is independent of the speech sampling rate. They greatly reduce the transmission bit rate and traffic consumption while keeping the loss in recovered sound quality to a minimum, solving the problem that traditional speech coding methods cannot balance sound quality against traffic, and improving the user communication experience in the mobile network era.

Correspondingly, with the voice signal receiving method and system provided by the embodiments of the present invention, the receiver obtains the speech synthesis parameter model sequence from a codebook according to the received serial number string corresponding to that sequence, and uses the sequence to obtain the voice signal by speech synthesis, greatly reducing the loss in recovered sound quality and achieving enormous compression of the voice signal with minimal signal loss.
Brief description of the drawings
To explain the embodiments of the present application or the technical schemes of the prior art more clearly, the drawings needed in the description of the embodiments are briefly introduced below. Obviously, the drawings described below represent only some embodiments of the present invention; those of ordinary skill in the art may derive other drawings from them.
Fig. 1 is a flowchart of the voice signal sending method of an embodiment of the present invention;

Fig. 2 is a flowchart of one way of determining the speech synthesis parameter model of each synthesis unit in an embodiment of the present invention;

Fig. 3 is a flowchart of building a binary decision tree in an embodiment of the present invention;

Fig. 4 is a schematic diagram of a binary decision tree in an embodiment of the present invention;

Fig. 5 is a flowchart of another way of determining the speech synthesis parameter model of each synthesis unit in an embodiment of the present invention;

Fig. 6 is a flowchart of the voice signal receiving method of an embodiment of the present invention;

Fig. 7 is a structural block diagram of the voice signal sending system in an embodiment of the present invention;

Fig. 8 is a structural block diagram of the parameter model determination module in an embodiment of the present invention;

Fig. 9 is a structural block diagram of the binary decision tree building module in an embodiment of the present invention;

Fig. 10 is a structural block diagram of one form of the pitch model determination unit in the voice signal sending system of an embodiment of the present invention;

Fig. 11 is a structural block diagram of one form of the spectrum model determination unit in the voice signal sending system of an embodiment of the present invention;

Fig. 12 is a structural block diagram of another form of the pitch model determination unit in the voice signal sending system of an embodiment of the present invention;

Fig. 13 is a structural block diagram of another form of the spectrum model determination unit in the voice signal sending system of an embodiment of the present invention;

Fig. 14 is a structural block diagram of the voice signal receiving system of an embodiment of the present invention.
Detailed description of the invention
To help those skilled in the art better understand the schemes of the embodiments of the present invention, the embodiments are described in further detail below with reference to the drawings.

Addressing the problem that traditional coding schemes need a higher bit rate and heavy traffic consumption to process wideband or ultra-wideband speech, the embodiments of the present invention provide a voice signal sending method and system, and a voice signal receiving method and system, applicable to the coding of various types of speech (such as ultra-wideband speech at a 16 kHz sampling rate and narrowband speech at an 8 kHz sampling rate), achieving signal transmission at an extremely low bit rate while keeping the loss in recovered sound quality to a minimum.
As shown in Fig. 1, the flowchart of the voice signal sending method of an embodiment of the present invention comprises the following steps:

Step 101: determine the text content corresponding to the continuous voice signal to be sent.

Specifically, the text content can be obtained automatically by a speech recognition algorithm, or of course by manual annotation. In addition, to further guarantee the correctness of text content obtained by speech recognition, it can be corrected by manual editing.

Step 102: determine the speech synthesis parameter model of each synthesis unit according to the text content.

A synthesis unit is the smallest synthesis object set in advance, for example a syllable unit, a phoneme unit, or even a state unit in a phoneme HMM.

To minimize the loss in recovered sound quality at the receiving end and enable it to recover the continuous voice signal by speech synthesis, the speech synthesis parameter models the sending end obtains from the original voice signal should match the features of the original signal as closely as possible, to reduce the loss from signal compression and recovery.

Specifically, according to the text content, the continuous voice signal can be segmented into the speech segment corresponding to each synthesis unit, from which the duration, pitch model, and spectrum model corresponding to each synthesis unit are obtained; the detailed process is described later.

Step 103: splice the speech synthesis parameter models of the synthesis units to obtain a speech synthesis parameter model sequence.

Step 104: determine the serial number string corresponding to the speech synthesis parameter model sequence.

Step 105: send the serial number string to the receiving end, so that the receiving end recovers the continuous voice signal according to the serial number string.
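Steps 101 to 105 can be sketched end to end as a small pipeline. This is a hedged illustration only: the recognizer, the per-unit model selection, and the codebook layout are placeholder assumptions, not the patent's actual implementation.

```python
def send_speech(signal, recognize, pick_model, codebook_index):
    """Sending-side sketch: signal -> serial number string."""
    text = recognize(signal)                       # step 101: text content
    models = [pick_model(unit) for unit in text]   # step 102: one model per synthesis unit
    model_sequence = models                        # step 103: spliced model sequence
    serials = [codebook_index[m] for m in model_sequence]  # step 104: serial numbers
    return serials                                 # step 105: what gets transmitted

# Toy usage: units are characters, "models" are just labels.
codebook_index = {"model-a": 0, "model-b": 1}
serials = send_speech("ab", lambda s: s, lambda u: f"model-{u}", codebook_index)
```

Only the short serial number string crosses the network; everything else stays on the sending end.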
The voice signal sending method of the embodiment of the present invention uses statistical parametric model coding, whose processing is independent of the speech sampling rate: coding 16 kHz ultra-wideband speech incurs no extra bit rate cost, its sound quality is good, and its coding traffic is low. Take a typical Chinese speech fragment as an example: its effective speech section lasts 10 s and contains 80 initials/finals (phonemes), and each phoneme has 5 pitch states, 5 spectrum states, and 1 duration state, with each state coded in 1 byte (8 bits). Its bit rate m is: m = [80 × (5 + 5 + 1)] × 8 bits / 10 s = 704 b/s, less than 1 kb/s. This is an extremely-low-bit-rate coding method whose bit rate is far below every coding standard in today's mainstream speech communication field, so network communication traffic will drop markedly. Compared with mainstream communication-field speech coding methods, the speech coding mode of the inventive method can process ultra-wideband speech (16 kHz sampling rate) with higher sound quality, and has a lower bit rate (below 1 kb/s), effectively reducing network traffic.
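The bit-rate arithmetic above can be reproduced directly; the figures (80 phonemes, 5 + 5 + 1 states per phoneme, 1 byte per state, 10 s of speech) are those given in the text.

```python
# Bit rate of the example Chinese speech fragment.
phonemes = 80
states_per_phoneme = 5 + 5 + 1   # pitch states + spectrum states + duration state
bits_per_state = 8               # 1 byte per state
duration_s = 10                  # effective speech duration

bitrate = phonemes * states_per_phoneme * bits_per_state / duration_s
print(bitrate)  # 704.0 b/s, well under 1 kb/s
```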
As shown in Fig. 2, one flow for determining the speech synthesis parameter model of each synthesis unit in an embodiment of the present invention comprises the following steps:

Step 201: segment the continuous voice signal according to the text content to obtain the speech segment corresponding to each synthesis unit.

Specifically, the continuous voice signal can be force-aligned with the acoustic model sequence corresponding to the synthesis units in the text content, i.e. the voice signal is decoded by speech recognition against that acoustic model sequence, thereby obtaining the speech segment corresponding to each synthesis unit.

It should be noted that synthesis units of different sizes can be chosen according to different application requirements. In general, if the bit rate requirement is stricter, a larger speech unit is chosen, such as a syllable unit or a phoneme unit; conversely, if the sound quality requirement is stricter, a smaller speech unit can be chosen, such as a model state unit or a feature stream unit.

When HMM-based (Hidden Markov Model) acoustic models are used, each state of the HMM can further be chosen as a synthesis unit, and the corresponding state-level speech segments obtained. Then, for each state, the pitch model and spectrum model corresponding to that state are determined from its pitch binary decision tree and spectrum binary decision tree respectively. This lets the obtained speech synthesis parameter models describe the features of the voice signal more finely.
Step 202: obtain the synthesis unit currently under examination.

Step 203: count the duration of the speech segment corresponding to the synthesis unit currently under examination.

Step 204: determine the pitch model of the synthesis unit currently under examination.

Specifically, first obtain the pitch binary decision tree corresponding to the synthesis unit currently under examination; parse the text of the synthesis unit to obtain its context information, such as the phoneme unit, tone, part of speech, and position in the phrase; then make path decisions in the pitch binary decision tree according to the context information to reach the corresponding leaf node, and take the pitch model of that leaf node as the pitch model of the synthesis unit.

Specifically, the path decision process is as follows: according to the context information of the synthesis unit, answer each node's split question in turn starting from the root node of the pitch binary decision tree; obtain a top-down matching path according to the answers; and obtain the leaf node according to that matching path.
Step 205: determine the spectrum model of the synthesis unit currently under examination.

Specifically, first obtain the spectrum binary decision tree corresponding to the synthesis unit currently under examination; parse the text of the synthesis unit to obtain its context information, such as the phoneme unit, tone, part of speech, and position in the phrase; then make path decisions in the spectrum binary decision tree according to the context information to reach the corresponding leaf node, and take the spectrum model of that leaf node as the spectrum model of the synthesis unit.

Specifically, the path decision process is as follows: according to the context information of the synthesis unit, answer each node's split question in turn starting from the root node of the spectrum binary decision tree; obtain a top-down matching path according to the answers; and obtain the leaf node according to that matching path.
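The top-down path decision described for both trees can be sketched as follows. The node layout and the example question are illustrative assumptions; only the traversal logic, answering each internal node's question from the unit's context until a leaf and its attached model are reached, comes from the text.

```python
class Node:
    """Internal nodes carry a yes/no question; leaves carry a model."""
    def __init__(self, question=None, yes=None, no=None, model=None):
        self.question, self.yes, self.no, self.model = question, yes, no, model

def find_model(root, context):
    node = root
    while node.model is None:                     # internal node: ask its question
        node = node.yes if node.question(context) else node.no
    return node.model                             # leaf: its model is the answer

# Toy tree with one split question, "is the right adjacent phoneme a nasal?"
tree = Node(question=lambda c: c["right_is_nasal"],
            yes=Node(model="pitch-model-A"),
            no=Node(model="pitch-model-B"))
chosen = find_model(tree, {"right_is_nasal": True})
```

The same traversal serves both the pitch tree and the spectrum tree; only the questions and leaf models differ.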
Step 206: judge whether the synthesis unit currently under examination is the last synthesis unit. If so, perform step 207; otherwise, perform step 202.

Step 207: output the speech segment duration, pitch model, and spectrum model corresponding to each synthesis unit.

The quality of the speech synthesis parameter models corresponding to the synthesis units is directly related to the construction of the binary decision trees (including the pitch binary decision tree and the spectrum binary decision tree). In an embodiment of the present invention, the binary decision trees are built by a top-down clustering method.
As shown in Fig. 3, the flow for building a binary decision tree in an embodiment of the present invention comprises the following steps:

Step 301: obtain training data.

Specifically, a large amount of voice training data can be gathered and annotated with text; then, according to the annotated text content, the speech is segmented into basic speech units or even synthesis units (such as the state units of the basic speech unit models), yielding the speech segment set corresponding to each synthesis unit; the speech segments in the set corresponding to a synthesis unit serve as the training data of that synthesis unit.

Step 302: extract the synthesis parameters of the speech segment set corresponding to the synthesis unit from the training data.

The synthesis parameters include pitch features, spectrum features, and so on.

Step 303: initialize the binary decision tree corresponding to the synthesis unit according to the extracted synthesis parameters, and set the root node as the node currently under examination.

Initializing the binary decision tree means building a binary decision tree containing only the root node.

Step 304: judge whether the node currently under examination needs to be split. If so, perform step 305; otherwise perform step 306.

A remaining question chosen from a preset question set is used to attempt a split of the data of the node currently under examination, obtaining child nodes. A remaining question is one that has not yet been asked.

Specifically, the sample concentration of the node currently under examination can first be calculated, i.e. a measure of the degree of scatter of the samples in its speech segment set. In general, the greater the scatter, the more likely the node is to be split, and conversely the less likely. Sample variance can be used to measure a node's sample concentration, i.e. the average of the distances (or squared distances) of all the node's samples from the class centre. Then the sample concentration of the child nodes after splitting is calculated, and the question giving the largest drop in sample concentration is chosen as the preferred question.

A split is then attempted according to the preferred question, obtaining child nodes. If the drop in concentration from splitting on the preferred question is below a set threshold, or the training data in a child node after splitting is below a minimum set threshold, it is decided that the node currently under examination is not split further.

Step 305: split the node currently under examination, obtaining the child nodes after splitting and the training data corresponding to those child nodes. Then perform step 307.

Specifically, the node currently under examination can be split according to the preferred question.

Step 306: mark the node currently under examination as a leaf node.

Step 307: judge whether any unexamined non-leaf node remains in the binary decision tree. If so, perform step 308; otherwise perform step 309.

Step 308: obtain the next unexamined non-leaf node as the node currently under examination. Then return to step 304.

Step 309: output the binary decision tree.

It should be noted that, in the embodiments of the present invention, both the pitch binary decision tree and the spectrum binary decision tree can be built according to the flow shown in Fig. 3.
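The clustering in steps 301 to 309 can be sketched as a recursive splitter. This is a minimal sketch under stated assumptions: samples are one-dimensional values paired with a context dictionary, scatter is measured by variance as the text suggests, and real systems would instead cluster multi-dimensional pitch or spectrum feature vectors.

```python
def variance(xs):
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)

def build_tree(samples, questions, min_gain=1e-6, min_size=1):
    """samples: list of (value, context); questions: context -> bool."""
    if len(samples) <= min_size or not questions:
        return {"leaf": sum(x for x, _ in samples) / len(samples)}
    best = None
    for i, q in enumerate(questions):
        yes = [s for s in samples if q(s[1])]
        no = [s for s in samples if not q(s[1])]
        if not yes or not no:
            continue
        # Weighted child scatter; smaller means a better split.
        child = (len(yes) * variance([x for x, _ in yes]) +
                 len(no) * variance([x for x, _ in no])) / len(samples)
        gain = variance([x for x, _ in samples]) - child
        if best is None or gain > best[0]:
            best = (gain, i, yes, no)       # preferred question so far
    if best is None or best[0] < min_gain:  # stop: node becomes a leaf
        return {"leaf": sum(x for x, _ in samples) / len(samples)}
    gain, i, yes, no = best
    rest = questions[:i] + questions[i + 1:]  # a question is asked only once
    return {"q": i, "yes": build_tree(yes, rest, min_gain, min_size),
            "no": build_tree(no, rest, min_gain, min_size)}

# Toy data: two pitch clusters separated by one context question.
data = [(100.0, {"nasal": True}), (102.0, {"nasal": True}),
        (200.0, {"nasal": False}), (198.0, {"nasal": False})]
tree = build_tree(data, [lambda c: c["nasal"]])
```

Each leaf here stores only a mean; in the patent's scheme a leaf would hold a trained statistical model such as a Gaussian.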
As shown in Fig. 4, a schematic diagram of a binary decision tree in an embodiment of the present invention.

Fig. 4 illustrates the construction of the binary decision tree for the third state of the phoneme "*-aa+". As shown in Fig. 4, when the root node splits, the training data corresponding to the root node can be split by the answer to the preset question "is the right adjacent phoneme a nasal"; subsequently, when the nodes of the next layer split, for instance when the left node splits, the training data corresponding to that node can be split further by the answer to the preset question "is the left adjacent phoneme a voiced consonant". Finally, when a node cannot split further it is set as a leaf node, and its corresponding training data is used to train a mathematical statistical model, such as a Gaussian model; this statistical model serves as the synthesis parameter model corresponding to that leaf node.
Obviously, in the embodiment shown in Fig. 2, the selection of speech synthesis parameter models relies on binary decision trees based on text analysis, for example on the phoneme classes in the context of the synthesis unit currently under examination, or the pronunciation type of the current phoneme. Selecting speech synthesis parameter models this way is convenient and fast, but for a specific speech input, this universal model-selection method cannot embody the pronunciation characteristics well.

To this end, Fig. 5 shows another flow for determining the speech synthesis parameter model of each synthesis unit in an embodiment of the present invention, comprising the following steps:
Step 501: segment the continuous voice signal according to the text content to obtain the speech segment corresponding to each synthesis unit.

Specifically, the continuous voice signal can be force-aligned with the acoustic model sequence corresponding to the preset synthesis units, i.e. the voice signal is decoded by speech recognition against that acoustic model sequence, thereby obtaining the speech segment corresponding to each synthesis unit.

It should be noted that synthesis units of different sizes can be chosen according to different application requirements. In general, if the bit rate requirement is stricter, a larger speech unit is chosen, such as a syllable unit or a phoneme unit; conversely, if the sound quality requirement is stricter, a smaller speech unit can be chosen, such as a model state unit or a feature stream unit.

When HMM-based (Hidden Markov Model) acoustic models are used, each state of the HMM can further be chosen as a synthesis unit, and the corresponding state-level speech segments obtained. Then, for each state, the pitch model and spectrum model corresponding to that state are determined from its pitch binary decision tree and spectrum binary decision tree respectively. This lets the obtained speech synthesis parameter models describe the features of the voice signal more finely.
Step 502: determine the duration of the speech segment corresponding to each synthesis unit, and the pitch feature sequence and spectrum feature sequence corresponding to the continuous voice signal.

Step 503: determine the pitch model of each synthesis unit according to the pitch feature sequence and the pitch model set corresponding to the synthesis unit.

Specifically, determine the pitch feature sequence corresponding to the synthesis unit, and obtain the pitch model set corresponding to the synthesis unit, i.e. the pitch models corresponding to all leaf nodes of the synthesis unit's pitch binary decision tree. Then calculate the likelihood of the pitch feature sequence under each pitch model in the set, and select the pitch model with the maximum likelihood as the pitch model of the synthesis unit.

Step 504: determine the spectrum model of each synthesis unit according to the spectrum feature sequence and the spectrum model set corresponding to the synthesis unit.

Specifically, determine the spectrum feature sequence corresponding to the synthesis unit, and obtain the spectrum model set corresponding to the synthesis unit, i.e. the spectrum models corresponding to all leaf nodes of the synthesis unit's spectrum binary decision tree. Then calculate the likelihood of the spectrum feature sequence under each spectrum model in the set, and select the spectrum model with the maximum likelihood as the spectrum model of the synthesis unit.
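The maximum-likelihood selection in steps 503 and 504 can be sketched with one-dimensional Gaussian leaf models standing in for the patent's pitch and spectrum models; the model parameters below are illustrative assumptions.

```python
import math

def log_likelihood(features, mean, var):
    # Sum of log N(x; mean, var) over the observed feature sequence.
    return sum(-0.5 * (math.log(2 * math.pi * var) + (x - mean) ** 2 / var)
               for x in features)

def best_model(features, models):
    """models: {name: (mean, var)}; return the argmax-likelihood model name."""
    return max(models, key=lambda name: log_likelihood(features, *models[name]))

# Two candidate leaf models for a unit; the observed pitch values fit "low".
models = {"low": (100.0, 25.0), "high": (200.0, 25.0)}
choice = best_model([98.0, 103.0, 101.0], models)
```

Because the winner is chosen by scoring the actual observed features, this variant can track the speaker's pronunciation better than the purely text-driven selection of Fig. 2.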
It can be seen that the voice signal sending method of the embodiment of the present invention significantly reduces the transmission bit rate and traffic consumption while keeping the loss in recovered sound quality to a minimum, solving the problem that traditional speech coding methods cannot balance sound quality against traffic, and improving the user communication experience in the mobile network era.
Correspondingly, an embodiment of the present invention also provides a voice signal receiving method, whose flowchart is shown in Fig. 6, comprising the following steps:

Step 601: receive the serial number string corresponding to a speech synthesis parameter model sequence.

Step 602: obtain the speech synthesis parameter model sequence from a codebook according to the serial number string.

Each speech synthesis parameter model has a unique serial number, and the sender and receiver both hold the same codebook, which contains all the speech synthesis parameter models. Therefore the receiver can obtain from the codebook the speech synthesis parameter model corresponding to each received serial number, and splice these models to obtain the speech synthesis parameter model sequence.
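The codebook lookup in steps 601 and 602 reduces to indexing a shared table; the codebook contents below are placeholders for the real synthesis parameter models.

```python
# Identical codebook held by sender and receiver.
codebook = {0: "pitch-model-A", 1: "pitch-model-B", 2: "spectrum-model-A"}

def decode_serial_string(serials, codebook):
    # Each serial number names exactly one synthesis parameter model;
    # splicing the lookups rebuilds the model sequence.
    return [codebook[n] for n in serials]

model_sequence = decode_serial_string([2, 0, 1], codebook)
```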
Step 603: determine the speech synthesis parameter sequence according to the speech synthesis parameter model sequence.

Specifically, the speech synthesis parameters can be determined from the speech synthesis parameter model sequence and the duration sequence corresponding to the synthesis units, generating the speech synthesis parameter sequence.

For example, the speech synthesis parameter sequence is obtained from the following formula:

O_max = argmax_O P(O | λ, T)

where O is a parameter sequence, λ is the given speech synthesis parameter model sequence, and T is the duration sequence corresponding to the synthesis units. O_max is the finally generated pitch parameter sequence or spectrum parameter sequence: within the range of the unit duration sequence T, the parameter sequence O_max with the maximum likelihood under the given speech synthesis parameter model sequence λ is sought, yielding the parameter sequence for speech synthesis.
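Under the simplifying assumption that each model is an independent Gaussian per frame, the sequence maximising P(O | λ, T) is just each model's mean repeated for that unit's duration. This is a hedged sketch only: practical parameter generation adds dynamic-feature constraints to smooth the trajectory, which is omitted here.

```python
def generate_parameters(model_means, durations):
    """model_means[i]: mean of the i-th model; durations[i]: its frame count."""
    frames = []
    for mean, dur in zip(model_means, durations):
        # Per-frame argmax of an independent Gaussian is its mean.
        frames.extend([mean] * dur)
    return frames

# Two units: 2 frames at pitch 100.0, then 3 frames at pitch 150.0.
o_max = generate_parameters([100.0, 150.0], [2, 3])
```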
Step 604: recover the voice signal according to the speech synthesis parameter sequence.

The speech synthesis parameter sequence O_max obtained in the previous step is fed into a vocoder to obtain the corresponding speech. A vocoder is an analysis-and-recovery tool for voice signals that can recover a high-quality speech waveform from parameterized speech data (such as pitch parameters and spectrum parameters).

It can be seen that the voice signal sending method and receiving method of the embodiments of the present invention, through the extraction of the speech synthesis parameter models corresponding to the continuous voice signal and through signal synthesis, achieve enormous compression of the voice signal with minimal signal loss, i.e. effectively reduce signal distortion.
Correspondingly, an embodiment of the present invention further provides a voice signal sending system. Fig. 7 is a structural block diagram of this system.
In this embodiment, the voice signal sending system includes:
a text acquisition module 701, configured to determine the text content corresponding to the continuous speech signal to be sent;
a parameter model determining module 702, configured to determine the speech synthesis parameter model of each synthesis unit according to the text content;
a concatenation module 703, configured to splice the speech synthesis parameter models of the synthesis units to obtain a speech synthesis parameter model sequence;
a sequence-number string determining module 704, configured to determine the sequence-number string corresponding to the speech synthesis parameter model sequence;
a sending module 705, configured to send the sequence-number string to a receiving terminal, so that the receiving terminal recovers the continuous speech signal according to the sequence-number string.
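The flow through modules 702-705 can be sketched end to end as follows; the toy codebook, the pinyin unit names and the comma-separated index encoding are assumptions for illustration only, not the patent's actual codebook or wire format:

```python
# Hypothetical codebook mapping each synthesis unit's chosen parameter
# model to its sequence number in the shared code book.
CODEBOOK = {"ni3": 17, "hao3": 42}

def send(text_units):
    """Modules 702-704: look up each unit's model index and encode the
    resulting model sequence as a sequence-number string (module 705
    would then transmit this string)."""
    indices = [CODEBOOK[u] for u in text_units]
    return ",".join(str(i) for i in indices)

payload = send(["ni3", "hao3"])
```

Only the short index string crosses the network, which is what makes the sub-1 kb/s bitrate claimed later in the description plausible.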
In practical applications, the text acquisition module 701 may obtain the text content automatically through a speech recognition algorithm, or obtain it by way of manual annotation. To this end, a speech recognition unit and/or an annotation acquisition unit may be arranged in the text acquisition module 701, so that a user can choose either way to obtain the text content corresponding to the continuous speech signal to be sent. The speech recognition unit is configured to determine, through a speech recognition algorithm, the text content corresponding to the continuous speech signal to be sent; the annotation acquisition unit is configured to obtain, by way of manual annotation, the text content corresponding to the continuous speech signal to be sent.
The synthesis unit is a preset minimum synthesis object, such as a syllable unit, a phoneme unit, or even a state unit in a phoneme HMM model.
In order to reduce the loss of recovered sound quality at the receiving terminal as far as possible and enable the receiving terminal to recover the continuous speech signal by speech synthesis, the speech synthesis parameter models that the parameter model determining module 702 obtains from the original speech signal should match the characteristics of the original speech signal as closely as possible, so as to reduce the loss from signal compression and recovery. Specifically, the continuous speech signal may be segmented into voice segments according to the text content, obtaining the voice segment corresponding to each synthesis unit, and then the duration, fundamental frequency model and spectral model corresponding to each synthesis unit are obtained.
The voice signal sending system of the embodiment of the present invention employs statistical-model-based coding, whose processing is independent of the speech sampling rate, so that 16 kHz ultra-wideband speech can be coded without paying any additional bitrate cost, with good sound quality and a low coding bitrate. Compared with the mainstream speech coding systems in the communications field, the speech coding mode of the present system can handle ultra-wideband speech (16 kHz sampling rate) with higher sound quality, and has a lower bitrate (below 1 kb/s), effectively reducing network traffic.
Fig. 8 is a structural block diagram of the parameter model determining module in an embodiment of the present invention. The parameter model determining module includes:
a segmentation unit 801, configured to segment the continuous speech signal into voice segments according to the text content, obtaining the voice segment corresponding to each synthesis unit.
Specifically, the continuous speech signal may be force-aligned with the acoustic model sequence corresponding to the synthesis units in the text content, i.e., a speech recognition decoding of the speech signal is computed against the acoustic model sequence, thereby obtaining the voice segment corresponding to each synthesis unit.
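A minimal sketch of such forced alignment, assuming one-dimensional Gaussian unit models with unit variance (a drastic simplification of real acoustic models; the dynamic-programming layout and all values are illustrative):

```python
import numpy as np

def force_align(frames, unit_means):
    """Toy forced alignment: assign each frame to one of the units,
    monotonically and in order, choosing segment boundaries that
    maximise total log-likelihood under 1-D unit-variance Gaussians."""
    n, k = len(frames), len(unit_means)
    ll = -0.5 * (np.subtract.outer(frames, unit_means) ** 2)  # frame x unit
    NEG = -1e18
    best = np.full((n + 1, k + 1), NEG)
    best[0][0] = 0.0
    back = np.zeros((n + 1, k + 1), dtype=int)
    for i in range(1, n + 1):
        for j in range(1, k + 1):
            # frame i-1 either starts unit j or continues it
            stay, start = best[i - 1][j], best[i - 1][j - 1]
            if start >= stay:
                best[i][j], back[i][j] = start + ll[i - 1][j - 1], j - 1
            else:
                best[i][j], back[i][j] = stay + ll[i - 1][j - 1], j
    labels, j = [], k                 # trace back the unit label per frame
    for i in range(n, 0, -1):
        labels.append(j - 1)
        j = back[i][j]
    return labels[::-1]

seg = force_align([1.0, 1.1, 5.0, 5.2], [1.0, 5.0])
```

In a real system the per-frame scores would come from the HMM acoustic models of the recognizer, but the monotonic dynamic-programming structure is the same.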
It should be noted that the synthesis unit may be chosen at different granularities according to different application demands. In general, if a lower bitrate is required, a larger voice unit is selected, such as a syllable unit or a phoneme unit; conversely, if higher sound quality is required, a smaller voice unit can be selected, such as a state unit or a feature-stream unit of a model. Under an acoustic model setting based on HMMs (Hidden Markov Models), each state of the HMM can further be chosen as the synthesis unit, and the corresponding voice segment obtained at the state level. Then, for each state, the corresponding fundamental frequency model and spectral model are determined from its fundamental frequency binary decision tree and spectrum binary decision tree, respectively. In this way, the obtained speech synthesis parameter models can describe the characteristics of the speech signal more finely.
a duration determining unit 802, configured to determine in turn the duration of the voice segment corresponding to each synthesis unit;
a fundamental frequency model determining unit 803, configured to determine in turn the fundamental frequency model of the voice segment corresponding to each synthesis unit;
a spectral model determining unit 804, configured to determine in turn the spectral model of the voice segment corresponding to each synthesis unit.
In practical applications, the fundamental frequency model determining unit 803 and the spectral model determining unit 804 may have multiple implementations. For example, the fundamental frequency model and the spectral model may be obtained according to binary decision trees; to this end, in another embodiment of the voice signal sending system of the present invention, the system further includes a binary decision tree building module, configured to build the fundamental frequency binary decision tree and the spectrum binary decision tree. In addition, the fundamental frequency model determining unit 803 and the spectral model determining unit 804 may also obtain the fundamental frequency model and the spectral model based on signal-feature optimization, which will be described in detail later.
Fig. 9 is a structural block diagram of the binary decision tree building module in the voice signal sending system in an embodiment of the present invention.
The binary decision tree building module includes:
a training data acquiring unit 901, configured to obtain training data;
a parameter extraction unit 902, configured to extract from the training data the synthesis parameters of the voice segment set corresponding to the synthesis unit, the synthesis parameters including fundamental frequency features and spectrum features;
an initialization unit 903, configured to initialize, according to the synthesis parameters, the binary decision tree corresponding to the synthesis unit, i.e., to build a binary decision tree with only a root node;
a node inspection unit 904, configured to inspect each non-leaf node in turn, starting from the root node of the binary decision tree; if the currently inspected node needs splitting, split the currently inspected node and obtain the child nodes after splitting and the training data corresponding to the child nodes; otherwise, mark the currently inspected node as a leaf node;
a binary decision tree output unit 905, configured to output the binary decision tree of the synthesis unit after the node inspection unit has inspected all non-leaf nodes.
In this embodiment, the training data acquiring unit 901 may specifically collect a large amount of voice training data and annotate it with text, then segment it into voice segments of basic voice units or even of synthesis units (such as the state units of basic voice unit models) according to the annotated text content, obtaining the voice segment set corresponding to each synthesis unit, and take the voice segments in the set corresponding to each synthesis unit as the training data corresponding to that synthesis unit.
When judging whether the currently inspected node needs splitting, the node inspection unit 904 may, according to the sample concentration of the currently inspected node, select the question with the maximum drop in sample concentration as the optimal question and attempt a split with it, obtaining child nodes. If the drop in concentration from splitting on the optimal question is less than a set threshold, or the training data of a child node after splitting is less than a set threshold, it is determined that the currently inspected node is not split further.
For the above inspection and splitting process, reference may be made to the description in the foregoing voice signal sending method embodiment of the present invention; details are not repeated here.
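The split-selection step of the node inspection unit can be sketched as follows, measuring "sample concentration" as within-node variance (one common choice; the patent does not fix the measure), with purely illustrative question names and sample values:

```python
import numpy as np

def best_split(samples, questions):
    """Sketch of unit 904's split search: among the yes/no context
    questions, pick the one giving the largest drop in total within-node
    variance (treated here as the 'sample concentration' gain)."""
    node_var = np.var([v for _, v in samples]) * len(samples)
    best = None
    for name, test in questions.items():
        yes = [v for ctx, v in samples if test(ctx)]
        no = [v for ctx, v in samples if not test(ctx)]
        if not yes or not no:          # degenerate split, skip
            continue
        gain = node_var - (np.var(yes) * len(yes) + np.var(no) * len(no))
        if best is None or gain > best[1]:
            best = (name, gain)
    return best

# Hypothetical samples: (context, F0 feature value) pairs.
samples = [({"tone": 1}, 4.9), ({"tone": 1}, 5.0),
           ({"tone": 4}, 4.0), ({"tone": 4}, 4.1)]
questions = {"is-tone-1": lambda ctx: ctx["tone"] == 1}
q, gain = best_split(samples, questions)
```

The stopping rules in the text (minimum gain, minimum child data) would simply be threshold checks on `gain` and on `len(yes)`/`len(no)`.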
It should be noted that, in the embodiments of the present invention, both the fundamental frequency binary decision tree and the spectrum binary decision tree can be built by this binary decision tree building module; their building processes are similar and are not detailed here.
Based on the above fundamental frequency binary decision tree and spectrum binary decision tree, the implementations of the fundamental frequency model determining unit and the spectral model determining unit in embodiments of the present invention are described in detail below.
Fig. 10 is one structural block diagram of the fundamental frequency model determining unit in the voice signal sending system in an embodiment of the present invention.
In this embodiment, the fundamental frequency model determining unit includes:
a first acquiring unit 161, configured to obtain the fundamental frequency binary decision tree corresponding to the synthesis unit;
a first parsing unit 162, configured to perform text parsing on the synthesis unit to obtain its contextual information, for example contextual information such as phoneme unit, tone, part of speech and prosodic level;
a first decision unit 163, configured to perform path decision in the fundamental frequency binary decision tree according to the contextual information, obtaining the corresponding leaf node.
Specifically, the path decision process is as follows: according to the contextual information of the synthesis unit, answer each node's split question in turn starting from the root node of the fundamental frequency binary decision tree; obtain a top-down matching path according to the answers; and obtain the leaf node according to the matching path.
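The top-down path decision just described can be sketched as a simple tree walk; the dictionary-based tree layout and the model names are purely illustrative:

```python
# Sketch of path decision (unit 163): starting from the root, answer each
# node's split question with the unit's context and follow the yes/no
# branch until a leaf is reached.
def decide_path(node, context):
    while "question" in node:              # non-leaf node
        branch = "yes" if node["question"](context) else "no"
        node = node[branch]
    return node["model"]                   # leaf holds the F0 model

tree = {"question": lambda c: c["tone"] == 1,
        "yes": {"model": "f0_model_A"},
        "no": {"model": "f0_model_B"}}
m = decide_path(tree, {"tone": 4})
```

The same walk, run over the spectrum binary decision tree, yields the spectral model in the mirrored unit described below.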
a first output unit 164, configured to take the fundamental frequency model corresponding to the leaf node as the fundamental frequency model of the synthesis unit.
Similarly to the fundamental frequency model determining unit, Fig. 11 is one structural block diagram of the spectral model determining unit in the voice signal sending system in an embodiment of the present invention.
In this embodiment, the spectral model determining unit includes:
a second acquiring unit 171, configured to obtain the spectrum binary decision tree corresponding to the synthesis unit;
a second parsing unit 172, configured to perform text parsing on the synthesis unit to obtain its contextual information, such as phoneme unit, tone, part of speech and prosodic level;
a second decision unit 173, configured to perform path decision in the spectrum binary decision tree according to the contextual information of the synthesis unit, obtaining the corresponding leaf node.
Specifically, the path decision process is as follows: according to the contextual information of the synthesis unit, answer each node's split question in turn starting from the root node of the spectrum binary decision tree; obtain a top-down matching path according to the answers; and obtain the leaf node according to the matching path.
a second output unit 174, configured to take the spectral model corresponding to the leaf node as the spectral model of the synthesis unit.
It should be noted that, in practical applications, the fundamental frequency model determining unit shown in Fig. 10 and the spectral model determining unit shown in Fig. 11 may each be realized by mutually independent physical units, or may be realized together by one physical unit. When a fundamental frequency model needs to be generated, the fundamental frequency binary decision tree corresponding to the synthesis unit is obtained, and the synthesis unit is parsed and a decision made accordingly, obtaining the fundamental frequency model corresponding to the synthesis unit. When a spectral model needs to be generated, the spectrum binary decision tree corresponding to the synthesis unit is obtained, and the synthesis unit is parsed and a decision made accordingly, obtaining the spectral model corresponding to the synthesis unit.
Fig. 12 is another structural block diagram of the fundamental frequency model determining unit in the voice signal sending system in an embodiment of the present invention.
In this embodiment, the fundamental frequency model determining unit includes:
a first determining unit 181, configured to determine the fundamental frequency feature sequence corresponding to the synthesis unit;
a first set acquiring unit 182, configured to obtain the fundamental frequency model set corresponding to the synthesis unit, i.e., the fundamental frequency models corresponding to all leaf nodes of the fundamental frequency binary decision tree of the synthesis unit;
a first computing unit 183, configured to compute the likelihood of the fundamental frequency feature sequence against each fundamental frequency model in the fundamental frequency model set;
a first selection unit 184, configured to select the fundamental frequency model with the maximum likelihood as the fundamental frequency model of the synthesis unit.
Similarly to the fundamental frequency model determining unit, Fig. 13 is another structural block diagram of the spectral model determining unit in the voice signal sending system in an embodiment of the present invention.
In this embodiment, the spectral model determining unit includes:
a second determining unit 191, configured to determine the spectrum feature sequence corresponding to the synthesis unit;
a second set acquiring unit 192, configured to obtain the spectral model set corresponding to the synthesis unit, i.e., the spectral models corresponding to all leaf nodes of the spectrum binary decision tree of the synthesis unit;
a second computing unit 193, configured to compute the likelihood of the spectrum feature sequence against each spectral model in the spectral model set;
a second selection unit 194, configured to select the spectral model with the maximum likelihood as the spectral model of the synthesis unit.
It should be noted that, in practical applications, the fundamental frequency model determining unit shown in Fig. 12 and the spectral model determining unit shown in Fig. 13 may each be realized by mutually independent physical units, or may be realized together by one physical unit. When a fundamental frequency model needs to be generated, the fundamental frequency feature sequence and the fundamental frequency model set corresponding to the synthesis unit are obtained, and the fundamental frequency model with the maximum likelihood is selected as the fundamental frequency model of the synthesis unit. When a spectral model needs to be generated, the spectrum feature sequence and the spectral model set corresponding to the synthesis unit are obtained, and the spectral model with the maximum likelihood is selected as the spectral model of the synthesis unit.
It can be seen that the voice signal sending system of the embodiment of the present invention significantly reduces the transmission bitrate while ensuring minimal loss of recovered sound quality, reducing traffic consumption; it solves the problem that traditional speech coding methods cannot balance sound quality and traffic, and improves the user communication experience in the mobile network era.
Correspondingly, an embodiment of the present invention further provides a voice signal receiving system. Fig. 14 is a structural block diagram of this system.
In this embodiment, the voice signal receiving system includes:
a receiving module 141, configured to receive the sequence-number string corresponding to a speech synthesis parameter model sequence;
an extraction module 142, configured to obtain the speech synthesis parameter model sequence from a code book according to the sequence-number string;
a determining module 143, configured to determine a speech synthesis parameter sequence according to the speech synthesis parameter model sequence;
a signal recovery module 144, configured to recover the voice signal according to the speech synthesis parameter sequence.
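Mirroring the sending side, modules 141-143 can be sketched as follows; the codebook contents and string format are illustrative only, and module 144's vocoder step is omitted:

```python
# Hypothetical receiving-side codebook: sequence number -> parameter
# model (here a mean value plus a duration in frames).
CODEBOOK = {17: {"mean": 4.8, "dur": 2}, 42: {"mean": 5.1, "dur": 3}}

def receive(serial_string):
    """Decode the sequence-number string (141), look the models up in
    the code book (142), and expand them into a frame-level parameter
    sequence (143); a vocoder (144) would turn this into a waveform."""
    models = [CODEBOOK[int(s)] for s in serial_string.split(",")]
    params = []
    for m in models:
        params.extend([m["mean"]] * m["dur"])
    return params

params = receive("17,42")
```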
The determining module 143 may determine the speech synthesis parameters according to the speech synthesis parameter model sequence and the model sequence durations, generating the speech synthesis parameter sequence. For the specific implementation, reference may be made to the description in the foregoing voice signal receiving method embodiment of the present invention; details are not repeated here.
Since the recovery of the voice signal by the voice signal receiving system of the embodiment of the present invention is independent of the speech sampling rate, signal transmission at an extremely low bitrate can be achieved on the premise of ensuring minimal loss of recovered sound quality, which better solves the sound-quality and traffic problems of traditional speech coding methods, improves the user communication experience in the mobile network era, and saves network fees.
The voice signal sending and receiving schemes of the embodiments of the present invention are applicable to the coding of various types of speech (such as ultra-wideband speech at a 16 kHz sampling rate and narrowband speech at an 8 kHz sampling rate), and can obtain good sound quality.
The embodiments in this specification are described in a progressive manner; for identical or similar parts among the embodiments, reference may be made to one another, and each embodiment focuses on its differences from the other embodiments. In particular, since the system embodiments are substantially similar to the method embodiments, their description is relatively simple, and relevant parts may refer to the description of the method embodiments. The system embodiments described above are merely illustrative; the units described as separate components may or may not be physically separate, and the components shown as units may or may not be physical units, i.e., they may be located in one place or distributed over multiple network elements. Some or all of the modules therein may be selected according to actual needs to achieve the purpose of the solutions of the embodiments. Those of ordinary skill in the art can understand and implement them without creative effort.
The embodiments of the present invention have been described in detail above; specific examples are used herein to set forth the present invention, and the description of the above embodiments is only intended to help understand the method and apparatus of the present invention. Meanwhile, for those of ordinary skill in the art, changes may be made to the specific implementations and the application scope according to the idea of the present invention. In summary, the content of this specification should not be construed as limiting the present invention.
Claims (20)
1. A voice signal sending method, characterised in that it comprises:
determining the text content corresponding to a continuous speech signal to be sent;
determining the speech synthesis parameter model of each synthesis unit according to the text content;
splicing the speech synthesis parameter models of the synthesis units to obtain a speech synthesis parameter model sequence;
determining the sequence-number string corresponding to the speech synthesis parameter model sequence;
sending the sequence-number string to a receiving terminal, so that the receiving terminal recovers the continuous speech signal according to the sequence-number string.
2. The method according to claim 1, characterised in that determining the text content corresponding to the continuous speech signal to be sent comprises:
determining, through a speech recognition algorithm, the text content corresponding to the continuous speech signal to be sent; or
obtaining, by way of manual annotation, the text content corresponding to the continuous speech signal to be sent.
3. The method according to claim 1, characterised in that determining the speech synthesis parameter model of each synthesis unit according to the text content comprises:
segmenting the continuous speech signal into voice segments according to the text content, obtaining the voice segment corresponding to each synthesis unit;
determining in turn the duration, fundamental frequency model and spectral model of the voice segment corresponding to each synthesis unit.
4. The method according to claim 3, characterised in that determining the fundamental frequency model corresponding to a synthesis unit comprises:
obtaining the fundamental frequency binary decision tree corresponding to the synthesis unit;
performing text parsing on the synthesis unit to obtain the contextual information of the synthesis unit;
performing path decision in the fundamental frequency binary decision tree according to the contextual information, obtaining the corresponding leaf node;
taking the fundamental frequency model corresponding to the leaf node as the fundamental frequency model of the synthesis unit.
5. The method according to claim 3, characterised in that determining the spectral model corresponding to a synthesis unit comprises:
obtaining the spectrum binary decision tree corresponding to the synthesis unit;
performing text parsing on the synthesis unit to obtain its contextual information, including phoneme unit, tone, part of speech and prosodic level;
performing path decision in the spectrum binary decision tree according to the contextual information of the synthesis unit, obtaining the corresponding leaf node;
taking the spectral model corresponding to the leaf node as the spectral model of the synthesis unit.
6. The method according to claim 4 or 5, characterised in that the method further comprises: building the binary decision tree corresponding to the synthesis unit in the following manner:
obtaining training data;
extracting from the training data the synthesis parameters of the voice segment set corresponding to the synthesis unit, the synthesis parameters including fundamental frequency features and spectrum features;
initializing, according to the synthesis parameters, the binary decision tree corresponding to the synthesis unit;
starting from the root node of the binary decision tree, inspecting each non-leaf node in turn;
if the currently inspected node needs splitting, splitting the currently inspected node and obtaining the child nodes after splitting and the training data corresponding to the child nodes; otherwise, marking the currently inspected node as a leaf node;
after all non-leaf nodes have been inspected, obtaining the binary decision tree of the synthesis unit.
7. The method according to claim 3, characterised in that determining the fundamental frequency model corresponding to a synthesis unit comprises:
determining the fundamental frequency feature sequence corresponding to the synthesis unit;
obtaining the fundamental frequency model set corresponding to the synthesis unit;
computing the likelihood of the fundamental frequency feature sequence against each fundamental frequency model in the fundamental frequency model set;
selecting the fundamental frequency model with the maximum likelihood as the fundamental frequency model of the synthesis unit.
8. The method according to claim 3, characterised in that determining the spectral model corresponding to a synthesis unit comprises:
determining the spectrum feature sequence corresponding to the synthesis unit;
obtaining the spectral model set corresponding to the synthesis unit;
computing the likelihood of the spectrum feature sequence against each spectral model in the spectral model set;
selecting the spectral model with the maximum likelihood as the spectral model of the synthesis unit.
9. A voice signal receiving method, characterised in that it comprises:
receiving the sequence-number string corresponding to a speech synthesis parameter model sequence;
obtaining the speech synthesis parameter model sequence from a code book according to the sequence-number string;
determining a speech synthesis parameter sequence according to the speech synthesis parameter model sequence;
recovering the voice signal according to the speech synthesis parameter sequence.
10. The method according to claim 9, characterised in that determining the speech synthesis parameter sequence according to the speech synthesis parameter model sequence comprises:
determining speech synthesis parameters according to the speech synthesis parameter model sequence and the model sequence durations, generating the speech synthesis parameter sequence.
11. A voice signal sending system, characterised in that it comprises:
a text acquisition module, configured to determine the text content corresponding to a continuous speech signal to be sent;
a parameter model determining module, configured to determine the speech synthesis parameter model of each synthesis unit according to the text content;
a concatenation module, configured to splice the speech synthesis parameter models of the synthesis units to obtain a speech synthesis parameter model sequence;
a sequence-number string determining module, configured to determine the sequence-number string corresponding to the speech synthesis parameter model sequence;
a sending module, configured to send the sequence-number string to a receiving terminal, so that the receiving terminal recovers the continuous speech signal according to the sequence-number string.
12. The system according to claim 11, characterised in that the text acquisition module comprises:
a speech recognition unit, configured to determine, through a speech recognition algorithm, the text content corresponding to the continuous speech signal to be sent; or
an annotation acquisition unit, configured to obtain, by way of manual annotation, the text content corresponding to the continuous speech signal to be sent.
13. The system according to claim 11, characterised in that the parameter model determining module comprises:
a segmentation unit, configured to segment the continuous speech signal into voice segments according to the text content, obtaining the voice segment corresponding to each synthesis unit;
a duration determining unit, configured to determine in turn the duration of the voice segment corresponding to each synthesis unit;
a fundamental frequency model determining unit, configured to determine in turn the fundamental frequency model of the voice segment corresponding to each synthesis unit;
a spectral model determining unit, configured to determine in turn the spectral model of the voice segment corresponding to each synthesis unit.
14. The system according to claim 13, characterised in that the fundamental frequency model determining unit comprises:
a first acquiring unit, configured to obtain the fundamental frequency binary decision tree corresponding to the synthesis unit;
a first parsing unit, configured to perform text parsing on the synthesis unit to obtain the contextual information of the synthesis unit;
a first decision unit, configured to perform path decision in the fundamental frequency binary decision tree according to the contextual information, obtaining the corresponding leaf node;
a first output unit, configured to take the fundamental frequency model corresponding to the leaf node as the fundamental frequency model of the synthesis unit.
15. The system according to claim 13, characterised in that the spectral model determining unit comprises:
a second acquiring unit, configured to obtain the spectrum binary decision tree corresponding to the synthesis unit;
a second parsing unit, configured to perform text parsing on the synthesis unit to obtain its contextual information, including phoneme unit, tone, part of speech and prosodic level;
a second decision unit, configured to perform path decision in the spectrum binary decision tree according to the contextual information of the synthesis unit, obtaining the corresponding leaf node;
a second output unit, configured to take the spectral model corresponding to the leaf node as the spectral model of the synthesis unit.
16. The system according to claim 14 or 15, characterised in that the system further comprises a binary decision tree building module, and the binary decision tree building module comprises:
a training data acquiring unit, configured to obtain training data;
a parameter extraction unit, configured to extract from the training data the synthesis parameters of the voice segment set corresponding to the synthesis unit, the synthesis parameters including fundamental frequency features and spectrum features;
an initialization unit, configured to initialize, according to the synthesis parameters, the binary decision tree corresponding to the synthesis unit;
a node inspection unit, configured to inspect each non-leaf node in turn, starting from the root node of the binary decision tree; if the currently inspected node needs splitting, split the currently inspected node and obtain the child nodes after splitting and the training data corresponding to the child nodes; otherwise, mark the currently inspected node as a leaf node;
a binary decision tree output unit, configured to output the binary decision tree of the synthesis unit after the node inspection unit has inspected all non-leaf nodes.
17. The system according to claim 13, characterised in that the fundamental frequency model determining unit comprises:
a first determining unit, configured to determine the fundamental frequency feature sequence corresponding to the synthesis unit;
a first set acquiring unit, configured to obtain the fundamental frequency model set corresponding to the synthesis unit;
a first computing unit, configured to compute the likelihood of the fundamental frequency feature sequence against each fundamental frequency model in the fundamental frequency model set;
a first selection unit, configured to select the fundamental frequency model with the maximum likelihood as the fundamental frequency model of the synthesis unit.
18. The system according to claim 13, characterized in that the spectral model determination unit comprises:
A second determination unit, configured to determine the spectrum feature sequence corresponding to the synthesis unit;
A second set acquisition unit, configured to obtain the spectral model set corresponding to the synthesis unit;
A second calculation unit, configured to calculate the likelihood between the spectrum feature sequence and each spectral model in the spectral model set;
A second selection unit, configured to select the spectral model with the maximum likelihood as the spectral model of the synthesis unit.
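Claims 17 and 18 describe the same selection rule for fundamental frequency and spectrum: score the feature sequence under every candidate model and keep the best. A minimal sketch, assuming diagonal Gaussian models as a stand-in for whatever statistical models the patent's system actually uses:

```python
# Illustrative maximum-likelihood selection: score a feature sequence
# under each candidate model (1-D Gaussians here, an assumption) and
# return the model with the highest total log-likelihood.
import math

def log_likelihood(seq, model):
    """Sum of per-frame log densities under a 1-D Gaussian model."""
    mean, var = model["mean"], model["var"]
    return sum(-0.5 * (math.log(2 * math.pi * var) + (x - mean) ** 2 / var)
               for x in seq)

def select_model(feature_seq, model_set):
    """Return the model with the maximum likelihood for the sequence."""
    return max(model_set, key=lambda m: log_likelihood(feature_seq, m))

f0_models = [{"name": "low", "mean": 120.0, "var": 100.0},
             {"name": "high", "mean": 220.0, "var": 100.0}]
best = select_model([210.0, 225.0, 218.0], f0_models)
print(best["name"])  # high
```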
19. A voice signal receiving system, characterized by comprising:
A receiving module, configured to receive the sequence number string corresponding to a speech synthesis parameter model sequence;
An extraction module, configured to obtain the speech synthesis parameter model sequence from a codebook according to the sequence number string;
A determination module, configured to determine a speech synthesis parameter sequence according to the speech synthesis parameter model sequence;
A signal recovery module, configured to recover the voice signal according to the speech synthesis parameter sequence.
20. The system according to claim 19, characterized in that:
The determination module is specifically configured to determine the speech synthesis parameters according to the speech synthesis parameter model sequence and the model durations, generating the speech synthesis parameter sequence.
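The receiving pipeline of claims 19 and 20 can be sketched end to end: the sequence number string indexes a codebook shared with the sender, and each retrieved model is expanded over its duration to form the synthesis parameter sequence. The codebook contents and the single-parameter models below are invented for illustration; a real system would feed the resulting sequence to a vocoder to recover the waveform.

```python
# Hedged sketch of the claimed receiver: sequence numbers -> codebook
# lookup -> parameter model sequence -> parameter sequence expanded by
# model duration (claim 20). Codebook entries here are assumptions.

CODEBOOK = {  # assumed codebook shared between sender and receiver
    7:  {"f0_mean": 120.0, "duration": 3},   # duration in frames
    42: {"f0_mean": 220.0, "duration": 2},
}

def receive(sequence_numbers):
    # Extraction module: look up each sequence number in the codebook.
    models = [CODEBOOK[n] for n in sequence_numbers]
    # Determination module: hold each model's parameter for its duration.
    params = [m["f0_mean"] for m in models for _ in range(m["duration"])]
    # Signal recovery module (a vocoder) would consume this sequence.
    return params

print(receive([7, 42]))  # [120.0, 120.0, 120.0, 220.0, 220.0]
```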
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201310362024.7A CN103474075B (en) | 2013-08-19 | 2013-08-19 | Voice signal sending method and system, method of reseptance and system |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201310362024.7A CN103474075B (en) | 2013-08-19 | 2013-08-19 | Voice signal sending method and system, method of reseptance and system |
Publications (2)
Publication Number | Publication Date |
---|---|
CN103474075A CN103474075A (en) | 2013-12-25 |
CN103474075B true CN103474075B (en) | 2016-12-28 |
Family
ID=49798896
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201310362024.7A Active CN103474075B (en) | 2013-08-19 | 2013-08-19 | Voice signal sending method and system, method of reseptance and system |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN103474075B (en) |
Families Citing this family (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN106373581A (en) * | 2016-09-28 | 2017-02-01 | 成都奥克特科技有限公司 | Data encoding processing method for speech signals |
CN108346423B (en) * | 2017-01-23 | 2021-08-20 | 北京搜狗科技发展有限公司 | Method and device for processing speech synthesis model |
CN108389592B (en) * | 2018-02-27 | 2021-10-08 | 上海讯飞瑞元信息技术有限公司 | Voice quality evaluation method and device |
CN111147444B (en) * | 2019-11-20 | 2021-08-06 | 维沃移动通信有限公司 | Interaction method and electronic equipment |
CN116469405A (en) * | 2023-04-23 | 2023-07-21 | 富韵声学科技(深圳)有限公司 | Noise reduction conversation method, medium and electronic equipment |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
EP0360265A2 (en) * | 1988-09-21 | 1990-03-28 | Nec Corporation | Communication system capable of improving a speech quality by classifying speech signals |
CN1256001A (en) * | 1998-01-27 | 2000-06-07 | 松下电器产业株式会社 | Method and device for coding lag parameter and code book preparing method |
CN1321297A (en) * | 1999-08-23 | 2001-11-07 | 松下电器产业株式会社 | Voice encoder and voice encoding method |
CN1486486A (en) * | 2000-11-27 | 2004-03-31 | 日本电信电话株式会社 | Method, device and program for coding and decoding acoustic parameter, and method, device and program for coding and decoding sound |
Family Cites Families (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP2008139631A (en) * | 2006-12-04 | 2008-06-19 | Nippon Telegr & Teleph Corp <Ntt> | Voice synthesis method, device and program |
- 2013-08-19: CN application CN201310362024.7A filed | patent/CN103474075B/en | status: Active
Patent Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
EP0360265A2 (en) * | 1988-09-21 | 1990-03-28 | Nec Corporation | Communication system capable of improving a speech quality by classifying speech signals |
CN1256001A (en) * | 1998-01-27 | 2000-06-07 | 松下电器产业株式会社 | Method and device for coding lag parameter and code book preparing method |
CN1321297A (en) * | 1999-08-23 | 2001-11-07 | 松下电器产业株式会社 | Voice encoder and voice encoding method |
CN1486486A (en) * | 2000-11-27 | 2004-03-31 | 日本电信电话株式会社 | Method, device and program for coding and decoding acoustic parameter, and method, device and program for coding and decoding sound |
Also Published As
Publication number | Publication date |
---|---|
CN103474075A (en) | 2013-12-25 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN103474075B (en) | Voice signal sending method and system, method of reseptance and system | |
CN101447185B (en) | Audio frequency rapid classification method based on content | |
CN101510424B (en) | Method and system for encoding and synthesizing speech based on speech primitive | |
CN102723078B (en) | Emotion speech recognition method based on natural language comprehension | |
CN103700370B (en) | A kind of radio and television speech recognition system method and system | |
CN103065620B (en) | Method with which text input by user is received on mobile phone or webpage and synthetized to personalized voice in real time | |
CN108053823A (en) | A kind of speech recognition system and method | |
CN103761975B (en) | Method and device for oral evaluation | |
CN102496364A (en) | Interactive speech recognition method based on cloud network | |
CN102446504B (en) | Voice/Music identifying method and equipment | |
CN106453043A (en) | Multi-language conversion-based instant communication system | |
CN102568469B (en) | G.729A compressed pronunciation flow information hiding detection device and detection method | |
CN103474067B (en) | speech signal transmission method and system | |
CN107564533A (en) | Speech frame restorative procedure and device based on information source prior information | |
CN106356054A (en) | Method and system for collecting information of agricultural products based on voice recognition | |
CN109036387A (en) | Video speech recognition methods and system | |
CN112420079B (en) | Voice endpoint detection method and device, storage medium and electronic equipment | |
WO2019119552A1 (en) | Method for translating continuous long speech file, and translation machine | |
CN103077705B (en) | Method for optimizing local synthesis based on distributed natural rhythm | |
CN101814289A (en) | Digital audio multi-channel coding method and system of DRA (Digital Recorder Analyzer) with low bit rate | |
CN102314878A (en) | Automatic phoneme splitting method | |
CN108010533A (en) | The automatic identifying method and device of voice data code check | |
CN113724690B (en) | PPG feature output method, target audio output method and device | |
CN111312211A (en) | Dialect speech recognition system based on oversampling technology | |
CN109192197A (en) | Big data speech recognition system Internet-based |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
CB02 | Change of applicant information |
Address after: No. 666 Wangjiang Road, High-tech Development Zone, Hefei City, Anhui Province, 230088
Applicant after: Iflytek Co., Ltd.
Address before: Xunfei Building, No. 666 Wangjiang Road, High-tech Development Zone, Hefei City, Anhui Province, 230088
Applicant before: Anhui USTC iFLYTEK Co., Ltd.
COR | Change of bibliographic data | ||
C14 | Grant of patent or utility model | ||
GR01 | Patent grant |