CN102024453B

CN102024453B - Singing sound synthesis system, method and device

Info

Publication number: CN102024453B
Application number: CN2009101694254A
Authority: CN
Inventors: 李幸辑; 李宏儒; 王文男; 徐志浩; 张智星
Original assignee: Institute for Information Industry
Current assignee: Institute for Information Industry
Priority date: 2009-09-09
Filing date: 2009-09-09
Publication date: 2012-05-23
Anticipated expiration: 2029-09-09
Also published as: CN102024453A

Abstract

The invention discloses a singing voice synthesis system comprising a storage unit, a beat unit, an input unit and a processing unit. The storage unit stores at least one melody, the beat unit prompts a beat, the input unit receives a plurality of voice signals, and the processing unit processes the voice signals to generate a synthesized singing voice signal. The user produces the voice signals by reading the lyrics aloud or humming along with the melody and its beat. Each voice signal therefore already corresponds to the melody and the beat and can be processed directly. This avoids the time and cost of pre-recording a large user corpus, saves system resources and speeds up song synthesis. In addition, the synthesized singing carries the user's own timbre and sounds quite lifelike.

Description

Song synthesis system, method and device

Technical field

The present invention relates to singing synthesis technology, and in particular to a singing synthesis system, device and method capable of producing realistic singing.

Background art

In recent years, information technology has matured and the processing power of electronic computing devices has increased significantly. Many complex applications have become feasible as a result, and speech and singing synthesis is one of them. Broadly speaking, speech synthesis means artificially producing sound that approximates a real human voice. Many applications already exist, for example virtual singers, electronic pets, singing-practice software and composer and singer simulation, and demand for them keeps growing. Fig. 1 shows the conventional architecture. A general speech or singing synthesis method must pre-record real human speech to build a corpus database (Corpus Database) 20, which serves as the basis for converting text into speech. Corpus input is further divided into three types. The first is a single-syllable corpus (Single-Syllable-based Corpus) 21, which for Chinese consists of single syllables such as ㄅ, ㄆ and ㄇ. The second is a coarticulation corpus (Coarticulation-based Corpus) 22 of words such as "tomorrow" and "the day after tomorrow". The third is a song corpus (Song-based Corpus) 23 of lyric phrases.

Fig. 1 is a flowchart of the conventional singing synthesis method. First, a Musical Instrument Digital Interface (MIDI) file and the lyrics of the selected song are input. The MIDI file contains the score of the song, including information such as beats and notes. In step S101, word segmentation (Word Segmentation) is performed on the input MIDI file and lyrics to obtain phonetic labels (Phonetic Label). In step S102, word derivation is performed to select the best-matching units from the corpus 20. In step S103, the duration and pitch are adjusted. Finally, in step S104, adjacent sounds are concatenated and smoothed, echo effects and accompaniment music are added, and the synthesized song is obtained. However, this conventional technique has the following drawbacks:

(1) Building the corpus takes a long time for recording, and the corpus needs a large amount of storage space.

(2) The word derivation procedure is complex and consumes substantial system resources. It is also prone to incorrect word segmentation.

(3) For Chinese, the synthesized singing is of poor quality and sounds noticeably mechanical.

(4) Because the method depends on the pre-recorded corpus, it can output only a fixed timbre. Changing the timbre requires recording a new corpus.

(5) The overall procedure is complex and synthesizing a song takes a long time, so the synthesized song cannot be obtained in real time.

Therefore, conventional singing synthesis methods generally fail to meet ordinary users' needs in cost, efficiency and the fluency of the synthesized singing.

Summary of the invention

The object of the present invention is to provide an intuitive singing synthesis system, method and device. Users do not need to know music theory or be good singers. They only need to speak the lyrics in time with a beat, and they obtain a song sung in their own timbre.

The singing synthesis system provided by the present invention comprises a storage unit, a beat unit, an input unit and a processing unit. The storage unit stores at least one melody. The beat unit prompts a beat according to a specific melody among the at least one melody. The input unit receives a plurality of voice signals corresponding to the specific melody. The processing unit generates a synthesized singing voice signal according to the specific melody and the voice signals.

The singing synthesis method provided by the present invention is applicable to an electronic computing device and comprises the following steps:
- prompting a beat according to a specific melody;
- receiving, through a sound-receiving module of the electronic computing device, a plurality of voice signals corresponding to the specific melody;
- generating a synthesized singing voice signal according to the specific melody and the voice signals;
- outputting the synthesized singing voice signal through a playback module of the electronic computing device.

The singing synthesis device provided by the present invention comprises a housing, a memory, a beat mechanism, a sound-receiving device and a processor.
- The memory is arranged inside the housing, is connected to the processor, and stores at least one melody.
- The beat mechanism is arranged on the outside of the housing, is connected to the processor, and prompts a beat according to a specific melody among the at least one melody.
- The sound-receiving device is arranged on the outside of the housing, is connected to the processor, and receives a plurality of voice signals corresponding to the specific melody.
- The processor is arranged inside the housing and generates a synthesized singing voice signal according to the specific melody and the voice signals.

In the embodiments of the invention, the user produces the voice signals by chanting or humming along with the melody and its beat. Each voice signal therefore corresponds to the melody and its beat and can be processed directly. This avoids the time and cost of pre-recording the large user corpus that the prior art requires, saves system resources and speeds up song synthesis. In addition, the synthesized song carries the user's own timbre and sounds quite realistic.

Description of drawings

The accompanying drawings described here are provided for a further understanding of the present invention and form part of this application. They do not limit the invention. In the drawings:

Fig. 1 is a flowchart of a singing synthesis method based on the conventional speech synthesis architecture.

Fig. 2 is an architecture diagram of a singing synthesis system according to an embodiment of the invention.

Fig. 3 is a schematic diagram of rhythm-error detection for voice input according to an embodiment of the invention.

Fig. 4 is a schematic diagram of pitch adjustment using pitch-synchronous overlap-add (PSOLA) according to an embodiment of the invention.

Fig. 5 is a schematic diagram of pitch adjustment using cross-fading according to an embodiment of the invention.

Figs. 6A and 6B are schematic diagrams of pitch adjustment using resampling according to an embodiment of the invention.

Figs. 7A, 7B and 7C are schematic diagrams of smoothing using Bézier curves according to an embodiment of the invention.

Fig. 8 is a flowchart of a singing synthesis method according to an embodiment of the invention.

Figs. 9A, 9B, 9C and 9D are flowcharts of singing synthesis methods according to other embodiments of the invention.

Fig. 10 is an architecture diagram of a singing synthesis device according to an embodiment of the invention.

Reference numerals:

20～corpus database;

21～single-syllable corpus;

22～coarticulation (word) corpus;

23～song corpus;

200～singing synthesis system;

201～storage unit;

202～beat unit;

203～input unit;

204～processing unit;

1000～singing synthesis device;

1010～housing;

1020～memory;

1030～beat mechanism;

1040～sound-receiving device;

1050～processor.

Embodiment

Several preferred embodiments are described in detail below with reference to the accompanying drawings, to make the objects, features and advantages of the invention easier to understand.

Fig. 2 is an architecture diagram of a singing synthesis system according to an embodiment of the invention. The singing synthesis system 200 includes a storage unit 201, a beat unit 202, an input unit 203 and a processing unit 204. The storage unit 201 stores the melodies of multiple songs. When a song is to be synthesized, the storage unit 201 provides the song's melody to the beat unit 202. The beat unit 202 then prompts the corresponding beat (tempo), meaning pulses at the fixed rate set by the song's melody. The beat helps the user chant or hum the lyrics of the song in a speaking manner. The input unit 203 receives the voice signals that the user produces by chanting or humming. These voice signals correspond to the melody and follow the beat. Finally, the processing unit 204 processes the melody and the voice signals to generate a synthesized singing voice signal.

In some embodiments, the melody is a Waveform Audio (WAV) file, and the beat unit 202 marks the beat of the song by beat tracking. In other embodiments, the melody is a Musical Instrument Digital Interface (MIDI) file, and the beat unit 202 reads the tempo event data directly from the MIDI file to obtain the beat of the song. The beat unit 202 can prompt the beat in many ways:
- a visual signal produced by a display unit, for example a moving, jumping, flashing or color-changing symbol;
- an audio signal produced by an output unit, for example a metronome-like "tick, tick" sound;
- a beat motion provided by a mechanical structure, for example waving, rotating, tapping, or swinging like a metronome pendulum;
- flashing or color-changing light produced by a light-emitting unit.
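For the MIDI case, the tempo is carried by the Set Tempo meta event (FF 51 03 tt tt tt), whose 24-bit payload gives the number of microseconds per quarter note. The Python sketch below assumes the payload bytes have already been extracted from the file (no MIDI parser is shown) and treats one beat as one quarter note. The function names are hypothetical.

```python
def tempo_from_set_tempo_payload(payload: bytes) -> float:
    """Convert the 3-byte payload of a MIDI Set Tempo meta event (FF 51 03 tt tt tt)
    into beats per minute, treating one beat as one quarter note."""
    if len(payload) != 3:
        raise ValueError("Set Tempo payload must be exactly 3 bytes")
    us_per_quarter = int.from_bytes(payload, "big")  # microseconds per quarter note
    return 60_000_000 / us_per_quarter


def beat_times(bpm: float, num_beats: int, start: float = 0.0) -> list[float]:
    """Times in seconds at which a beat prompt (click, blink, motion) should fire."""
    period = 60.0 / bpm
    return [start + k * period for k in range(num_beats)]


if __name__ == "__main__":
    # 0x07A120 = 500000 microseconds per quarter note.
    bpm = tempo_from_set_tempo_payload(bytes([0x07, 0xA1, 0x20]))
    print(bpm)                 # 120.0
    print(beat_times(bpm, 4))  # [0.0, 0.5, 1.0, 1.5]
```

The resulting beat times could drive any of the prompt forms listed above.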

In some embodiments, a rhythm analysis unit (not shown) helps ensure that the rhythm of the user's voice signals is reasonably accurate. Here, rhythm means the timing with which each word of the lyrics occurs. After receiving the voice signals, the rhythm analysis unit uses the song's melody to determine whether their rhythm deviates beyond a preset error tolerance. If it does, the rhythm analysis unit prompts the user to repeat the voice input step. Fig. 3 is used later to describe how the rhythm error is judged. Alternatively, the rhythm analysis unit can play the received voice signals back and let the user decide whether to accept the recording. If the user does not accept it, an operation interface lets the user re-enter the voice signals to replace the old ones. In other embodiments, the user may also produce the voice signals by singing, or may input voice signals that were recorded or processed in advance.

The processing unit 204 processes the melody and the voice signals to generate a synthesized singing voice signal. In some embodiments, this processing first flattens the pitch of the voice signals to obtain a plurality of signals with an identical pitch. Following the melody, it then adjusts these identical-pitch signals to the standard pitches indicated by the song's melody, which yields a plurality of adjusted voice signals. The adjusted voice signals can then be smoothed to produce a smoothed voice signal. Specific embodiments are described below.
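If the melody's standard pitches are available as MIDI note numbers (an assumption, which fits the MIDI case above), the twelve-tone equal-temperament mapping f = 440·2^((n−69)/12) gives each target frequency. Dividing by the flattened pitch gives the shift factor that each syllable needs. This is a minimal sketch. The 220 Hz flattened pitch and the function names are hypothetical.

```python
def midi_to_hz(note: int, a4_hz: float = 440.0) -> float:
    """Twelve-tone equal-tempered frequency of a MIDI note number (69 = A4)."""
    return a4_hz * 2.0 ** ((note - 69) / 12.0)


def shift_ratios(flat_hz: float, melody_notes: list[int]) -> list[float]:
    """Factor by which the flattened voice (at flat_hz) must be shifted to reach
    each melody note: 2.0 means one octave up, 0.5 one octave down."""
    return [midi_to_hz(n) / flat_hz for n in melody_notes]


if __name__ == "__main__":
    # Hypothetical: syllables flattened to 220 Hz; melody notes A3, A4, E4.
    notes = [57, 69, 64]
    for note, ratio in zip(notes, shift_ratios(220.0, notes)):
        print(note, round(midi_to_hz(note), 2), round(ratio, 4))
```

Any of the pitch adjustment methods described below (PSOLA, cross-fading, resampling) could consume these factors.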

In some embodiments, the processing unit 204 executes a pitch analysis procedure. This procedure uses pitch tracking (Pitch Tracking) and pitch marking (Pitch Marking) to flatten the pitch of the voice signals into a plurality of identical-pitch signals. The processing unit 204 then executes a pitch adjustment procedure on these signals, for example Pitch Synchronous Overlap-Add (PSOLA), cross-fading (Cross-Fading) or resampling (Resample). This adjusts each signal to the corresponding standard pitch indicated by the song's melody and yields a plurality of adjusted voice signals. PSOLA, cross-fading and resampling are described later with reference to Fig. 4, Fig. 5, and Figs. 6A and 6B, respectively. Finally, the processing unit 204 executes a smoothing procedure on the adjusted voice signals, for example linear interpolation, bilinear interpolation or polynomial interpolation, to join them into a smoothed voice signal. Polynomial interpolation is described later with reference to Figs. 7A to 7C.
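Pitch tracking is not specified further here. The sketch below uses one common technique, autocorrelation, as a swapped-in example that is not necessarily the one used in the embodiment. It estimates the fundamental frequency of a single voiced frame, and pitch marks could then be placed about one estimated period apart. The search range, the frame length in the usage example and the function name are assumptions.

```python
import math


def estimate_f0(frame: list[float], sample_rate: int,
                f0_min: float = 60.0, f0_max: float = 800.0) -> float:
    """Estimate the fundamental frequency (Hz) of one voiced frame by choosing the
    lag with the largest mean autocorrelation within [f0_min, f0_max]."""
    lag_min = max(1, int(sample_rate / f0_max))
    lag_max = min(len(frame) - 1, int(sample_rate / f0_min))
    best_lag, best_score = lag_min, float("-inf")
    for lag in range(lag_min, lag_max + 1):
        n = len(frame) - lag
        # Mean of x[i] * x[i + lag] over the overlapping samples.
        score = sum(frame[i] * frame[i + lag] for i in range(n)) / n
        if score > best_score:
            best_lag, best_score = lag, score
    return sample_rate / best_lag


if __name__ == "__main__":
    sr = 8000
    tone = [math.sin(2 * math.pi * 100 * n / sr) for n in range(800)]
    print(estimate_f0(tone, sr))  # 100.0
```

Autocorrelation is simple but can lock onto multiples of the true period. Practical trackers usually add normalization, voicing decisions and smoothing across frames.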

In some embodiments, the processing unit 204 further executes a singing special-effects procedure on the smoothed voice signal. It determines the size of the sampling frame from the system load of the singing synthesis system 200. Then, frame by frame, it adjusts the volume of the smoothed voice signal, adds vibrato and adds echo, in that order, to produce an effects-processed voice signal. In further embodiments, the processing unit 204 executes an accompaniment synthesis procedure on any of these voice signals (the adjusted, smoothed or effects-processed voice signals). This procedure combines the song's accompaniment music with that voice signal to obtain an accompanied singing voice signal. The adjusted, smoothed and effects-processed voice signals and the accompanied singing voice signal are all forms of the synthesized singing voice signal of the present invention. A synthesized singing voice signal may be a file that contains a plurality of voice signals (such as the adjusted, smoothed, effects-processed or accompanied voice signals described above). The synthesized song carries the user's timbre. In some embodiments, the singing synthesis system 200 further includes an output unit that outputs the synthesized singing voice signal. This output unit may work with the beat unit 202 or another display unit to show the beat while the synthesized singing voice signal plays. For example, it can use the waving, rotating or tapping motions, the moving, jumping, flashing or color-changing visual symbols, or the metronome-like "tick, tick" sound described above.
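The effect order described above (volume, then vibrato, then echo, with a frame size chosen from the system load) can be sketched as follows. All numbers are illustrative assumptions, and the function names are hypothetical: the load thresholds, frame sizes, vibrato rate and depth, and echo delay and gain. The vibrato is implemented as a slowly modulated fractional delay, and the echo as one attenuated, delayed copy.

```python
import math


def choose_frame_size(cpu_load: float) -> int:
    """Hypothetical policy: the busier the system (load in 0..1), the larger the
    frame, so the effect chain is scheduled fewer times per second."""
    if cpu_load < 0.3:
        return 256
    if cpu_load < 0.7:
        return 512
    return 1024


def apply_effects(x: list[float], sample_rate: int, cpu_load: float,
                  gain: float = 0.8, vib_rate_hz: float = 5.5,
                  vib_depth_ms: float = 2.0, echo_delay_s: float = 0.25,
                  echo_gain: float = 0.4) -> list[float]:
    """Volume, then vibrato, then echo, processed frame by frame."""
    frame = choose_frame_size(cpu_load)
    depth = vib_depth_ms * 1e-3 * sample_rate  # peak vibrato delay in samples
    echo_d = int(echo_delay_s * sample_rate)    # echo delay in samples
    vol = [0.0] * len(x)
    vib = [0.0] * len(x)
    out = [0.0] * len(x)
    for start in range(0, len(x), frame):
        for n in range(start, min(start + frame, len(x))):
            # 1) Volume adjustment.
            vol[n] = gain * x[n]
            # 2) Vibrato: read the volume-adjusted signal through a delay that
            #    oscillates between 0 and `depth` samples (linear interpolation).
            delay = 0.5 * depth * (1.0 + math.sin(2 * math.pi * vib_rate_hz * n / sample_rate))
            pos = n - delay
            i = math.floor(pos)
            frac = pos - i
            a = vol[i] if i >= 0 else 0.0
            b = vol[i + 1] if 0 <= i + 1 <= n else 0.0
            vib[n] = (1.0 - frac) * a + frac * b
            # 3) Echo: add an attenuated copy of the vibrato output echo_d samples ago.
            out[n] = vib[n] + (echo_gain * vib[n - echo_d] if n >= echo_d else 0.0)
    return out
```

Here the frames only set the processing granularity. A real-time version would hand each finished frame to the audio output as soon as it is ready.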

Fig. 3 is a schematic diagram of judging rhythm error according to an embodiment of the invention. In Fig. 3, the voice input for a passage of lyrics contains lyric 1 to lyric 3. In some embodiments, the storage unit 201 stores the melody of the song and also the lyrics that correspond to the melody, together with the rhythm of those lyrics. The rhythm analysis unit (not shown) obtains the reference beat times r(i) of the passage from the song's melody:
- r(1) and r(2) are the endpoints of the time interval of lyric 1;
- r(3) and r(4) are the endpoints of the time interval of lyric 2;
- r(5) and r(6) are the endpoints of the time interval of lyric 3.

The dashed line before each endpoint marks the tolerated time for early input, and the dashed line after each endpoint marks the tolerated time for late input. The interval between an endpoint and its dashed line is the error tolerance μ. The voice signals input by the user have their own rhythm, denoted c(i). In this embodiment, the cumulative error can therefore be expressed by formula (1):

P(j) = \sum_{i = 2j-1}^{2j} \lvert r(i) - c(i) \rvert, \quad j = 1, 2, 3 \qquad (1)

Here j indexes the lyrics. When the computed result P(j) is greater than μ, the voice signal for lyric j can be re-entered.
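A minimal sketch of this check. It assumes that lyric j contributes the timing errors at its own two endpoints, that times are in seconds, and that the detected endpoints c(i) are already available (onset detection is not shown). The function names and example values are hypothetical.

```python
def rhythm_errors(r: list[float], c: list[float]) -> list[float]:
    """Cumulative timing error P(j) per lyric, assuming lyric j contributes the
    errors at its own two endpoints. r holds the reference endpoints
    [r(1), r(2), r(3), ...] and c the user's detected endpoints, in seconds."""
    if len(r) != len(c) or len(r) % 2:
        raise ValueError("r and c must contain one (start, end) pair per lyric")
    return [abs(r[k] - c[k]) + abs(r[k + 1] - c[k + 1])
            for k in range(0, len(r), 2)]


def lyrics_to_rerecord(r: list[float], c: list[float], mu: float) -> list[int]:
    """1-based indices j of the lyrics whose error P(j) exceeds the tolerance mu."""
    return [j for j, p in enumerate(rhythm_errors(r, c), start=1) if p > mu]


if __name__ == "__main__":
    r = [0.0, 0.5, 0.5, 1.0, 1.0, 1.5]     # reference endpoints of lyrics 1-3
    c = [0.02, 0.49, 0.6, 1.1, 1.0, 1.52]  # hypothetical detected endpoints
    print(lyrics_to_rerecord(r, c, mu=0.1))  # [2]
```

Only the flagged lyrics would need to be re-recorded, not the whole passage.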

Fig. 4 is a schematic diagram of pitch adjustment using PSOLA according to an embodiment of the invention. In Fig. 4, the top axis shows a voice signal that has completed the pitch analysis procedure, and the arrows mark the pitch marks. In this embodiment the target pitch is twice the original pitch, so the distance between pitch marks is reduced to 1/2 of the original. Conversely, if the target pitch is 1/2 of the original pitch, the distance between pitch marks is doubled. The waveform between each pair of pitch marks is then reshaped (modeled) with a Hamming window (Hamming window), given by formula (2):

W(m) = 0.54 - 0.46 \cos\left(\frac{2\pi m}{N - 1}\right), \quad 0 \leq m \leq N - 1 \qquad (2)

Here N is the window length in samples, and m is the sample index within the window. Finally, the Hamming-windowed waveform segments are overlap-added to form a new voice waveform.
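A simplified time-domain PSOLA sketch. It relies on the fact that the pitch analysis step has already flattened the signal to a constant pitch period, so the input pitch marks are assumed to lie exactly every `period` samples. Each output mark copies the two-period Hamming-windowed grain (formula (2)) around the nearest input mark and overlap-adds it. The function names are illustrative.

```python
import math


def hamming(N: int) -> list[float]:
    """Formula (2): W(m) = 0.54 - 0.46 cos(2*pi*m / (N - 1)), m = 0 .. N-1."""
    return [0.54 - 0.46 * math.cos(2 * math.pi * m / (N - 1)) for m in range(N)]


def psola_constant_pitch(x: list[float], period: int, ratio: float) -> list[float]:
    """Pitch-shift a signal whose pitch was already flattened to a constant
    `period` (samples) by `ratio` (2.0 = one octave up), keeping its length.

    Input pitch marks sit every `period` samples, output marks every
    period / ratio samples. For each output mark, the two-period
    Hamming-windowed grain around the nearest input mark is overlap-added."""
    win = hamming(2 * period)
    out = [0.0] * len(x)
    last_mark = (len(x) - 1) // period          # index of the last input mark
    t = 0.0
    while t < len(x):
        centre_out = int(round(t))
        centre_in = min(int(round(t / period)), last_mark) * period
        for k in range(2 * period):
            src = centre_in - period + k
            dst = centre_out - period + k
            if 0 <= src < len(x) and 0 <= dst < len(x):
                out[dst] += win[k] * x[src]
        t += period / ratio
    return out
```

With a pulse-like test signal (one impulse per period) and ratio=2.0, the output holds twice as many pulses over the same duration, which is an octave up. Real voiced speech needs pitch marks aligned with the glottal pulses, and this sketch takes that alignment for granted.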

Fig. 5 is a schematic diagram of pitch adjustment using cross-fading according to an embodiment of the invention. Cross-fading is a pitch adjustment method similar to PSOLA. It requires less computation, but the synthesized voice is correspondingly less smooth than with PSOLA. Cross-fading makes it easy to raise or lower the pitch. It replaces the Hamming window of PSOLA with a triangular window (triangular window) and otherwise follows the same flow as PSOLA. After the correct pitch marks are obtained, the voice waveform segments at those marks are multiplied by the triangular window and, as in PSOLA, overlap-added.
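The saving comes from the window itself. A triangular window needs only a division per sample and no cosine evaluations. When adjacent windows overlap by half their length, their gains also sum to exactly 1, so a linear cross-fade between neighbouring segments keeps the level constant. A minimal sketch with hypothetical function names:

```python
def triangular_window(length: int) -> list[float]:
    """Triangular window of even length 2P: rises linearly from 0 to 1 over the
    first P samples and falls back over the next P. No cosine evaluations."""
    if length < 2 or length % 2:
        raise ValueError("length must be an even number >= 2")
    half = length // 2
    return [k / half if k <= half else (length - k) / half for k in range(length)]


def crossfade_join(a: list[float], b: list[float], overlap: int) -> list[float]:
    """Join two segments with a linear cross-fade over `overlap` samples: the tail of
    `a` fades out along the falling half of a triangle while the head of `b`
    fades in along the rising half, so the two gains always sum to 1."""
    if not 0 < overlap <= min(len(a), len(b)):
        raise ValueError("overlap must be positive and fit in both segments")
    start = len(a) - overlap
    mixed = [a[start + k] * (1 - k / overlap) + b[k] * (k / overlap)
             for k in range(overlap)]
    return a[:start] + mixed + b[overlap:]
```

Substituting `triangular_window` for a Hamming window in a PSOLA-style overlap-add loop gives the cross-fading variant described above.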

Figs. 6A and 6B are schematic diagrams of pitch adjustment using resampling according to an embodiment of the invention. In Fig. 6A, following the melody, the original voice signal is shifted up to twice its original pitch (one octave higher) by downsampling (downsampling). Conversely, in Fig. 6B, upsampling (up sampling) is used to shift the original voice signal down to 1/2 of its original pitch.
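A minimal resampling sketch that uses linear interpolation. Stepping through the signal two samples at a time (downsampling) and playing the result back at the original rate raises the pitch one octave. Stepping half a sample at a time (upsampling) lowers it one octave. Plain resampling also changes the duration by the same factor. This sketch applies no anti-aliasing filter before downsampling, which a production implementation would normally add. The function name is hypothetical.

```python
def resample_linear(x: list[float], factor: float) -> list[float]:
    """Read x at steps of `factor` samples with linear interpolation.
    factor = 2.0 keeps every second sample (downsampling): at the original playback
    rate the pitch rises one octave and the duration halves.
    factor = 0.5 inserts interpolated samples (upsampling): the pitch drops an octave."""
    if factor <= 0:
        raise ValueError("factor must be positive")
    out = []
    pos = 0.0
    while pos <= len(x) - 1:
        i = int(pos)
        frac = pos - i
        nxt = x[i + 1] if i + 1 < len(x) else x[i]
        out.append((1 - frac) * x[i] + frac * nxt)
        pos += factor
    return out


if __name__ == "__main__":
    ramp = [float(v) for v in range(8)]
    print(resample_linear(ramp, 2.0))  # [0.0, 2.0, 4.0, 6.0]
```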

When a real person sings, the voice does not jump precisely from one pitch straight to the target pitch each time, as a computer does. When the pitch change is large in particular, the voice usually overshoots the target pitch somewhat and then settles smoothly on it. To simulate this characteristic of human singing, one embodiment of the invention uses a Bézier curve (Bézier curve) for the smoothing procedure. Taking a cubic Bézier curve as an example, Fig. 7A shows the four control points P0, P1, P2 and P3. Formula (4) gives the relationship between the control points:

δ = 1 - \exp\left(\frac{-\lvert P_{3} - P_{0} \rvert}{100}\right)

P_{y-1} = P_{y} \pm P_{y}\left(\sqrt[12]{2} - 1\right) \times δ, \quad 1 \leq y \leq 3 \qquad (4)

Here δ is a parameter between 0 and 1 that increases with the magnitude of the pitch change, and \sqrt[12]{2} is the frequency ratio of one semitone in twelve-tone equal temperament. The operator "±" in formula (4) is "+" if the pitch change is upward and "-" otherwise. In Fig. 7A, control point P0 is the initial pitch and control point P3 is the target pitch. Control point P2 is taken 2 milliseconds to the right of P0, and control point P1 is taken 1 millisecond to the left of P2. The control points from formula (4) are then substituted into the cubic Bézier curve

B(t) = P_{0}(1-t)^{3} + 3P_{1}t(1-t)^{2} + 3P_{2}t^{2}(1-t) + P_{3}t^{3}, \quad t \in [0, 1]

to calculate the curve connecting P0 and P3.
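A minimal Python sketch of formula (4) and the cubic curve. It makes two assumptions: pitch values are frequencies in Hz, and the recurrence is applied only for y = 3 and y = 2, so P0 stays the start pitch. It evaluates pitch only as a function of the curve parameter t, and the 2 ms / 1 ms time placement of the control points is not modelled. The function names are illustrative.

```python
import math

SEMITONE = 2.0 ** (1.0 / 12.0)  # frequency ratio of one equal-tempered semitone


def cubic_control_pitches(p0: float, p3: float) -> list[float]:
    """Control-point pitches per formula (4). P0 (start) and P3 (target) are given;
    P2 and P1 are derived backwards from P3 with the overshoot parameter delta.
    Assumption: the recurrence is applied for y = 3 and y = 2 only."""
    delta = 1.0 - math.exp(-abs(p3 - p0) / 100.0)
    sign = 1.0 if p3 >= p0 else -1.0  # '+' for an upward pitch change
    p = [p0, 0.0, 0.0, p3]
    for y in (3, 2):
        p[y - 1] = p[y] + sign * p[y] * (SEMITONE - 1.0) * delta
    return p


def cubic_bezier(p: list[float], t: float) -> float:
    """B(t) = P0(1-t)^3 + 3P1 t(1-t)^2 + 3P2 t^2(1-t) + P3 t^3, 0 <= t <= 1."""
    s = 1.0 - t
    return p[0] * s**3 + 3 * p[1] * t * s**2 + 3 * p[2] * t**2 * s + p[3] * t**3


if __name__ == "__main__":
    ctrl = cubic_control_pitches(220.0, 330.0)  # hypothetical upward glide
    print([round(cubic_bezier(ctrl, k / 10), 1) for k in range(11)])
```

For an upward glide, P2 and P1 lie above the target, so the curve briefly overshoots P3 before ending exactly on it. This matches the human behaviour described above.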

In another embodiment of the invention, a quartic Bézier curve is used for the smoothing procedure. Formula (5) gives the relationship between the five control points P0, P1, P2, P3 and P4:

δ = 1 - \exp\left(\frac{-\lvert P_{4} - P_{0} \rvert}{100}\right)

P_{y-1} = P_{y} \pm P_{y}\left(\sqrt[12]{2} - 1\right) \times δ, \quad 1 \leq y \leq 4 \qquad (5)

Here δ is a parameter between 0 and 1 that increases with the magnitude of the pitch change, and \sqrt[12]{2} is the frequency ratio of one semitone in twelve-tone equal temperament. The operator "±" in formula (5) is "+" if the pitch change is upward and "-" otherwise. In Fig. 7B, control point P0 is the initial pitch. Control point P2 is taken 60 milliseconds to the right of P0, and control point P1 is taken 10 milliseconds to the left of P2. Control point P4 is taken 40 milliseconds to the right of P2, and control point P3 is taken 20 milliseconds to the left of P4. The control points from formula (5) are then substituted into the quartic Bézier curve:

B(t) = P_{0}(1-t)^{4} + 4P_{1}(1-t)^{3}t + 6P_{2}(1-t)^{2}t^{2} + 4P_{3}(1-t)t^{3} + P_{4}t^{4}, \quad t \in [0, 1]

which gives the curve connecting P0 and P4.

In another embodiment of the invention, a quintic Bézier curve is used for the smoothing procedure. Formula (6) gives the relationship between the six control points P0, P1, P2, P3, P4 and P5:

δ = 1 - \exp\left(\frac{-\lvert P_{5} - P_{0} \rvert}{100}\right)

P_{y-1} = P_{y} \pm P_{y}\left(\sqrt[12]{2} - 1\right) \times δ, \quad 1 \leq y \leq 5 \qquad (6)

Here δ is a parameter between 0 and 1 that increases with the magnitude of the pitch change, and \sqrt[12]{2} is the frequency ratio of one semitone in twelve-tone equal temperament. The operator "±" in formula (6) is "+" if the pitch change is upward and "-" otherwise. In Fig. 7C, control point P0 is the initial pitch and control point P5 is the target pitch. Control point P2 is taken 2 milliseconds to the right of P0, and control point P1 is taken 1 millisecond to the left of P2. Control point P4 is taken 2 milliseconds to the right of P2, and control point P3 is taken 1 millisecond to the left of P4. The control points from formula (6) are then substituted into the quintic Bézier curve:

B(t) = P_{0}(1-t)^{5} + 5P_{1}(1-t)^{4}t + 10P_{2}(1-t)^{3}t^{2} + 10P_{3}(1-t)^{2}t^{3} + 5P_{4}(1-t)t^{4} + P_{5}t^{5}, \quad t \in [0, 1]

which gives the curve connecting P0 and P5.
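The cubic, quartic and quintic curves are all instances of the degree-n Bézier curve in Bernstein form, B(t) = Σ_k C(n, k)(1−t)^(n−k) t^k P_k. One small function therefore covers formulas (5) and (6) once their control points are known. A minimal sketch, with a hypothetical function name:

```python
from math import comb


def bezier(points: list[float], t: float) -> float:
    """Degree-n Bezier curve in Bernstein form,
    B(t) = sum_k C(n, k) (1 - t)^(n - k) t^k P_k for 0 <= t <= 1.
    Four points give the cubic curve, five the quartic curve of formula (5),
    six the quintic curve of formula (6)."""
    n = len(points) - 1
    return sum(comb(n, k) * (1 - t) ** (n - k) * t ** k * p
               for k, p in enumerate(points))


if __name__ == "__main__":
    print([comb(5, k) for k in range(6)])   # [1, 5, 10, 10, 5, 1]
    print(bezier([0, 1, 2, 3, 4, 5], 0.5))  # evenly spaced points -> 2.5
```

The binomial coefficients 1, 4, 6, 4, 1 and 1, 5, 10, 10, 5, 1 are the weights in the quartic and quintic expansions. For high degrees or many evaluations, de Casteljau's algorithm is a numerically stabler alternative.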

Fig. 8 is a flowchart of a singing synthesis method according to an embodiment of the invention. The method is applicable to an electronic computing device.
- Step S801: The beat of a selected song is obtained from its melody and then prompted. The beat prompt mainly lets a user chant or hum the lyrics of the song in a speaking manner, in time with the prompt.
- Step S802: A plurality of voice signals are received through a sound-receiving module of the electronic computing device. The user may produce the voice signals from the lyrics of the song, and the voice signals preferably follow the beat.
- Step S803: The method processes the melody and the voice signals and outputs a synthesized singing voice signal through a playback module of the electronic computing device.

The electronic computing device can prompt the beat in several ways:
- a display unit that produces a visual signal, for example a moving, jumping, flashing or color-changing symbol;
- an output unit that produces an audio signal, for example a metronome-like "tick, tick" sound;
- a mechanical structure that provides a beat motion, for example waving, rotating, tapping, or a metronome-like pendulum;
- a light-emitting unit whose flashing or color-changing light serves as the beat.

The method can also check that the rhythm of the user's voice signals is reasonably accurate. After receiving the voice signals, it uses the song's melody to judge whether their rhythm deviates beyond a preset error tolerance. If it does, the user is prompted to repeat the voice input step, and the rhythm error can be judged as shown in Fig. 3. Alternatively, the method can play the received voice signals back and let the user decide whether to accept the recording. If the user does not accept it, the voice input step is repeated. In other embodiments, the user may also produce the voice signals by singing, or may input voice signals that were recorded or processed in advance.

As shown in Fig. 9A, the processing of the voice signals can be divided into further steps. First, a pitch analysis procedure is executed on the voice signals (step S803-1). It uses pitch tracking and pitch marking to flatten their pitch into a plurality of identical-pitch signals. Then, a pitch adjustment procedure is executed on the identical-pitch signals (step S803-2), for example PSOLA, cross-fading or resampling. This adjusts each signal to the corresponding standard pitch indicated by the song's melody and yields a plurality of adjusted voice signals. PSOLA, cross-fading and resampling can be performed as described above for Fig. 4, Fig. 5, and Figs. 6A and 6B.

As shown in Fig. 9B, in some embodiments the method continues after the pitch analysis and pitch adjustment procedures with a smoothing procedure on the adjusted voice signals (step S803-3). This procedure uses, for example, linear interpolation, bilinear interpolation or polynomial interpolation to join the adjusted voice signals into a smoothed voice signal. Polynomial interpolation can be performed as described above for Figs. 7A to 7C.

As shown in Fig. 9C, in some embodiments the method continues after the pitch analysis, pitch adjustment and smoothing procedures with a singing special-effects procedure on the smoothed voice signal (step S803-4). This procedure determines the sampling frame size from the system load of the electronic computing device. Then, frame by frame, it adjusts the volume, adds vibrato and adds echo, in that order, to produce an effects-processed voice signal.

As shown in Fig. 9D, in some embodiments the method executes an accompaniment synthesis procedure (step S803-5) on any of the voice signals described above, such as the adjusted, smoothed or effects-processed voice signals. This procedure combines the song's accompaniment music with the simulated singing voice signal to obtain an accompanied singing voice signal, which is then output. The adjusted, smoothed and effects-processed voice signals and the accompanied singing voice signal are all forms of the synthesized singing voice signal of the present invention. The synthesized song carries the user's timbre.
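The accompaniment synthesis step can be sketched as a sample-wise mix of the synthesized vocal with the accompaniment track. This sketch makes several assumptions: both tracks are mono sample lists at the same sample rate in the range [-1, 1], the gains are illustrative, and the result is peak-normalized instead of hard-clipped when it would exceed full scale. The function name is hypothetical.

```python
def mix_with_accompaniment(vocal: list[float], accomp: list[float],
                           vocal_gain: float = 1.0,
                           accomp_gain: float = 0.6) -> list[float]:
    """Sample-wise mix of the synthesized vocal with the accompaniment track.
    The shorter track is padded with silence; if the mix would exceed [-1, 1],
    the whole result is scaled down by its peak (no hard clipping)."""
    n = max(len(vocal), len(accomp))
    v = vocal + [0.0] * (n - len(vocal))
    a = accomp + [0.0] * (n - len(accomp))
    mixed = [vocal_gain * vs + accomp_gain * ac for vs, ac in zip(v, a)]
    peak = max((abs(s) for s in mixed), default=0.0)
    return [s / peak for s in mixed] if peak > 1.0 else mixed
```

Scaling the whole mix by its peak keeps the vocal-to-accompaniment balance intact. The trade-off is that the overall level depends on the loudest sample.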

The electronic computing device implementing this method may be a desktop computer, a notebook computer, a handheld communication device, an electronic doll, an electronic pet, or the like. In addition, the device may include a song database that stores the melodies of multiple songs (for example, the user's favorites), from which the user selects the song to be synthesized. The song database may also store the lyrics of each song and the rhythm corresponding to those lyrics.
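One way to model such a song database is sketched below. The class and field names (`Note`, `Song`, `SongDatabase`, `midi_pitch`, `beats`, `tempo_bpm`) are hypothetical, and pairing exactly one note with each lyric is an assumption made to keep the example short.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

@dataclass
class Note:
    midi_pitch: int   # standard pitch as a MIDI note number (69 = A4 = 440 Hz)
    beats: float      # note length in beats

@dataclass
class Song:
    title: str
    tempo_bpm: float
    melody: List[Note]
    lyrics: List[str]  # one lyric unit (e.g. one syllable) per note

    def lyric_timing(self) -> List[Tuple[str, float]]:
        # Duration of each lyric in seconds, assuming one note per lyric
        sec_per_beat = 60.0 / self.tempo_bpm
        return [(w, n.beats * sec_per_beat) for w, n in zip(self.lyrics, self.melody)]

class SongDatabase:
    def __init__(self) -> None:
        self._songs: Dict[str, Song] = {}

    def add(self, song: Song) -> None:
        self._songs[song.title] = song

    def titles(self) -> List[str]:
        # Titles the user can choose from
        return sorted(self._songs)

    def get(self, title: str) -> Song:
        return self._songs[title]
```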

Fig. 10 is an architecture diagram of a singing voice synthesis device according to an embodiment of the present invention. As shown in Fig. 10, the singing voice synthesis device 1000 may be an electronic doll; in other embodiments it may also be a desktop computer, a notebook computer, a handheld communication device, a palmtop device, a personal digital assistant (PDA), an electronic pet, a robot, a radio cassette recorder, a music CD player, or the like. The device 1000 comprises at least a housing 1010, a storage 1020, a beat mechanism 1030, a sound receiving device 1040 and a processor 1050. The storage 1020 is arranged inside the housing 1010, is connected to the processor 1050, stores the melodies of multiple songs, and can provide the melody of a song to the beat mechanism 1030. The beat mechanism 1030 is arranged on the outside of the housing 1010, is connected to the processor 1050, and indicates the beat corresponding to a specific melody among the stored melodies, helping the user read the lyrics of the song aloud or hum them. The sound receiving device 1040 is arranged on the outside of the housing 1010 and receives the plurality of voice signals produced by the user reading aloud or humming. The processor 1050 is arranged inside the housing 1010 and processes the voice signals according to the specific melody to produce a synthesized singing voice signal.

In the embodiment of Fig. 10, the storage 1020 may be located in the torso of the electronic doll and may be a memory such as flash memory, a hard disk or a cache. The melody may be stored as a WAVE audio file or a MIDI (Musical Instrument Digital Interface) file. The beat mechanism 1030 can be implemented in several ways:
- a light-emitting device at the doll's eyes that produces flashing or color-changing light, implemented in practice with light-emitting diodes (LEDs) or other light-emitting objects;
- a movable mechanical structure at the doll's hand that waves, rotates or taps, or swings like a metronome pendulum, implemented in practice with a pendulum similar to that of a piano metronome;
- a display at the doll's abdomen that shows visual signals, for example symbols that move, jump, flash or change color; or
- a playback device at the doll's mouth that outputs, for example, the "tick, tick" sound of a metronome.

The sound receiving device 1040 may be located at the doll's ears and may be, for example, a microphone, a sound pickup, a recorder, or another object capable of receiving sound. The voice signals it receives correspond to the specific melody and follow the beat.
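The timing behind any of these beat cues can be sketched as a simple metronome: beat onsets are spaced 60/BPM seconds apart, and a cue callback (LED flash, arm movement, display update or "tick" sound) fires at each one. `beat_times`, `run_metronome` and the callback interface are hypothetical names; the patent does not describe how the beat is scheduled.

```python
import time
from typing import Callable, List

def beat_times(bpm: float, n_beats: int, start: float = 0.0) -> List[float]:
    # Beat k falls at start + k * (60 / bpm) seconds
    period = 60.0 / bpm
    return [start + k * period for k in range(n_beats)]

def run_metronome(bpm: float, n_beats: int, cue: Callable[[int], None],
                  clock=time.monotonic, sleep=time.sleep) -> None:
    t0 = clock()
    for k, t in enumerate(beat_times(bpm, n_beats)):
        # Sleep until the scheduled time of beat k; scheduling every beat
        # relative to t0 keeps small sleep errors from accumulating
        delay = t0 + t - clock()
        if delay > 0:
            sleep(delay)
        cue(k)  # hypothetical hook: flash LED, move arm, update display, play "tick"
```

For example, `run_metronome(90, 8, lambda k: print("tick", k))` prints eight ticks at 90 beats per minute.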

The processor 1050 may be located inside the doll's housing and may be an embedded microprocessor together with the other components it needs to operate. The processor 1050 is connected to the storage 1020, the beat mechanism 1030 and the sound receiving device 1040, and its main task is to process the voice signals according to the specific melody to produce a synthesized singing voice signal. In some embodiments, this processing includes flattening the pitch of the voice signals to obtain a plurality of signals with an identical pitch, and then, according to the specific melody, adjusting these identical-pitch signals to the corresponding standard pitches indicated by the melody, to obtain a plurality of adjusted voice signals. Further, the processor 1050 may smooth the adjusted voice signals to produce a smoothed voice signal.
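The arithmetic of these two steps can be illustrated as pitch-shift ratios. The sketch assumes the tracked pitch of each voice signal is already known in Hz and picks the median as the common "flattened" pitch (the patent does not say which reference pitch is used). Each target note is converted from a MIDI note number with the standard equal-temperament formula f = 440 · 2^((n − 69)/12). Function names are hypothetical.

```python
import statistics
from typing import List, Tuple

def midi_to_hz(note: int) -> float:
    # Equal temperament, A4 (MIDI 69) = 440 Hz
    return 440.0 * 2.0 ** ((note - 69) / 12.0)

def flatten_and_retarget(tracked_hz: List[float],
                         target_notes: List[int]) -> Tuple[float, List[float], List[float]]:
    # Step 1 (flattening): shift every voice signal to one common pitch
    flat_hz = statistics.median(tracked_hz)
    flatten_ratios = [flat_hz / f for f in tracked_hz]
    # Step 2 (adjustment): shift the common pitch to each note's standard pitch
    target_ratios = [midi_to_hz(n) / flat_hz for n in target_notes]
    return flat_hz, flatten_ratios, target_ratios
```

A ratio above 1 raises the pitch. The two ratios for a segment could be multiplied and applied in a single pitch-shift pass; keeping them separate simply mirrors the two stages described in the text.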

In other embodiments, the processor 1050 may perform pitch analysis: it applies pitch tracking and pitch marking, and then flattens the pitch to obtain a plurality of identical pitches. The processor 1050 then performs pitch alignment on these identical pitches, using pitch-synchronous overlap-add (PSOLA), cross-fading or resampling to adjust each of them to the corresponding standard pitch indicated by the specific melody, and so obtains a plurality of adjusted voice signals. Details of PSOLA, cross-fading and resampling are given above with reference to Figs. 4, 5, 6A and 6B. Next, the processor 1050 smooths the adjusted voice signals, joining them with linear, bilinear or polynomial interpolation to obtain a smoothed voice signal; details of the polynomial interpolation are given above with reference to Figs. 7A to 7C.
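To make the pitch-tracking and resampling options concrete, the sketch below pairs a crude autocorrelation pitch estimator with a resampling pitch shifter that reads the signal at a fractional step using linear interpolation. Both are textbook simplifications, not the patent's method, and PSOLA and cross-fading are not shown. Function names and the 80 to 800 Hz search range are assumptions.

```python
import math
from typing import List

def estimate_f0(frame: List[float], sr: int,
                fmin: float = 80.0, fmax: float = 800.0) -> float:
    # Pick the lag in [sr/fmax, sr/fmin] with the largest (unnormalized)
    # autocorrelation; return 0.0 if no positive peak is found
    lag_min = max(1, int(sr / fmax))
    lag_max = min(int(sr / fmin), len(frame) - 1)
    best_lag, best_r = 0, 0.0
    for lag in range(lag_min, lag_max + 1):
        r = sum(frame[i] * frame[i + lag] for i in range(len(frame) - lag))
        if r > best_r:
            best_lag, best_r = lag, r
    return sr / best_lag if best_lag else 0.0

def resample_pitch_shift(x: List[float], ratio: float) -> List[float]:
    # Read x at a step of `ratio` samples with linear interpolation.
    # ratio > 1 raises the pitch; the output is also about 1/ratio as long.
    out, pos = [], 0.0
    while pos <= len(x) - 1:
        i = int(pos)
        frac = pos - i
        nxt = x[i + 1] if i + 1 < len(x) else x[i]
        out.append((1.0 - frac) * x[i] + frac * nxt)
        pos += ratio
    return out
```

The design trade-off is visible in the comment: plain resampling changes duration together with pitch, like speeding up a tape, which is why duration-preserving methods such as PSOLA are usually preferred for singing.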

In other embodiments, the processor 1050 may further apply singing special effects to the smoothed voice signal: it determines the sampling frame size according to the system load of the device 1000, and then applies volume adjustment, vibrato and echo in sequence to the simulated singing voice signal at that frame size. In further embodiments, the processor 1050 may apply accompaniment synthesis to any of the voice signals above (the adjusted voice signals, the smoothed voice signal or the effects-processed voice signal), mixing the song's accompaniment music with that signal to obtain an accompanied singing voice signal. The adjusted, smoothed, effects-processed and accompanied signals are all embodiments of the synthesized singing voice signal of the present invention, and the synthesized singing carries the user's own timbre.
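The patent only says that the frame size is chosen according to system load. The sketch below assumes a hypothetical policy in which a load fraction in [0, 1] selects one of several power-of-two frame sizes, with heavier load mapped to larger frames so that fewer per-frame processing calls are made. The sizes, the mapping and the helper names are illustrative only.

```python
from typing import Callable, List, Sequence

def choose_frame_size(load: float,
                      sizes: Sequence[int] = (256, 512, 1024, 2048)) -> int:
    # Clamp the load to [0, 1] and map it onto the list of sizes:
    # heavier load -> larger frames -> fewer per-frame calls
    load = min(max(load, 0.0), 1.0)
    idx = min(int(load * len(sizes)), len(sizes) - 1)
    return sizes[idx]

def process_in_frames(x: List[float], frame_size: int,
                      fn: Callable[[List[float]], List[float]]) -> List[float]:
    # Apply a per-frame effect to consecutive frames (the last may be shorter)
    out: List[float] = []
    for start in range(0, len(x), frame_size):
        out.extend(fn(x[start:start + frame_size]))
    return out
```

Stateless effects such as volume adjustment can be applied frame by frame as shown; vibrato and echo would additionally need to carry their delay-line state from one frame to the next.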

In some embodiments, the device 1000 may further include a playback device (not shown), arranged on the outside of the housing 1010 and connected to the processor 1050, that outputs the synthesized singing voice signal. In the embodiment of Fig. 10, the playback device may be located at the doll's mouth and may be a speaker, a loudspeaker horn, an earphone, an audio player, or other equipment with a playback function. Further, while the playback device outputs the synthesized singing voice signal, the beat mechanism 1030 can display the beat of that signal in step with it, through the actions described above (waving, rotating, tapping), visual symbols (moving, jumping, flashing, changing color), or an imitation of the metronome's "tick, tick" sound.

To ensure that the rhythm of the voice signals input by the user is reasonably accurate, the processor 1050 may also perform rhythm analysis: after receiving the voice signals, it determines, according to the melody of the song, whether the rhythm of the voice signals deviates beyond a preset tolerance. If it does, the user is prompted to re-enter the voice signals; details are given above with reference to Fig. 3. In another embodiment, after the processor 1050 and the sound receiving device 1040 receive the voice signals, the signals are played back through the playback device so that the user can decide whether to accept them or to re-enter a new set of voice signals to replace the old ones. In still other embodiments, the user may also produce the voice signals by singing, or may input voice signals that were recorded or processed in advance.
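A minimal sketch of such a rhythm check, under assumptions not stated in the patent: onsets in the recorded signal are found where the frame RMS energy first rises above a fixed threshold, and the rhythm is accepted only if the number of onsets matches the number of expected beats and every onset lies within a tolerance (in seconds) of its expected time. `detect_onsets`, `rhythm_ok` and their parameters are hypothetical.

```python
import math
from typing import List, Tuple

def detect_onsets(x: List[float], sr: int, frame: int = 256,
                  threshold: float = 0.1) -> List[float]:
    # An onset is a frame whose RMS reaches the threshold after a quiet frame
    onsets, active = [], False
    for k in range(len(x) // frame):
        seg = x[k * frame:(k + 1) * frame]
        rms = math.sqrt(sum(s * s for s in seg) / frame)
        loud = rms >= threshold
        if loud and not active:
            onsets.append(k * frame / sr)  # onset time in seconds
        active = loud
    return onsets

def rhythm_ok(onsets: List[float], expected: List[float],
              tolerance: float) -> Tuple[bool, float]:
    # Reject if the counts differ; otherwise compare the worst timing error
    if len(onsets) != len(expected):
        return False, math.inf
    worst = max((abs(a - b) for a, b in zip(onsets, expected)), default=0.0)
    return worst <= tolerance, worst
```

If `rhythm_ok` returns `False`, the device would prompt the user to record the voice signals again, as described above.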

As the above embodiments show, the voice signals of the present invention are produced by the user reading aloud or humming according to the melody and its beat, so each voice signal already corresponds to the melody and beat and can be processed directly. This avoids the time and cost of pre-recording the large user corpus required in the prior art, saves system resources and speeds up synthesis. Moreover, the resulting synthesized singing carries the user's own timbre and sounds quite lifelike.

Although the present invention has been disclosed above by way of various embodiments, they are provided only as examples and are not intended to limit its scope. Those skilled in the art may make changes and modifications without departing from the spirit and scope of the invention. The scope of protection of the invention is therefore defined by the appended claims.

Claims

1. A singing voice synthesis system, characterized in that the system comprises:

a storage unit, configured to store at least one melody;

a beat unit, configured to indicate a beat according to a specific melody among the at least one melody;

an input unit, configured to receive a plurality of voice signals, wherein the voice signals correspond to the specific melody, are produced by a user according to lyrics information and the beat, and correspond in sequence to the respective lyrics in the lyrics information; and

a processing unit, configured to process the voice signals according to the specific melody and generate a synthesized singing voice signal.

2. The singing voice synthesis system of claim 1, wherein the voice signals have a given rhythm, and the system further comprises a rhythm analysis unit configured to determine whether the given rhythm exceeds a preset tolerance.

3. The singing voice synthesis system of claim 1, wherein the processing performed by the processing unit on the voice signals comprises:

performing a pitch analysis procedure and a pitch adjustment procedure to obtain a plurality of adjusted voice signals, the adjusted voice signals serving as the synthesized singing voice signal,

wherein the pitch analysis procedure obtains, through pitch tracking, a plurality of pitches corresponding respectively to the voice signals, and then flattens those pitches to obtain a plurality of identical pitches.

4. The singing voice synthesis system of claim 3, wherein the processing performed by the processing unit on the voice signals further comprises:

performing a smoothing procedure on the adjusted voice signals to obtain a smoothed voice signal, the smoothed voice signal serving as the synthesized singing voice signal.

5. The singing voice synthesis system of claim 4, wherein the processing performed by the processing unit on the voice signals further comprises:

performing a singing special-effects procedure on the smoothed voice signal to obtain an effects-processed voice signal, the effects-processed voice signal serving as the synthesized singing voice signal.

6. The singing voice synthesis system of claim 5, wherein the processing performed by the processing unit on the voice signals further comprises:

performing an accompaniment synthesis procedure on one of the adjusted voice signals, the smoothed voice signal and the effects-processed voice signal to obtain an accompanied singing voice signal, the accompanied singing voice signal serving as the synthesized singing voice signal.

7. A singing voice synthesis method, applicable to an electronic computing device, characterized in that the method comprises:

indicating a beat according to a specific melody among at least one melody;

receiving a plurality of voice signals through a sound receiving module of the electronic computing device, wherein the voice signals correspond to the specific melody, are produced by a user according to lyrics information and the beat, have a given rhythm, and correspond in sequence to the respective lyrics in the lyrics information; and

processing the voice signals according to the specific melody and outputting a synthesized singing voice signal through a playback module of the electronic computing device.

8. The singing voice synthesis method of claim 7, further comprising: determining whether the given rhythm exceeds a preset tolerance and, if so, producing the voice signals again.

9. The singing voice synthesis method of claim 7, wherein the processing performed on the voice signals further comprises:

10. The singing voice synthesis method of claim 9, wherein the processing performed on the voice signals further comprises:

11. The singing voice synthesis method of claim 10, wherein the processing performed on the voice signals further comprises:

12. The singing voice synthesis method of claim 11, wherein the processing performed on the voice signals further comprises:

13. A singing voice synthesis device, characterized in that the device comprises at least a housing, a storage, a beat mechanism, a sound receiving device and a processor, wherein:

the storage is arranged inside the housing, is connected to the processor, and stores at least one melody;

the beat mechanism is arranged on the outside of the housing, is connected to the processor, and indicates a beat according to a specific melody among the at least one melody;

the sound receiving device is arranged on the outside of the housing, is connected to the processor, and receives a plurality of voice signals, wherein the voice signals correspond to the specific melody, are produced by a user according to lyrics information and the beat, have a given rhythm, and correspond in sequence to the respective lyrics in the lyrics information; and

the processor is arranged inside the housing and processes the voice signals according to the specific melody to produce a synthesized singing voice signal.

14. The singing voice synthesis device of claim 13, wherein the storage is a memory; the beat mechanism is a light-emitting device, a movable mechanical structure, a display or a playback device; the sound receiving device is a microphone, a sound pickup or a recorder; and the processor is an embedded microprocessor.

15. The singing voice synthesis device of claim 13, wherein the processor further determines whether the given rhythm exceeds a preset tolerance and, if so, prompts the user to produce the voice signals again.

16. The singing voice synthesis device of claim 13, wherein the processing performed by the processor on the voice signals comprises performing pitch analysis and pitch alignment to obtain a plurality of adjusted voice signals, the adjusted voice signals serving as the synthesized singing voice signal; the pitch analysis obtains, through pitch tracking, a plurality of pitches corresponding respectively to the voice signals, and then flattens those pitches to obtain a plurality of identical pitches.

17. The singing voice synthesis device of claim 16, wherein the processing performed by the processor on the voice signals further comprises smoothing the adjusted voice signals to obtain a smoothed voice signal, the smoothed voice signal serving as the synthesized singing voice signal.

18. The singing voice synthesis device of claim 17, wherein the processing performed by the processor on the voice signals further comprises applying singing special effects to the smoothed voice signal to obtain an effects-processed voice signal, the effects-processed voice signal serving as the synthesized singing voice signal.

19. The singing voice synthesis device of claim 18, wherein the processing performed by the processor on the voice signals further comprises performing accompaniment synthesis on one of the adjusted voice signals, the smoothed voice signal and the effects-processed voice signal to obtain an accompanied singing voice signal, the accompanied singing voice signal serving as the synthesized singing voice signal.

20. The singing voice synthesis device of claim 13, further comprising:

a playback device that outputs the synthesized singing voice signal.