CN100342426C

CN100342426C - Singing generator and portable communication terminal having singing generation function

Info

Publication number: CN100342426C
Application number: CNB2005100055433A
Authority: CN
Inventors: 山木清志
Original assignee: Yamaha Corp
Current assignee: Yamaha Corp
Priority date: 2004-01-23
Filing date: 2005-01-20
Publication date: 2007-10-10
Anticipated expiration: 2025-01-20
Also published as: HK1077390A1; JP4277697B2; CN1661674A; JP2005208394A

Abstract

To provide a singing voice generating unit for which data for generating a singing voice can be input easily and which generates the singing voice by voice synthesis from that data even in compact equipment such as a portable communication terminal, together with its program and a portable communication terminal having a singing voice generating function. The singing voice generating unit comprises: an input means (3b) for inputting musical score data representing the melody to be sung; a storage means (4b) storing pronunciation data, each item of which describes, in text form, a voice of a particular pitch and note length for at least one pronunciation character; a control means (1b) which sequentially extracts from the storage means (4b) the pronunciation data corresponding to the pitch and length of each note in the musical score data and generates a pronunciation data string by arranging the extracted pronunciation data in order; and a pronunciation means which produces a voice according to the pronunciation data string. (C)2005,JPO&NCIPI.

Description

Song generating apparatus and portable communication terminal having a song generating function

Technical field

The present invention relates to a song generating apparatus that generates a singing voice by speech synthesis, a program that implements the song generating function, and a portable communication terminal having a song generating function.

Background art

In recent years, various portable telephones (cellular phones, PHS (Personal Handyphone System: registered trademark)) and portable communication terminals (PDAs (personal digital assistants), etc.) have been developed and commercialized. For example, portable telephones have been commercialized that let the user register a tune of his or her own making and reproduce that tune as a ring back tone.

In addition, Japanese Unexamined Patent Application Publication No. Hei 11-184490 discloses the following song synthesis method. Conventional rule-based speech synthesis can only voice words and sentences composed of given text and read them aloud plainly; it cannot add a desired melody to generate a so-called "singing voice". In view of this problem, the publication discloses a method that synthesizes a singing voice from given lyrics and note information. Specifically, a file conforming to a musical score, the MIDI (Musical Instrument Digital Interface) specification, or the like is read in, and the fundamental frequency and time length are extracted from the note information it contains. Hiragana characters of the lyrics (the Japanese fifty-sound syllabary, in which each character is pronounced as a combination of a vowel and a consonant) are then assigned to the notes and decomposed into a phoneme sequence. Speech synthesis is carried out by a rule-based speech synthesis method, with the extracted fundamental frequency and time length as prosodic information and the phoneme sequence as text input information.

However, the above portable telephones only allow a user-made tune to be registered and reproduced; they have no function for reproducing a singing voice.

In addition, although the technique disclosed in the above publication mentions a function for reproducing a singing voice, it does not describe a concrete method for, for example, attaching the hiragana of the lyrics to the individual notes. Moreover, when speech waveforms are used as the unit of speech synthesis, the amount of data becomes large, which makes the technique difficult to implement in small electronic devices with little memory capacity, such as portable telephones.

Summary of the invention

The present invention was made in view of the above problems. It relates to a song generating apparatus with which data for song synthesis can be input easily, and which can generate a singing voice from that data by speech synthesis, even in a small electronic device such as a portable communication terminal; to a program that implements the song generating function; and to a portable communication terminal having a song generating function.

The song generating apparatus of the present invention comprises the following: an input device that inputs music data representing the melody used in song reproduction; a storage device that stores a plurality of pronunciation data items, each described in text form (HV-Script form) as a character string containing a pronunciation character, a first symbol, and a second symbol, where the pronunciation character is the character to be sounded as singing, the first symbol indicates the sounding period of the pronunciation character, and the second symbol indicates a change in pitch or volume while the pronunciation character is sounded; a control device that generates a pronunciation data string by associating desired pronunciation data, selected from the plurality of pronunciation data items stored in the storage device, with each note constituting the music data; and a song reproduction device that, according to the pronunciation data string, reproduces with the melody represented by the music data the singing represented by the pronunciation character in the pronunciation data associated with each note. The song reproduction device reproduces this singing so that, during the sounding period indicated by the first symbol in the pronunciation data, the pitch or volume changes according to the second symbol in that pronunciation data.

The pronunciation data consists of a pronunciation character and prosodic symbols that specify its pronunciation characteristics. The input device can also input desired characters, and the control device replaces the pronunciation character contained in the pronunciation data associated with each note with the desired input character (for example, a character of the lyrics).

Based on the pronunciation characters and prosodic symbols contained in the pronunciation data string, the song reproduction device reproduces, with the specified pronunciation characteristics, the singing represented by the pronunciation characters contained in the pronunciation data string.

The pronunciation data is associated at least with a tempo. When the tempo of the music data differs from the tempo associated with the pronunciation data stored in the storage device, the control device changes the note length given by the prosodic symbols of the pronunciation data according to the ratio between the tempo of the music data and the tempo associated with the pronunciation data, and thereby generates a pronunciation data string that matches the tempo of the input music data.
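As a rough illustration of this tempo adjustment, the following Python sketch rescales a stored unit length by the ratio of the two tempos. The function name is hypothetical, and the rule that the length scales inversely with tempo is an assumption made for illustration; the text only states that the note length is changed according to the tempo ratio.

```python
def rescale_unit_ms(stored_unit_ms: float, stored_tempo: float, score_tempo: float) -> float:
    """Rescale a unit length stored for one tempo so that it fits another tempo.

    Assumption (not stated in the text): a note lasts proportionally longer
    at a slower tempo, so the length is multiplied by stored_tempo / score_tempo.
    """
    if stored_tempo <= 0 or score_tempo <= 0:
        raise ValueError("tempos must be positive")
    return stored_unit_ms * stored_tempo / score_tempo


# A unit of 80 ms stored for tempo 120, reused with a score at tempo 60.
print(rescale_unit_ms(80.0, 120.0, 60.0))
```

Under this assumption, halving the tempo doubles the unit length, and an unchanged tempo leaves the stored value as it is.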

The following pronunciation control may also be performed: when the pronunciation character represented by the pronunciation data is to be sounded, the sound starts at a pitch slightly lower than the pitch specified by the music data and then returns to the specified pitch.

Furthermore, a vibrato effect that moves the pitch up and down partway through sounding the pronunciation character may be added to the pronunciation data.

A program that implements the functions of the song generating apparatus may also be created and installed in a computer system or the like, or a structure that implements the functions of the song generating apparatus may be built into a portable communication terminal.

Brief description of the drawings

Fig. 1 is a block diagram showing the configuration of a song reproduction device according to a preferred embodiment of the present invention.

Fig. 2 is a block diagram showing the internal configuration of the HV sound source shown in Fig. 1.

Fig. 3 is a block diagram showing the internal configuration of a formant generation unit shown in Fig. 2.

Fig. 4A shows representative examples of prosodic symbols.

Fig. 4B shows examples of pitch control at the start of a word by prosodic symbols.

Fig. 4C shows examples of pitch control partway through pronunciation by prosodic symbols.

Fig. 5 is a table illustrating how song HV-Script is described for the pronunciation character "ら".

Fig. 6 shows part of a musical score used when sounding HV song data.

Fig. 7 is a block diagram showing the configuration of the song data generating device.

Fig. 8 is a flowchart showing HV song data generation processing by the song generating apparatus.

Fig. 9 is a flowchart, continuing from Fig. 8, showing HV song data generation processing by the song generating apparatus.

Fig. 10 is a flowchart showing HV song data interpretation and reproduction processing.

Fig. 11 is a block diagram showing the configuration of a portable telephone that implements the functions of the song generating apparatus.

Embodiment

The present invention is described in detail below by way of embodiments, with reference to the drawings.

Figs. 1 to 6 and Fig. 7 show the functional configurations of the song reproduction device and the song data generating device according to the preferred embodiment of the present invention. The song generating apparatus consists of the song reproduction device shown in Fig. 1 and the song data generating device shown in Fig. 7.

"HV song data" (that is, a pronunciation data string) in this embodiment is HV-Script (Human Voice Script) data described in text form. Specifically, it consists of song HV-Script data (that is, pronunciation data) described for song reproduction, and the text includes predefined symbols used to reproduce the desired sound.

Here, HV-Script in general is a character string of the text to be synthesized, including prosodic symbols (symbols that specify the manner of pronunciation, such as accent, pitch (scale, interval), and sounding length (note length, sounding time)). In this embodiment, in order to generate singing in particular, each HV-Script consists of one pronunciation character and the prosodic symbols that specify its pitch, note length, and so on. The details are described later.

In Fig. 1, reference 1a denotes the HV song reproduction player, which controls the reproduction of HV song data, its stopping, and so on. On receiving an instruction to reproduce HV song data, the HV song reproduction player 1a starts interpreting the song HV-Script contained in that HV song data. According to what the song HV-Script describes, the HV song reproduction player 1a controls the HV driver 2a to perform the following processing.

That is, the HV driver 2a refers to the synthesis dictionary stored in the synthesis dictionary memory 3a and performs the following processing.

The human voice has formants (characteristic spectral peaks) that depend on the shape of the vocal cords, the oral cavity, and so on, and the synthesis dictionary stores parameters relating to these formants. That is, the synthesis dictionary is a database that stores in advance, as formant frame data for each pronunciation character unit (for example, character units such as "あ" and "い" in Japanese), the parameters obtained by sampling and analyzing actual speech. The database also stores data used to change the formant parameters according to the prosodic symbols described later.

The HV driver 2a interprets the pronunciation character string, including the prosodic symbols, in the HV-Script. Using the synthesis dictionary, it applies the changes in accent, scale, note length, and so on specified by the prosodic symbols to the formant frame data for the standard pronunciation, converts the result into a formant frame sequence, and outputs it to the HV sound source 4a. The HV sound source 4a generates a sounding signal from the formant frame sequence output by the HV driver 2a and outputs it to the loudspeaker 5a. The loudspeaker 5a sounds the specified singing according to this signal.

As described above, the song reproduction device consists of the HV song reproduction player 1a, the HV driver 2a, the synthesis dictionary memory 3a, the HV sound source 4a, and the loudspeaker 5a.

The HV song reproduction player 1a and the HV driver 2a include a control device made up of a memory, a CPU (central processing unit), and so on; their functions are realized by storing a program implementing those functions in the memory and executing it.

The details of the HV sound source 4a are now described with reference to Figs. 2 and 3.

The HV sound source 4a operates according to the CSM (composite sinusoidal model) speech synthesis method. In this embodiment, one phoneme consists of eight formants, and the synthesis dictionary stores eight sets of formant frequencies and formant levels, together with pitch information, as parameters.

That is, as shown in Fig. 2, the HV sound source 4a has eight formant generation units 40a to 40h and one pitch generation unit 50. Based on the formant parameters and pitch information output from a sounding sequencer (not shown), the formant generation units 40a to 40h generate the corresponding formant signals, which are combined in the mixing unit 60 to generate a phoneme. By generating phonemes continuously in this way, the desired speech is synthesized. Each formant generation unit 40a to 40h generates a basic waveform from which its formant signal is produced; this basic waveform can be generated, for example, by the waveform generator of a known FM (frequency modulation) sound source. The pitch generation unit 50 has a function of generating a pitch by computation, and applies the computed pitch to the generated phoneme only when the phoneme to be sounded is a voiced sound.

The internal configuration of the formant generation units 40a to 40h is described below with reference to Fig. 3.

The formant generation units 40a to 40h all have the same configuration; as shown in Fig. 3, each consists of a waveform generator 41, a noise generator 42, an adder 43, and an amplifier 44.

The waveform generator 41 produces one of the formants that make up a phoneme, according to the formant frequency, the basic waveform of the formant (sine wave, triangular wave, etc.), and the phase of that waveform, which are specified for each formant of each phoneme. The noise generator 42 operates according to whether the formant produced by the waveform generator 41 is voiced or unvoiced; for an unvoiced sound it generates noise and supplies it to the adder 43.

The adder 43 adds the formant produced by the waveform generator 41 and the noise supplied by the noise generator 42. The output of the adder 43 is amplified to the specified formant level by the amplifier 44.

As described above, each formant generation unit 40a to 40h handles one of the formants that make up a phoneme, and a phoneme is formed by combining a plurality of formants (eight in this embodiment). To generate one phoneme, the formants that make it up must therefore be generated and combined. The configuration shown in Fig. 2 thus performs speech synthesis using formant parameters.
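The signal path through one formant generation unit and the mixing unit can be pictured with the following minimal Python sketch. It is a simplified illustration under stated assumptions, not the HV sound source itself: the function names, the sample rate, the sine-only waveform, and the uniform noise range are all hypothetical.

```python
import math
import random


def formant_unit(freq_hz, level, voiced, n_samples, sample_rate=8000, rng=None):
    """One formant generation unit: waveform (41) + noise (42) -> adder (43) -> amplifier (44)."""
    rng = rng or random.Random(0)  # fixed seed so the sketch is repeatable
    samples = []
    for n in range(n_samples):
        wave = math.sin(2.0 * math.pi * freq_hz * n / sample_rate)  # waveform generator 41
        noise = 0.0 if voiced else rng.uniform(-1.0, 1.0)           # noise generator 42 (unvoiced only)
        samples.append(level * (wave + noise))                      # adder 43, then amplifier 44
    return samples


def mix(unit_outputs):
    """Mixing unit: add the formant signals sample by sample."""
    return [sum(values) for values in zip(*unit_outputs)]


# Two voiced formants with illustrative (not dictionary) parameters.
phoneme = mix([
    formant_unit(700.0, 1.0, True, 160),
    formant_unit(1200.0, 0.5, True, 160),
])
print(len(phoneme))
```

In the embodiment eight such units feed the mixing unit; the sketch uses two only to keep the example short.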

As described above, in the CSM speech synthesis method, a plurality of formants are generated from frequency parameters, amplitude parameters, and the like and combined to form phonemes, and speech is synthesized in this way. For example, to synthesize the Japanese word "さくら", multiple sets of parameters are set at intervals of several ms to several tens of ms, so that the six phonemes (consonants and vowels) /S/ → /A/ → /K/ → /U/ → /R/ → /A/ are synthesized and sounded.

The parameters supplied to the formant generation units 40a to 40h are defined in advance for each phoneme and registered in the synthesis dictionary. Information about the phonemes that make up each character (for example, that the Japanese character "さ" consists of the two phonemes /S/ and /A/) is also registered in the synthesis dictionary. When a prosodic symbol changes the accent, the formant frame data for the phoneme to which the symbol applies is modified according to the symbol and then supplied to the HV sound source 4a.
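The character-to-phoneme information held in the synthesis dictionary can be pictured as a simple lookup table. The Python sketch below is only an illustration: the table name and function are hypothetical, the entry for "さ" comes from the text, and the entries for "く" and "ら" are inferred from the phoneme sequence given for "さくら".

```python
# Character-to-phoneme table; only entries implied by the text are included.
PHONEMES = {
    "さ": ["S", "A"],  # stated in the text
    "く": ["K", "U"],  # inferred from the sequence for "さくら"
    "ら": ["R", "A"],  # inferred from the sequence for "さくら"
}


def to_phonemes(text):
    """Expand each character into the phonemes registered for it."""
    sequence = []
    for ch in text:
        if ch not in PHONEMES:
            raise KeyError(f"no dictionary entry for {ch!r}")
        sequence.extend(PHONEMES[ch])
    return sequence


print(" -> ".join(f"/{p}/" for p in to_phonemes("さくら")))
```

A real dictionary would also attach formant frame data to each phoneme; that part is left out here.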

Next, HV-Script and song HV-Script are described in detail.

The prosodic symbols in HV-Script specify and add the desired accent and the like to pronunciation characters, and can be used for general speech synthesis as well as for song generation. An example of Japanese HV-Script as used in general speech synthesis is shown below.

(Example) "か_さが ほ^5_い4い'4ね$2ー"

This HV-Script example describes the phrase "かさがほいねー" with prosodic symbols for synthesizing it with the desired intonation. The symbols "'", "^", "_", "$" and so on in the example are prosodic symbols. Each indicates the kind of intonation applied to the Japanese characters (kana, or the long-vowel mark "ー") and adds the specified accent to the character immediately following the prosodic symbol (or, when a number directly follows the prosodic symbol, to the character after that number).

Fig. 4A shows the meaning of representative prosodic symbols. The prosodic symbol "'" indicates control that raises the pitch at the start of a word (see (1) in Fig. 4B); "^" indicates control that raises the pitch partway through pronunciation (see (3) in Fig. 4C); "_" indicates control that lowers the pitch at the start of a word (see (2) in Fig. 4B); and "$" indicates control that lowers the pitch partway through pronunciation (see (4) in Fig. 4C). Speech synthesis is carried out according to these pitch controls. When a number directly follows a prosodic symbol, the number gives the amount of change applied. For example, "か_3さが" means that the pitch is lowered at the start of "さ" by the amount given by the number 3, and the following "が" is sounded at the lowered pitch. The first character, "か", is sounded at the standard pitch.

Thus, to add an accent or intonation to a character in the phrase to be sounded, a prosodic symbol as shown in Fig. 4A (and, optionally, a number giving the amount of pitch change) is placed before that character when describing the HV-Script. This embodiment shows only prosodic symbols for pitch control, but prosodic symbols that control the strength, speed, quality, and so on of the sound can also be used.

In this embodiment, song HV-Script is registered in the HV song database described later. Song HV-Script is HV-Script data used specifically for song generation: it is described in text form with the prosodic symbols shown above and serves as information for controlling pronunciation characteristics such as pitch and note length when a pronunciation character is sounded. For each tempo, the HV song database can register song HV-Script for a pronunciation character for each pitch and note length.

An example of song HV-Script (that is, pronunciation data) for controlling the pronunciation characteristics of a pronunciation character is now described.

Song HV-Script differs from the HV-Script used in general speech synthesis in that each song HV-Script contains only one pronunciation character.

For example, consider the song HV-Script that assigns pitch C2 to the pronunciation character "ら" for the length of a quarter note at tempo 120 (see Fig. 5). Pitch C2 denotes the note "do" in the reference octave; pitch C1 denotes "do" one octave below the reference, and pitch C3 denotes "do" one octave above it. The script consists of the control characters "L1W2S54" followed by "C2$4ら^4>2——>———>&", which includes the prosodic symbols.

The control character "S**" (where "**" is a given number) specifies the unit length of one pronunciation character or long-vowel mark; "S54", for example, denotes a time length of 80 ms. Since the pronunciation character "ら" and the "—" marks together occupy six such units, the total time length is 80 ms × 6 = 480 ms, which is used as the length of a quarter note at tempo 120. Strictly, a quarter note at tempo 120 lasts 500 ms, but 480 ms is used here.
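The arithmetic in this paragraph can be checked with a short Python sketch. The helper names are hypothetical, and counting one unit for the pronunciation character plus one for each "—" mark is an interpretation of the text; the 80 ms value for "S54" is taken from the text rather than computed, since the mapping from the number in "S**" to milliseconds is not given.

```python
LONG_MARK = "—"  # long-vowel mark as it appears in the scripts of this document


def quarter_note_ms(tempo_bpm: float) -> float:
    """Exact length of a quarter note, taking the tempo as quarter notes per minute."""
    return 60_000.0 / tempo_bpm


def script_units(script: str, pron_char: str) -> int:
    """Count time units: the pronunciation character plus each long-vowel mark."""
    return script.count(pron_char) + script.count(LONG_MARK)


script = "C2$4ら^4>2" + LONG_MARK * 2 + ">" + LONG_MARK * 3 + ">&"
units = script_units(script, "ら")
unit_ms = 80  # "S54" denotes 80 ms according to the text

print(units, units * unit_ms, quarter_note_ms(120))
```

The script occupies 480 ms against an exact quarter-note length of 500 ms, which matches the 20 ms difference noted in the text.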

The control character "L*" (where * is 0 or 1) exists because, when pronunciation characters are synthesized using the synthesis dictionary, the unit length differs from one pronunciation character to another and may fail to fit the melody of the song. "L0" specifies that the unit length registered in the synthesis dictionary is used; "L1" specifies that the unit length registered in the synthesis dictionary is redefined. That is, in "L1W2S54", the unit length given by "S54" is used instead of the unit length registered in the synthesis dictionary.

The control character "W*" (where * is 1 to 5) is used to change the amount by which the prosodic symbols that vary the pitch ("'", "^", "_", "$", etc.) change it: "W3" denotes the default amount of change, "W1" the smallest, and "W5" the largest. The "&" at the end is a symbol that resets the changes produced by the prosodic symbols to the original state; until this symbol appears, the changes accumulate.
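A control string such as "L1W2S54" can be split into its three fields with a small parser. The Python sketch below is hypothetical: it assumes the fields appear in the order L, W, S, that any of them may be omitted (as in a lone "S53"), and it enforces the value ranges given in the text (L is 0 or 1, W is 1 to 5).

```python
import re

# Optional L, W, and S fields, assumed to appear in this order.
CONTROL = re.compile(r"(?:L(?P<L>[01]))?(?:W(?P<W>[1-5]))?(?:S(?P<S>\d+))?")


def parse_control(text: str) -> dict:
    """Split a control string into its L, W, and S values, omitting absent fields."""
    match = CONTROL.fullmatch(text)
    if not text or match is None:
        raise ValueError(f"not a control string: {text!r}")
    return {key: int(value) for key, value in match.groupdict().items() if value is not None}


print(parse_control("L1W2S54"))
print(parse_control("S53"))
```

Values outside the stated ranges, such as "W9", are rejected rather than silently accepted.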

The symbol "C2" above specifies that the pronunciation character "ら" is sounded at pitch C2. The sound starts with the pitch lowered from the specified pitch C2 (that is, "do" in the reference octave) by the amount given by the number "4" of the symbol "$4"; the pitch then rises by the amount given by the number "4" of the symbol "^4" (that is, the momentarily lowered pitch returns to C2). The volume is reduced by the amount given by the number "2" of the next symbol ">2" (for example, the level is reduced by 2 dB). The sounding time is then extended by twice the time length defined by the following symbol "—", after which the volume is lowered by the default amount defined by the symbol ">" (for example, a 1 dB level reduction). The sounding time is then extended by three times the time length defined by the next symbol "—", and the volume is again lowered by the default amount defined by ">". The pronunciation character "ら" is sounded with these pitch and volume changes because this was judged to be the most suitable way of sounding "ら" at pitch C2 for a quarter note at tempo 120. Consequently, even for the same note, the HV-Script may describe various ways of sounding, depending on the user or author.

Thus, when a given pronunciation character is sounded through the HV sound source 4a and loudspeaker 5a using song HV-Script, the sound first starts at a pitch lower than the pitch (for example, C2) specified by the music data (that is, the data representing notes, pitches, and so on) and then returns to the specified pitch. Song HV-Script is described in this way because a human singer usually starts a note at a slightly lower pitch and then raises it to reach the desired pitch; following this pattern brings the sound reproduced from song HV-Script closer to human singing and makes it sound more natural. Various other techniques can also be incorporated to make the singing more expressive. For example, the sound may start at the pitch specified in the score, drop immediately to a slightly lower pitch, and then return to the specified pitch; various "fluctuation" patterns of this kind can be described in song HV-Script.

In addition to the long-vowel mark "—", song HV-Script uses the control character "S**", which specifies the unit length of the pronunciation character, so that the script is described with the sounding length of the pronunciation character in the song matching the length of the specified note.

Fig. 5 shows various examples of song HV-Script described in this way. It shows ways of sounding the pronunciation character "ら" at tempo (BPM) 120, with a song HV-Script (consisting of a header, control characters, and a script body) defined for each note length and each pitch. "File name" is the name of the file storing each song HV-Script, with the extension "hvs". "Note No" is the note number corresponding to the pitch. The "header" is a kind of control character indicating that the file is an HV-Script file (that is, a file described in HV-Script); the "control characters" and the "script (body)" are as described above. Each file thus contains a header, control characters, and a script (body).

HV song data is described next.

For example, song data for the one-bar melody of the score shown in Fig. 6, generated using only the pronunciation character "ら", is as follows.

HV#J

L1W2S54

C2$4ら^4>2——>———>&

C2$ら^4>2—>—>&

D2$4ら^4>2—>—>&

S53E2$ら^4>2——>——>———>&

When this HV song data is reproduced, the melody shown in Fig. 6 is sung as "ら, ら, ら, ら ...", like so-called humming.

When HV song data for the same melody is generated using actual lyrics, it is as follows.

HV#J

L1W2S54

C2$4お^4>2——>———>&

C2$^4>2—>—>&

D2$4え^4>2—>—>&

S53E2$て^4>2——>——>———>&

With this HV song data, part of the actual lyrics, "お,, え, て ...", is sung to the melody shown in Fig. 6.

HV song data is a sequence of song HV-Script items, but control characters common to several song HV-Script items are described once at the top, as above. Of course, separate control characters may also be described for each song HV-Script. In the example above, the control characters "L1W2S54" at the top apply not only to the song HV-Script immediately after them but also to the song HV-Script items that follow. The last song HV-Script has its own control character "S**", which indicates that its control differs from that of the song HV-Script items before it.
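The way shared control characters carry forward through HV song data can be sketched as a small state tracker in Python. The function and pattern names are hypothetical, and one point is an assumption rather than something the text states: a later control character such as "S53" is taken to override only its own field, with the earlier L and W values remaining in effect.

```python
import re

# Leading control characters (L, W, S with a number), followed by the script body.
LEADING_CONTROL = re.compile(r"((?:[LWS]\d+)*)(.*)")


def parse_song(lines):
    """Return the header and a list of (control state, script body) pairs."""
    header, *rest = lines
    state = {}
    notes = []
    for line in rest:
        control, body = LEADING_CONTROL.match(line).groups()
        for key, value in re.findall(r"([LWS])(\d+)", control):
            state[key] = int(value)  # assumption: each field is overridden independently
        if body:
            notes.append((dict(state), body))
    return header, notes


song = [
    "HV#J",
    "L1W2S54",
    "C2$4ら^4>2——>———>&",
    "C2$ら^4>2—>—>&",
    "D2$4ら^4>2—>—>&",
    "S53E2$ら^4>2——>——>———>&",
]
header, notes = parse_song(song)
for state, body in notes:
    print(state, body)
```

With the example data, the first three scripts are read with S54 in effect and the last one with S53.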

The HV-Script for a rest is a space, and the length of the rest is given by the number (**) of the control character "S**" described before the space. Placing a space in the middle indicates that there is no sound during that time.

Song HV-Script as described above may be generated in advance for every tempo, pitch, note length, and pronunciation character. Alternatively, only basic scripts may be prepared in advance, and their content changed when the actual song HV-Script is generated.

For example, when only song HV-Script for the pronunciation character "ら" has been prepared, it can be used to generate song HV-Script for another pronunciation character, "う". That is, to sound the pronunciation character "う" at tempo 120, as a quarter note, at pitch C2, part of the song HV-Script for "ら", "C2$4ら^4>2——>———>&", is changed to give "C2$4う^4>2——>———>&". Such change processing is carried out in the operation of the song reproduction device described later.
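Because each song HV-Script contains exactly one pronunciation character, the change described here amounts to a single character substitution. A minimal Python sketch, with a hypothetical function name, might look like this:

```python
def replace_pron_char(script: str, old: str, new: str) -> str:
    """Swap the single pronunciation character, leaving all prosodic symbols untouched."""
    if script.count(old) != 1:
        raise ValueError(f"expected exactly one {old!r} in {script!r}")
    return script.replace(old, new)


base = "C2$4ら^4>2——>———>&"
print(replace_pron_char(base, "ら", "う"))
```

The check on the character count guards against applying the substitution to a script that was prepared for a different pronunciation character.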

Vibrato can also be added to song HV-Script. That is, prosodic symbols for vibrato are added as appropriate to the original song HV-Script "HV#J L1W2S53 C2$4ら^4>2--->--->--->--->&" to give "HV#J L1W2S53 C2$4ら^4>2--->--->3-^ >--^--^-^&". Here, the added prosodic symbols "$" and "^" move the pitch up and down during sounding to produce vibrato.

Prepare the song HV-Script of multiple additional trill as described above, perhaps use other articulation type change performance (variation of volume, the variation of tonequality etc.), make song with having various variations among the HV-Script thus, the expressive force in the time of can strengthening the song reproduction.In addition, for the record content of above-mentioned song with HV-Script, its generation person (slip-stick artist, user etc.) listens to its pronunciation and selects only actual.

Next, the song data generating device that generates HV song data is described with reference to Fig. 7.

Fig. 7 is a block diagram showing the functional blocks of the song data generating device.

The song data generating device shown in Fig. 7 has a control module 1b, a display unit 2b, an operating unit 3b, and an HV song database (DB) 4b. The control module 1b consists of a selection input block 1b-1, a song HV-Script extraction unit 1b-2, an HV song data generating unit 1b-3, a lyrics input block 1b-4, and a pronunciation character replacement unit 1b-5.

The display unit 2b displays prescribed information under the control of the control module 1b. The operating unit 3b consists of a keyboard, various operating keys, or the like; when song HV-Scripts are written in Japanese, a so-called Japanese keyboard (or an ordinary keyboard with Japanese conversion software installed) can be used. The user operates the operating unit 3b to input the required data into the control module 1b. The HV song database 4b is as described above.

The selection input block 1b-1 displays on the display unit 2b prescribed symbols, characters, buttons, or the like for selecting the tempo, notes (including rests), and pitches, and the user selects from among them. In this way, music data consisting of information representing the notes selected by the user (that is, the note lengths) and their pitches (tone pitches) is input to the selection input block 1b-1. For example, the display unit 2b shows pictographs (or buttons) representing the various notes (crotchet, quaver, etc.) and symbols representing pitches such as C2, E3, and so on; the user selects the desired notes and pitches from among them and thereby inputs the desired music data.

The song HV-Script extraction unit 1b-2 extracts from the HV song database 4b the song HV-Scripts that correspond to the music data (notes, pitches, etc.) input through the selection input block 1b-1.

The HV song data generating unit 1b-3 arranges the song HV-Scripts that the song HV-Script extraction unit 1b-2 has extracted from the HV song database 4b in the order of the notes of the music data input by the user, and thereby generates the HV song data.
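The extraction and arrangement performed by units 1b-2 and 1b-3 can be pictured as a keyed lookup followed by concatenation in note order. The sketch below is only an assumption about how that could look: the dictionary `HV_SONG_DB`, its key layout (tempo, kind of note, pitch), and the function name are hypothetical, and every entry other than the crotchet at C2 is a placeholder rather than real HV-Script.

```python
from typing import Dict, List, Sequence, Tuple

Key = Tuple[int, str, str]  # (tempo, kind of note, pitch): assumed key layout

# Hypothetical stand-in for the HV song database 4b.
HV_SONG_DB: Dict[Key, str] = {
    (120, "crotchet", "C2"): "C2$4 ら ^4〉2---〉---〉&",   # quoted in the text
    (120, "crotchet", "E3"): "<placeholder: crotchet E3>",  # not real HV-Script
    (120, "quaver", "C2"): "<placeholder: quaver C2>",      # not real HV-Script
}


def generate_hv_song_data(
    tempo: int,
    notes: Sequence[Tuple[str, str]],
    db: Dict[Key, str] = HV_SONG_DB,
    header: str = "HV#J",
) -> List[str]:
    """Look up one stored script per input note and keep them in input order."""
    data = [header]  # the HV#J header (title) comes first
    for kind, pitch in notes:
        key = (tempo, kind, pitch)
        if key not in db:
            raise KeyError(f"no stored script for {key}")
        data.append(db[key])
    return data


song = generate_hv_song_data(120, [("crotchet", "C2"), ("quaver", "C2")])
print(song)
```

Keeping the result as a list of per-note scripts, rather than one joined string, makes a later per-note character replacement straightforward.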

When lyrics are to be input, the lyrics input block 1b-4 displays a prescribed lyrics input screen (not shown) on the display unit 2b and performs lyrics input processing in response to the user's operation of the operating unit 3b. The lyrics input block 1b-4 thereby receives the lyrics data (a text string) to be assigned to the HV song data.

The pronunciation character replacement unit 1b-5 replaces the pronunciation characters contained in the HV song data generated by the HV song data generating unit 1b-3 with the characters that make up the lyrics input to the lyrics input block 1b-4.

The HV song database 4b stores, for each tempo, kind of note, and pitch, HV-Script data containing the prosodic symbols needed to pronounce a prescribed pronunciation character at the specified pitch and note length.

The control module 1b consists of a memory, a CPU (central processing unit), and the like, and realizes each of the above functions by loading a program for that function into the memory and executing it.

The song generating apparatus of this embodiment consists of the song data generating device and the song reproducing device described above; with it, the user can generate the desired song data and reproduce it.

Next, the operation of the song generating apparatus of this embodiment is described with reference to Fig. 8 and Fig. 9.

First, the selection input block 1b-1 displays a tempo input screen on the display unit 2b (step S101). When the user operates the operating unit 3b and inputs a tempo, the determination in step S102 becomes "YES" and the flow proceeds to step S103. In step S103, the header (HV#J) is set in the HV song data area of the internal memory of the control module 1b.

Next, the selection input block 1b-1 displays a note input screen on the display unit 2b (step S104). When the user operates the operating unit 3b to select and input a note, a pitch, and so on, the determination in step S105 becomes "YES" and the flow proceeds to step S106. For example, while referring to a score written on sheet music and looking at the note and pitch choices shown on the display unit 2b, the user operates the operating unit 3b and selects the desired notes and pitches one after another, thereby inputting the music data.

In step S106, the song HV-Script extraction unit 1b-2 selects and extracts, from the plurality of song HV-Scripts for a prescribed character (for example "ら") stored in the HV song database 4b, the song HV-Script that matches the music data input by the user. Because the HV song database 4b stores song HV-Script files for each tempo, kind of note (including rests), and pitch, the song HV-Script data for the specified tempo, note, and pitch can be selected and extracted.

The extracted song HV-Script data is stored in the HV song data area following the previously extracted data, and the flow then returns to step S105. Thereafter, the song HV-Script data corresponding to each note and pitch input by the user is selected and extracted from the HV song database in turn and stored in sequence in the HV song data area. Based on the song data generated in this way, the HV song player 1a pronounces the prescribed pronunciation character (for example "ら") with the desired melody.

When the user inputs no further notes and pitches and performs the operation that ends note input, the determination in step S105 becomes "NO" and the determination in the next step, S107, becomes "YES", so the flow proceeds to step S108. In step S108, it is determined whether the operation to start lyrics input has been performed. If the user performs a trial-listening operation without starting lyrics input, the determination in step S108 becomes "NO" and the determination in the following step, S109, becomes "YES", so the flow proceeds to step S110.

In step S110, the HV song player 1a and the HV driver 2a convert the HV song data into a formant frame sequence and send it to the HV sound source 4a. The HV sound source 4a outputs a voice signal to the loudspeaker 5a based on this formant frame sequence. The loudspeaker 5a thus reproduces and sounds the desired song.

If the operation to start lyrics input is performed in step S108, the flow proceeds to step S111 (see Fig. 9). In step S111, the lyrics input block 1b-4 displays the lyrics input screen on the display unit 2b. The user then assigns a pronunciation character to each note of the music data. When lyrics input ends, the determination in step S112 becomes "YES" and the flow proceeds to step S113. In step S113, the pronunciation characters of the previously generated HV song data are replaced one by one with the characters of the input lyrics.

That is, the pronunciation character replacement unit 1b-5 replaces the pronunciation characters contained in the previously generated HV song data (for example "ら", "ら", "ら", "ら", ...) with the characters making up the lyrics input through the lyrics input block 1b-4 (for example "お", "", "え", "て", ...). Based on the HV song data generated at this stage, the HV song player can reproduce a song in which the input lyrics are sung to the desired melody.
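The replacement in step S113 amounts to walking through the per-note scripts and the lyric characters in parallel. A minimal Python sketch follows, assuming that each note's script contains the placeholder character exactly once; the function name `assign_lyrics` and the lyric "さくら" are illustrative choices, not taken from the embodiment.

```python
from typing import List, Sequence


def assign_lyrics(scripts: Sequence[str], lyrics: str, placeholder: str = "ら") -> List[str]:
    """Replace the placeholder character of the i-th script with the i-th lyric character.

    Hypothetical helper. Notes beyond the end of the lyrics keep the placeholder.
    """
    if len(lyrics) > len(scripts):
        raise ValueError("more lyric characters than notes")
    result = list(scripts)
    for i, char in enumerate(lyrics):
        if placeholder not in result[i]:
            raise ValueError(f"script {i} has no placeholder {placeholder!r}")
        result[i] = result[i].replace(placeholder, char, 1)
    return result


# Three notes that all use the stored script for "ら".
note_scripts = ["C2$4 ら ^4〉2---〉---〉&"] * 3
sung = assign_lyrics(note_scripts, "さくら")  # illustrative lyric
print(sung)
```

Replacing by position rather than searching the whole data string avoids touching a lyric character that happens to equal the placeholder.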

In the final step, S114, the HV song data after replacement is stored in the memory of the control module 1b. This completes the song generation processing.

As described above, in this embodiment, song HV-Scripts corresponding to each tempo, kind of note or rest (or note length), and pitch (do, re, mi, ...) are prepared in advance for prescribed pronunciation characters (for example "あ", "い", ...). The user operates the song data generating device and selects the desired notes and pitches one after another, as if writing out a score with lyrics; the song HV-Scripts are thereby arranged in the prescribed order and the HV song data is generated automatically.

In this embodiment, a song HV-Script is stored each time a note is input, but the song HV-Scripts may instead be stored all at once after all notes have been input. The user can also choose freely when to perform trial listening. Furthermore, although the pronunciation characters in the song HV-Scripts are replaced after all characters of the lyrics have been input, the pronunciation character in a song HV-Script may instead be replaced each time one character of the lyrics is input.

When vibrato or other articulation variations are to be added to the song HV-Scripts, processing for selecting the articulation can be added to the flowcharts described above. For example, to add vibrato to a song HV-Script, the apparatus may be arranged so that appending a character such as "V" immediately after an input note automatically sets that note to "with vibrato".
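One way to picture the "V" marker is as a token that modifies the note entered just before it. The Python sketch below assumes a hypothetical input stream in which each note is a (kind, pitch) pair and the marker is the bare string "V"; the embodiment does not specify the actual input encoding, so this is an illustration only.

```python
from typing import List, Sequence, Tuple, Union

Note = Tuple[str, str]            # (kind of note, pitch): assumed representation
Token = Union[Note, str]          # a note, or the vibrato marker "V"


def mark_vibrato(tokens: Sequence[Token]) -> List[Tuple[str, str, bool]]:
    """Attach a vibrato flag to every note that is directly followed by "V"."""
    notes: List[Tuple[str, str, bool]] = []
    for token in tokens:
        if token == "V":
            if not notes:
                raise ValueError('"V" must follow a note')
            kind, pitch, _ = notes[-1]
            notes[-1] = (kind, pitch, True)   # set "with vibrato" on the previous note
        else:
            kind, pitch = token
            notes.append((kind, pitch, False))
    return notes


flagged = mark_vibrato([("crotchet", "C2"), "V", ("quaver", "E3")])
print(flagged)
```

The resulting flag could then be used to pick a vibrato variant of the stored script instead of the plain one.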

The HV song data generated as described above is reproduced by the HV song player 1a. The trial-listening processing in step S110 is likewise performed by the HV song player 1a.

Next, the operation of the HV song player 1a is described with reference to the flowchart of Fig. 10. On receiving a reproduction start instruction from the user, the player executes the processing for interpreting the HV song data.

The HV song player 1a begins interpreting the pronunciation character string made up of the song HV-Scripts written as HV song data. Here, the HV song player 1a outputs the song HV-Scripts contained in the HV song data (excluding the header) to the HV driver 2a one after another (step S201).

On receiving the pronunciation character string, the HV driver 2a converts it into a formant frame sequence by referring to the synthesis dictionary stored in the synthesis dictionary memory 3a. It also applies the prescribed modifications to the formant frame sequence according to the prosodic symbols contained in the pronunciation character string, and outputs the result to the HV sound source 4a (step S202).

The HV sound source 4a performs speech synthesis based on the formant frame sequence supplied by the HV driver 2a, generates a voice signal, and outputs it to the loudspeaker 5a (step S203). The loudspeaker 5a thus reproduces and sounds the synthesized song.

Thereafter, the HV song player 1a repeats steps S201 to S203 until the end of the HV song data is detected in step S204, and ends the processing for interpreting the HV song data when the end is detected.
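The loop over steps S201 to S204 can be sketched as a simple pipeline in Python. The driver and sound source are passed in as plain callables because their internals (dictionary lookup, formant synthesis) are outside the scope of this sketch; all names here are hypothetical, and the stub driver below merely tags each script rather than producing real formant frames.

```python
from typing import Callable, Iterable, List

HEADER = "HV#J"  # the header (title) is not sent to the driver


def play_hv_song(
    hv_song_data: Iterable[str],
    driver: Callable[[str], List[str]],
    sound_source: Callable[[List[str]], None],
) -> int:
    """Feed each script to the driver and the resulting frames to the sound source.

    Returns the number of scripts processed. The loop ends when the data is
    exhausted, which plays the role of the end-of-data check in step S204.
    """
    processed = 0
    for script in hv_song_data:       # S201: output scripts one after another
        if script == HEADER:
            continue
        frames = driver(script)       # S202: script -> formant frame sequence
        sound_source(frames)          # S203: frames -> voice signal
        processed += 1
    return processed


# Stub components, for illustration only.
played: List[List[str]] = []
count = play_hv_song(
    ["HV#J", "script-1", "script-2"],
    driver=lambda s: [f"frame<{s}>"],
    sound_source=played.append,
)
print(count, played)
```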

The flowcharts shown in Fig. 8 to Fig. 10 are only examples, and the present invention is not limited to the processing flow of this embodiment.

As described above, with the song generating apparatus of this embodiment, the user selects and inputs notes and pitches while looking at the score and lyrics, and only needs to combine them with pronunciation characters to generate the data for song reproduction (that is, the HV song data) with ease. Compared with writing the HV-Script as text from scratch, song reproduction can therefore be achieved simply, without much time or labor.

Next, an example configuration in which the song generating apparatus of this embodiment is applied to a mobile phone is described with reference to Fig. 11.

Fig. 11 is a block diagram showing the configuration of a mobile phone equipped with the functions of the song generating apparatus of this embodiment.

In Fig. 11, reference numeral 21 denotes a CPU that controls the circuit blocks and other parts of the mobile phone. Reference numeral 22 denotes an antenna for transmitting data to and receiving data from the outside. Reference numeral 23 denotes a communication unit, which modulates transmission data and outputs it to the antenna 22, and demodulates the data received by the antenna 22. Reference numeral 24 denotes a sound processing unit, which during a call converts the voice data sent from the other party (the other party's mobile phone, etc.) and output by the communication unit 23 into a voice signal and outputs it to an ear speaker (not shown), and converts the voice signal picked up by a microphone (not shown) into voice data and outputs it to the communication unit 23.

Reference numeral 25 denotes a sound source. The sound source 25 has a function of reading music data and reproducing a melody; for example, it reproduces the desired ringtone melody when a call arrives. The sound source 25 also has the same function as the HV sound source 4a shown in Fig. 1. Reference numeral 26 denotes a loudspeaker, which sounds songs, musical tones, and the like. Reference numeral 27 denotes an input unit that receives operations by the user and consists of numeric keys, function keys, and the like. Reference numeral 28 denotes a RAM that stores HV song data, music data, and the like. When melody phrase data is downloaded from a Web server by wireless communication, it is stored in the RAM 28. Reference numeral 29 denotes a ROM that stores the various programs executed by the CPU 21, the synthesis dictionary described above, the HV song database, and the like. Reference numeral 30 denotes a display unit, which displays information on the user's operations, the state of the mobile phone, and so on. These components are interconnected by a bus.

The CPU 21 executes the programs stored in the ROM 29 and thereby realizes the functions of the HV song player 1a and HV driver 2a shown in Fig. 1 and of the control module 1b shown in Fig. 7. The CPU 21 stores the HV song data generated as described above in the RAM 28 and, on receiving a reproduction instruction from the user, reads the HV song data from the RAM 28 and interprets its content. In doing so, the CPU 21 refers to the synthesis dictionary stored in the ROM 29, converts the HV song data into a formant frame sequence, and outputs it to the sound source 25.

The sound source 25 generates a voice signal based on the formant frame sequence supplied from the CPU 21 and outputs it to the loudspeaker 26. Under the control of the CPU 21, it can also generate a musical tone signal based on music data read from the RAM 28 and output it to the loudspeaker 26. The loudspeaker 26 sounds the voice (song) or musical tone according to the voice signal or musical tone signal.

By operating the input unit 27, the user can start the software that realizes the functions of the control module 1b of the song data generating device described above and, while viewing the content shown on the display unit 30, select and input music data (notes, pitches, etc.) to generate HV song data. The generated HV song data can also be saved in the RAM 28.

The generated HV song data can also be used as a ringtone melody. The operation in that case is as follows.

First, the fact that HV song data is to be used for incoming calls is stored in the RAM 28 in advance as setting information. When the communication unit 23 receives, through the antenna 22, call information sent from another mobile phone or the like, it notifies the CPU 21 of the incoming call. On receiving this notification, the CPU 21 reads the setting information from the RAM 28, reads the HV song data specified by the setting information from the RAM 28, and begins interpreting it. The subsequent operation is as described earlier: the song is sounded from the loudspeaker 26 according to the content of the HV song data that was read.

The user can also include HV song data in an e-mail and send it to another terminal. For example, the HV song data may be written in a prescribed attached file (that is, a file that can be identified as containing HV song data by a prescribed extension such as "hvs") and attached to the e-mail to be sent. When the mobile phone shown in Fig. 11 receives such an e-mail, the CPU 21 interprets the content of the attached file as HV song data and, in response to a reproduction instruction from the user, sends the HV song data to the sound source 25.
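Identifying such an attachment by its extension can be sketched in a few lines of Python. The function name is hypothetical, and matching the extension case-insensitively is an assumption of this sketch; the text only says that a prescribed extension such as "hvs" is used.

```python
from pathlib import PurePath


def is_hv_song_attachment(filename: str, extension: str = "hvs") -> bool:
    """Return True when the file name ends with the prescribed extension.

    Only the final suffix is examined, and the comparison ignores case
    (an assumption made for this sketch).
    """
    return PurePath(filename).suffix.lower() == "." + extension.lower()


print(is_hv_song_attachment("melody.hvs"))   # attachment carrying HV song data
print(is_hv_song_attachment("melody.mid"))   # some other attachment
```

A receiving terminal could apply a check like this to each attachment before handing the file content to the HV song player.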

The functions of the HV song player 1a and the HV driver 2a need not be implemented in the CPU 21 (or in a program executed by the CPU 21). In that case, the CPU 21 and the sound source 25 may share these functions, or any of these functions may be implemented in the sound source 25. Moreover, the song generating apparatus of the present invention is not limited to mobile phones; its functions can also be incorporated in various portable terminals such as the PHS and PDA devices mentioned earlier.

In addition, a program embodying the functions of the HV song player 1a and HV driver 2a shown in Fig. 1 and of the control module 1b shown in Fig. 7 may be loaded into and executed by a computer system capable of speech synthesis, thereby realizing song generation using HV-Script.

The term "computer system" here means not only the hardware of a computer main body equipped with a microprocessor; it is a broad concept that also includes software such as an OS (operating system) and hardware such as peripheral devices.

The program may also be stored in a storage device or the like of a computer system, read out, and transmitted to another computer system via a prescribed transmission medium. Here, the transmission medium for transmitting the program means a medium having the function of transmitting information, such as a network (communication network) like the Internet or a communication line like a telephone line. Transmission is not limited to wired media; the program may also be transmitted by wireless communication.

The program need not realize all the main functions of the song generating apparatus; it may realize only some of them. The program may also be a so-called difference program (or difference file), that is, a program that realizes the functions of the song generating apparatus in combination with an existing program already installed in the computer system.

Embodiments of the present invention have been described above with reference to the drawings, but the configuration and operation of the present invention are not limited to these embodiments and include modifications that do not depart from the gist of the invention. For example, the following modifications are possible.

(1) For note input, the desired music data (MIDI data, etc.) may be input, and the information on notes and pitches extracted from it. For example, when a piece of music consists of a plurality of parts including a melody part, the note information may be extracted selectively from the melody part.

(2) When a song is reproduced, the accompaniment parts may be reproduced at the same time, giving song reproduction with accompaniment. When music data is input as described in (1) above, only the melody part may be muted and the remaining parts reproduced together with the song.

(3) In the above embodiment, song HV-Script data is prepared for each tempo, but song HV-Script data may instead be prepared only for a specific tempo and stored in the HV song database. In that case, song HV-Script data for other tempos can be generated automatically from the song HV-Script data for the specific tempo. The above embodiment describes song HV-Script data for tempo = 120, at which a crotchet lasts 0.5 seconds; at tempo = 60, a crotchet lasts 1 second. That is, when the tempo is halved, the sounding time doubles.

(4) The sounding time of song HV-Script data is determined uniquely by its content. Therefore, when song HV-Script data for tempo = 60 is generated from song HV-Script data for tempo = 120, the sounding time must be doubled, so either the numerical value (**) of the control character "S**" is changed or the "-" symbols are lengthened. In this way, the song HV-Script data for a crotchet at tempo = 60 can be generated automatically. A rule for modifying the song HV-Script description so as to change the sounding time according to the tempo ratio can also be defined.
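The arithmetic behind modifications (3) and (4) can be checked with a short Python sketch. It assumes the tempo is expressed in crotchets per minute, which is consistent with the figures given above (0.5 seconds at tempo 120, 1 second at tempo 60). The function names are hypothetical, and how the resulting ratio maps onto the "S**" value or the number of "-" symbols is not specified here.

```python
def crotchet_seconds(tempo: float) -> float:
    """Length of one crotchet in seconds, taking tempo as crotchets per minute."""
    if tempo <= 0:
        raise ValueError("tempo must be positive")
    return 60.0 / tempo


def sounding_time_ratio(stored_tempo: float, target_tempo: float) -> float:
    """Factor by which the sounding time of a stored script must be scaled."""
    return crotchet_seconds(target_tempo) / crotchet_seconds(stored_tempo)


print(crotchet_seconds(120))          # crotchet length at the stored tempo
print(crotchet_seconds(60))           # crotchet length at the target tempo
print(sounding_time_ratio(120, 60))   # scaling factor from tempo 120 to tempo 60
```

A modification rule of the kind mentioned in (4) would take this ratio as its input and rewrite the timing portion of the script accordingly.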

Claims

1. A song generating apparatus, characterized by comprising:

an input device that inputs music data representing a melody to be used in song reproduction;

a storage device that stores a plurality of pieces of pronunciation data, each written in text form as a character string containing a pronunciation character, a first symbol, and a second symbol, wherein the pronunciation character represents the character to be pronounced as the song, the first symbol specifies the sounding period of the pronunciation character, and the second symbol specifies a change in pitch or volume while the pronunciation character is sounded;

a control device that generates a pronunciation data sequence by associating desired pronunciation data, selected from the plurality of pieces of pronunciation data stored in the storage device, with each note constituting the music data; and

a song reproducing device that, based on the pronunciation data sequence, reproduces with the melody represented by the music data the song represented by the pronunciation character in each piece of pronunciation data associated with each note,

wherein the song reproducing device reproduces the song represented by the pronunciation character in each piece of pronunciation data associated with each note in such a manner that the pitch or volume changes, based on the second symbol in the pronunciation data, during the sounding period specified by the first symbol in the pronunciation data.

2. The song generating apparatus according to claim 1, characterized in that the input device can input desired characters; the control device replaces the pronunciation character contained in each piece of pronunciation data associated with each note with a desired character input through the input device; and the song reproducing device reproduces, with the melody represented by the music data, the song represented by the replaced character in each piece of pronunciation data associated with each note.

3. The song generating apparatus according to claim 1, characterized in that the pronunciation data stored in the storage device is stored in association with a tempo; and, when the tempo of the music data differs from the tempo associated with the pronunciation data stored in the storage device, the control device changes the first symbol in the pronunciation data according to the ratio between the tempo of the music data and the tempo associated with the stored pronunciation data, and thereby generates the pronunciation data sequence so that it matches the tempo of the input music data.

4. The song generating apparatus according to claim 1, characterized in that the pronunciation data is written so that, when the character to be pronounced is sounded, sounding begins at a pitch slightly lower than the pitch specified by the music data and then returns to the specified pitch.

5. The song generating apparatus according to claim 1, characterized in that the storage device also stores pronunciation data for vibrato, which is written so that the pitch moves up and down partway through the sounding of the pronunciation character.

6. A portable mobile terminal, characterized by comprising:

a song reproducing device that, based on the pronunciation data sequence, reproduces with the melody represented by the music data the song represented by the pronunciation character in each piece of pronunciation data associated with each note,

7. A song reproducing method, characterized by comprising the following four steps:

inputting music data representing a melody to be used in song reproduction;

selecting, from a plurality of pieces of pronunciation data each written in text form as a character string containing a pronunciation character, a first symbol, and a second symbol, desired pronunciation data to be associated with each note constituting the music data, wherein the pronunciation character represents the character to be pronounced as the song, the first symbol specifies the sounding period of the pronunciation character, and the second symbol specifies a change in pitch or volume while the pronunciation character is sounded;

generating a pronunciation data sequence by associating the desired pronunciation data, selected from the plurality of pieces of pronunciation data stored in the storage device, with each note constituting the music data; and

reproducing, based on the pronunciation data sequence and with the melody represented by the music data, the song represented by the pronunciation character in each piece of pronunciation data associated with each note,

wherein, in the reproducing step, the song represented by the pronunciation character in each piece of pronunciation data associated with each note is reproduced in such a manner that the pitch or volume changes, based on the second symbol in the pronunciation data, during the sounding period specified by the first symbol in the pronunciation data.