WO2004111993A1 - Signal combination method and device, singing voice synthesizing method and device, program and recording medium, and robot device - Google Patents

Signal combination method and device, singing voice synthesizing method and device, program and recording medium, and robot device Download PDF

Info

Publication number
WO2004111993A1
WO2004111993A1 PCT/JP2004/008333
Authority
WO
WIPO (PCT)
Prior art keywords
information
singing
singing voice
change
pitch
Prior art date
Application number
PCT/JP2004/008333
Other languages
French (fr)
Japanese (ja)
Inventor
Kenichiro Kobayashi
Original Assignee
Sony Corporation
Priority date
Filing date
Publication date
Application filed by Sony Corporation filed Critical Sony Corporation
Publication of WO2004111993A1 publication Critical patent/WO2004111993A1/en

Links

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10H ELECTROPHONIC MUSICAL INSTRUMENTS; INSTRUMENTS IN WHICH THE TONES ARE GENERATED BY ELECTROMECHANICAL MEANS OR ELECTRONIC GENERATORS, OR IN WHICH THE TONES ARE SYNTHESISED FROM A DATA STORE
    • G10H1/00 Details of electrophonic musical instruments
    • G10H1/0033 Recording/reproducing or transmission of music for electrophonic musical instruments
    • G10H1/0041 Recording/reproducing or transmission of music for electrophonic musical instruments in coded form
    • G10H1/0058 Transmission between separate instruments or between individual components of a musical system
    • G10H1/0066 Transmission between separate instruments or between individual components of a musical system using a MIDI interface
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10H ELECTROPHONIC MUSICAL INSTRUMENTS; INSTRUMENTS IN WHICH THE TONES ARE GENERATED BY ELECTROMECHANICAL MEANS OR ELECTRONIC GENERATORS, OR IN WHICH THE TONES ARE SYNTHESISED FROM A DATA STORE
    • G10H2250/00 Aspects of algorithms or signal processing methods without intrinsic musical character, yet specifically adapted for or used in electrophonic musical processing
    • G10H2250/315 Sound category-dependent sound synthesis processes [Gensound] for musical use; Sound category-specific synthesis-controlling parameters or control means therefor
    • G10H2250/455 Gensound singing voices, i.e. generation of human voices for musical applications, vocal singing sounds or intelligible words at a desired pitch or with desired vocal effects, e.g. by phoneme synthesis

Definitions

  • the present invention relates to a signal synthesizing method and apparatus for synthesizing a signal such as a singing voice or a musical tone from performance data.
  • the present invention relates to a singing voice synthesizing method and apparatus, a program and a recording medium, and a robot apparatus.
  • MIDI (musical instrument digital interface) data is representative performance data and is a practical industry standard.
  • MIDI data is used to generate a musical tone by controlling a digital sound source called a MIDI sound source (a sound source operated by MIDI data, such as a computer sound source or an electronic musical instrument sound source).
  • a MIDI file (e.g., an SMF (standard MIDI file)) can contain lyric data and is used for the automatic creation of a musical score with lyrics.
  • however, in these conventional technologies, although an attempt is made to express a singing voice within the data format of MIDI data, the control is merely control as if controlling a musical instrument, and the lyric data that MIDI inherently carries is not utilized.
  • a mechanical device that performs motions similar to those of a human (living organism) using electric or magnetic action is called a "robot". Robots began to spread in Japan at the end of the 1960s, but most of them were industrial robots, such as manipulators and transfer robots, aimed at automating and unmanning production work in factories.
  • the artificial intelligence (AI) used in such autonomously operating robot apparatuses artificially realizes intellectual functions such as inference and judgment, and attempts are further being made to artificially realize functions such as emotion and instinct. Among the means for expressing such artificial intelligence to the outside, such as visual expression means and natural-language expression means, the use of speech is one example of a natural-language expression function.
  • as described above, conventional singing voice synthesis uses data of a special format, and even when MIDI data is used, the lyric data embedded in it cannot be used effectively, nor can MIDI data created for other instruments be sung in a casual, humming-like manner.
  • the present invention has been proposed in view of this conventional situation, and it is an object of the present invention to provide a method and apparatus for synthesizing signals such as singing voices and musical tones, and a singing voice synthesizing method and apparatus, which make it possible to synthesize a singing voice using performance data such as MIDI data and to give expression that takes the style into account not only for singing voices but also for musical tones.
  • the method and apparatus for synthesizing a singing voice or a musical tone according to the present invention analyze performance data as musical information of pitch, length, and lyrics, change the singing or performance pattern by giving the notes of the note sequence of the analyzed music information an expression change including at least one of a volume change, a pitch change, and a timing change in accordance with the singing or performance style, and generate a singing voice or a musical tone based on the pattern-changed note sequence of the music information.
  • the singing voice synthesizing method and apparatus according to the present invention achieve the above object as follows: pattern data in which parameters are set for giving the notes of music information an expression change including at least one of a volume change, a pitch change, and a timing change in accordance with the singing style is prepared in advance; the input performance data is analyzed as music information of pitch, length, and lyrics; lyrics are added to the note sequence based on the lyric information of the analyzed music information to produce singing voice information; the singing pattern of the singing voice information is changed by giving, in correspondence with the notes of the note sequence of the analyzed music information, an expression change based on the pattern data prepared in advance; and a singing voice is generated based on the pattern-changed note sequence of the music information.
  • according to this configuration, when generating a singing voice, an expression change including at least one of a volume change, a pitch change, and a timing change is given in accordance with the specified singing style, so that the way of singing can be changed.
  • the performance data is performance data of a MIDI file.
  • the parameters for giving the expression change are set in accordance with the singing style and at least one of the note's length, strength, increase/decrease state of strength, pitch, and the tempo of the music.
  • the above-mentioned expression change includes adding at least one of vibrato, pitch bend, and expression to the sound of the target note.
  • the parameter for giving the vibrato includes at least one of information on delay of amplitude start, information on amplitude, information on cycle, information on increase / decrease in amplitude, and information on increase / decrease in cycle.
  • the parameters for giving the expression may include at least one of time information expressed as a ratio to the note length and strength information at characteristic arbitrary points on that time axis.
  • the singing style may be selected by a user setting, or by the track name, song name, or marker of the performance data, or the like.
  • the program according to the present invention causes a computer to execute the singing voice synthesizing function of the present invention
  • the recording medium according to the present invention is a computer-readable medium on which the program is recorded.
  • the robot apparatus according to the present invention is an autonomous robot apparatus that operates based on supplied input information, and includes: storage means storing pattern data in which parameters are set for giving the notes of music information an expression change including at least one of a volume change, a pitch change, and a timing change in accordance with the singing style; analyzing means for analyzing performance data as music information of pitch, length, and lyrics; lyric providing means for adding lyrics to the note sequence based on the lyric information of the analyzed music information to produce singing voice information; pattern changing means for changing the singing pattern of the singing voice information by giving, in correspondence with the notes of the note sequence of the music information analyzed by the analyzing means, an expression change including at least one of a volume change, a pitch change, and a timing change read out from the storage means; and singing voice generating means for generating a singing voice based on the note sequence of the music information whose pattern has been changed. This makes it possible to significantly improve the entertainment properties of the robot.
  • according to the signal synthesizing method and apparatus of the present invention, the performance data is analyzed as music information of pitch, length, and lyrics; the singing or performance pattern is changed by giving the notes of the note sequence of the analyzed music information an expression change including at least one of a volume change, a pitch change, and a timing change in accordance with the singing or performance style; and a singing voice or musical tone is generated based on the pattern-changed note sequence of the music information. An expression change corresponding to the style can thus be given to the singing voice when singing or to the musical tone when playing, and the musical expression is greatly improved.
  • according to the singing voice synthesizing method and apparatus of the present invention, pattern data in which parameters are set for giving the notes of music information an expression change including at least one of a volume change, a pitch change, and a timing change in accordance with the singing style is prepared in advance; the input performance data is analyzed as music information of pitch, length, and lyrics; lyrics are added to the note sequence based on the lyric information of the analyzed music information to produce singing voice information; the singing pattern of the singing voice information is changed by giving, in correspondence with the notes of the note sequence of the analyzed music information, an expression change based on the pattern data prepared in advance; and a singing voice is generated based on the pattern-changed note sequence of the music information. An expression change corresponding to the singing style can thus be given to the singing voice, and the musical expression is greatly improved. Whereas conventionally only a fixed singing style with poor expressive power was possible, arbitrarily selecting a singing style improves the expressive power: a singing style matched to the music can realize a more natural singing voice, while a mismatched style can express humor, further improving the entertainment value.
  • the program according to the present invention causes a computer to execute the singing voice synthesizing function of the present invention, and the recording medium according to the present invention is a computer-readable recording medium on which the program is recorded.
  • the robot apparatus according to the present invention implements the singing voice synthesizing function of the present invention. That is, pattern data in which parameters are set for giving the notes of music information an expression change including at least one of a volume change, a pitch change, and a timing change in accordance with the singing style is prepared in advance; the input performance data is analyzed as music information of pitch, length, and lyrics; lyrics are added to the note sequence based on the analyzed lyric information to produce singing voice information; the singing pattern of the singing voice information is changed by giving, in correspondence with the notes of the analyzed note sequence, an expression change based on the pattern data prepared in advance; and a singing voice is generated based on the pattern-changed note sequence. An expression change corresponding to the singing style can thus be given to the singing voice, expanding the musical expression: a singing style matched to the music achieves a natural singing voice, while a mismatched style can express humor, further improving the entertainment value. The expressive ability of the robot apparatus is therefore improved, its entertainment properties are enhanced, and its intimacy with humans can be deepened.
  • FIG. 1 is a block diagram illustrating a system configuration of a singing voice synthesizing apparatus according to the present embodiment.
  • FIG. 2 is a diagram showing an example of score information as an analysis result.
  • FIG. 3 is a diagram showing an example of singing voice information.
  • FIG. 4 is a block diagram illustrating a configuration example of a singing voice generation unit.
  • FIG. 5 is a diagram showing an example of singing pattern data.
  • FIG. 6 is a diagram showing an example of singing voice information before applying a singing style.
  • FIG. 7 is a diagram showing the singing voice information after the singing style "Enka" has been applied to the singing voice information of FIG. 6.
  • FIG. 8 is a block diagram showing a main part of another configuration example of the singing voice synthesizing apparatus according to the present embodiment.
  • FIG. 9 is a flowchart illustrating the operation of the singing voice synthesizing apparatus according to the present embodiment.
  • FIG. 10 is a perspective view showing an external configuration of a robot device according to the present embodiment.
  • FIG. 11 is a diagram schematically showing a degree of freedom configuration model of the robot device.
  • FIG. 12 is a block diagram showing a system configuration of the robot device.
  • in the following embodiment, a singing voice synthesizing apparatus that mainly synthesizes a singing voice and further has a function of synthesizing musical tones is described.
  • the present invention can be easily applied to a singing voice synthesizing device for synthesizing only singing voices, a tone synthesizing device for synthesizing musical tones, or a signal synthesizing device for synthesizing audio signals such as singing voices and musical tones.
  • FIG. 1 is a block diagram showing a schematic system configuration of a singing voice synthesizing apparatus with a musical sound synthesizing function according to the present embodiment.
  • the singing voice synthesizing device shown in FIG. 1 is assumed to be applied to, for example, a robot device having at least an emotion model, a voice synthesizing unit, and a sound generating unit, but is not limited thereto.
  • a performance data analysis unit analyzes performance data 1, represented by MIDI data, and converts it into musical score information 4 representing the pitch, length, and strength of the sounds of the tracks and channels contained in the performance data.
  • FIG. 2 shows an example of performance data (MIDI data) converted into musical score information 4.
  • events are written for each track and each channel.
  • Events include note events and control events.
  • each note event has information on the time of occurrence (the time column in the figure), pitch, length, and strength. Therefore, a note sequence, that is, a sound sequence, is defined by a sequence of note events.
  • a control event has data on its time of occurrence, the control type (for example, vibrato or expression of performance dynamics), and the contents of that control.
  • in the case of vibrato, for example, the control contents include a "depth" indicating the magnitude of the pitch swing, a "width" indicating the cycle of the pitch swing, and a "delay" indicating the start timing of the swing (the delay from the sounding timing).
  • Control events for a specific track or channel are applied to the playback of the note sequence of that track or channel, unless a new control event (control change) occurs for that control type.
  • lyrics can be entered for each track in the performance data of a MIDI file.
  • "Uruhi" shown at the top is part of the lyrics written on track 1, and "Uruhi" shown at the bottom is part of the lyrics written on track 2. That is, the example of FIG. 2 is an example in which lyrics are embedded in the analyzed music information (musical score information).
  • time is represented by “measures: beats: number of ticks”
  • length is represented by “number of ticks”
  • strength is represented by numerical values of “0-127”
  • pitch is represented by a note name, for example "A4" for 440 Hz
  • the depth, width, and delay are each represented by a numerical value in the range of "0-64-127".
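  • To make the representation described above concrete, the following is a minimal sketch, in Python, of how one note event of the musical score information 4 might be held in memory. The class and field names are illustrative assumptions, not taken from the patent; the value ranges follow the conventions just described (time as measures:beats:ticks, strength 0-127, pitch as a note name such as "A4" for 440 Hz).

```python
from dataclasses import dataclass

# Hypothetical container for one note event of the musical score information 4.
@dataclass
class NoteEvent:
    track: int          # MIDI track the event belongs to
    channel: int        # MIDI channel the event belongs to
    measure: int        # time of occurrence: measure number
    beat: int           # time of occurrence: beat within the measure
    tick: int           # time of occurrence: ticks within the beat
    length_ticks: int   # note length in ticks
    strength: int       # velocity-like strength, 0-127
    pitch: str          # note name, e.g. "A4" (440 Hz)
    lyric: str = ""     # lyric syllable embedded in the track, if any

# A4 = 440 Hz; other pitches follow equal temperament.
NOTE_OFFSETS = {"C": -9, "C#": -8, "D": -7, "D#": -6, "E": -5, "F": -4,
                "F#": -3, "G": -2, "G#": -1, "A": 0, "A#": 1, "B": 2}

def pitch_to_hz(pitch: str) -> float:
    """Convert a note name such as 'A4' or 'G#3' to a frequency in Hz."""
    name, octave = pitch[:-1], int(pitch[-1])
    semitones = NOTE_OFFSETS[name] + 12 * (octave - 4)
    return 440.0 * 2 ** (semitones / 12)

note = NoteEvent(track=1, channel=1, measure=1, beat=2, tick=0,
                 length_ticks=480, strength=100, pitch="G4", lyric="a")
print(round(pitch_to_hz(note.pitch), 1))  # 392.0 Hz for G4
```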
  • the converted score information 4 is passed to the lyrics providing unit 5.
  • the lyric imparting unit 5 generates singing voice information 6 to which the lyrics for the sound are attached along with information such as the length, pitch, intensity, and expression of the sound corresponding to the note, based on the musical score information 4.
  • FIG. 3 shows an example of the singing voice information 6.
  • "\song\" is a tag indicating the start of lyric information.
  • the tag "\PP, T10673075\" indicates a rest (pause) of 10673075 μsec
  • the tag "\tdyna 110 649075\" indicates the overall strength from the beginning over 10673075 μsec
  • the tag "\dyna 100\" indicates the strength of each sound
  • the tag "\G4, T288461\" indicates the lyric "a" with a pitch of G4 and a length of 288461 μsec.
  • the singing voice information in FIG. 3 is generated from the musical score information shown in FIG. 2 (the analysis result of the MIDI data). As can be seen by comparing FIG. 2 and FIG. 3, the performance data for musical instrument control (for example, the note information) is fully utilized in generating the singing voice information.
  • for the singing attributes other than the lyric "a" itself (the time of occurrence, length, pitch, strength, and so on of the sound of "a"), the time of occurrence, length, pitch, strength, and so on contained in the control information and note event information of the musical score information (FIG. 2) are used directly.
  • for the next lyric element "ru", the next note event information in the same track and channel of the musical score information is likewise used directly, and so on.
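  • A rough sketch of how the lyric providing unit 5 might turn analyzed note events plus lyric syllables into the tagged singing voice information of FIG. 3 is shown below. The tag layout follows the example described above; the function and argument names are hypothetical assumptions for illustration only.

```python
# Minimal sketch of building singing voice information 6 in the tagged form of
# FIG. 3 from analyzed notes and their lyric syllables.

def build_singing_voice_info(notes, overall_dyna=110, overall_span_usec=10673075):
    """notes: list of (pitch, length_usec, strength, lyric); pitch=None means a rest."""
    parts = ["\\song\\",                                       # start of lyric information
             f"\\tdyna {overall_dyna} {overall_span_usec}\\"]  # overall strength
    for pitch, length_usec, strength, lyric in notes:
        if pitch is None:
            parts.append(f"\\PP,T{length_usec}\\")              # rest (pause)
        else:
            parts.append(f"\\dyna {strength}\\")                # strength of this sound
            parts.append(f"\\{pitch},T{length_usec}\\{lyric}")  # pitch, length, lyric
    return "".join(parts)

notes = [(None, 150000, 0, None),        # short rest
         ("G4", 288461, 100, "a"),       # lyric "a" at G4
         ("G4", 288461, 100, "ru")]      # lyric "ru" at G4
print(build_singing_voice_info(notes))
```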
  • the singing voice information 6 is passed to the singing voice generating unit 7, and the singing voice generating unit 7 generates a singing voice waveform 8 based on the singing voice information 6.
  • the singing voice generator 7 that generates the singing voice waveform 8 from the singing voice information 6 is configured as shown in FIG. 4, for example.
  • the singing voice prosody generation unit 7-1 converts the singing voice information 6 into singing voice prosody data.
  • the waveform generator 7-2 converts the singing voice prosody data into a singing voice waveform 8.
  • [LABEL] indicates the duration of each phoneme.
  • that is, for the phoneme segments of the lyric "ra", the phoneme "aa" following the initial phoneme of "ra" lasts from the 1000-sample point up to the 39600-sample point, a duration of 38600 samples.
  • [PITCH] is the pitch period represented by a point pitch. That is, the pitch period at the 0-sample point is 50 samples. Here, since the pitch of "ra" is not changed, the pitch period of 50 samples is applied at all sample points.
  • [VOLUME] indicates the relative volume at each sample point. That is, with a default value of 100%, the volume is 66% at the 0-sample point and 57% at the 39600-sample point; similarly, a volume of 48% continues from the 40100-sample point, and the volume becomes 3% at the 42600-sample point. This realizes an attenuation of the sound of "ra" with the passage of time.
  • further, the pitch period may fluctuate up and down (50 ± 3), for example to a pitch period of 53 samples, with a cycle (width) of about 4000 samples. This implements vibrato, which is a fluctuation in the pitch of the voice.
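  • As an illustration of the [PITCH] behaviour just described, the sketch below generates a pitch-period contour in which a nominal period of 50 samples is modulated by about ±3 samples with a cycle of roughly 4000 samples, which is how vibrato shows up in the singing voice prosody data. The function name and the sine-shaped modulation are assumptions; the patent only states that the period fluctuates up and down with that width and cycle.

```python
import math

def vibrato_pitch_contour(num_samples, base_period=50, depth=3,
                          cycle=4000, delay=0, step=100):
    """
    Sketch of a [PITCH]-style contour: (sample_point, pitch_period) pairs.
    base_period: nominal pitch period in samples (50 in the example)
    depth:       peak deviation of the period (+/- 3 in the example)
    cycle:       vibrato cycle length in samples (about 4000 in the example)
    delay:       samples to wait before the vibrato starts
    step:        spacing of the emitted control points
    """
    contour = []
    for n in range(0, num_samples, step):
        if n < delay:
            period = base_period
        else:
            period = base_period + depth * math.sin(2 * math.pi * (n - delay) / cycle)
        contour.append((n, round(period)))
    return contour

# First few control points of a 12000-sample note with vibrato starting after 2000 samples.
for point in vibrato_pitch_contour(12000, delay=2000)[:6]:
    print(point)
```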
  • the waveform generator 7-2 reads out a sample from an internal waveform memory (not shown) based on such singing voice / phonological data and generates a singing voice waveform 8.
  • the singing voice generator 7 that generates the singing voice waveform 8 from the singing voice information 6 is not limited to the above example, and any appropriate known singing voice generator can be used.
  • the performance data 1 is passed to the MIDI sound source 9, and the MIDI sound source 9 generates a musical tone based on the performance data.
  • This musical tone is the accompaniment, that is, the accompaniment waveform 10.
  • the singing voice waveform 8 and the accompaniment waveform 10 are both passed to a mixing unit 11 that performs synchronization and mixing.
  • the mixing unit 11 synchronizes the singing voice waveform 8 and the accompaniment waveform 10, superimposes them on each other, and reproduces the result as the output waveform 3, thereby performing music reproduction in which the singing voice is accompanied by the accompaniment based on the performance data 1.
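  • The synchronization and superposition performed by the mixing unit 11 can be pictured with a few lines of code. This is only a schematic sketch, assuming both waveforms are already at the same sample rate and are represented as lists of float samples; the function name, the offset parameter, and the naive clipping are assumptions, and the real unit would derive the synchronization point from the performance data.

```python
def mix_waveforms(singing, accompaniment, singing_offset=0,
                  gain_voice=1.0, gain_accomp=1.0):
    """
    Superimpose a singing voice waveform onto an accompaniment waveform.
    singing_offset: start position (in samples) of the voice within the
                    accompaniment, i.e. the synchronization point.
    """
    length = max(len(accompaniment), singing_offset + len(singing))
    out = [0.0] * length
    for i, s in enumerate(accompaniment):
        out[i] += gain_accomp * s
    for i, s in enumerate(singing):
        out[singing_offset + i] += gain_voice * s
    # naive clipping to keep the mixed result in a valid range
    return [max(-1.0, min(1.0, s)) for s in out]

output_waveform = mix_waveforms([0.2, 0.4, 0.2], [0.1, 0.1, 0.1, 0.1, 0.1],
                                singing_offset=1)
print(output_waveform)  # [0.1, 0.3, 0.5, 0.3, 0.1]
```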
  • FIG. 2 shows an example of the musical score information 4 to which lyrics are added
  • FIG. 3 shows an example of the singing voice information 6 generated from the musical score information 4 of FIG.
  • the singing style is specified by the operator when generating the singing voice
  • the musical score information 4 is converted into the singing voice information 6
  • the music information described in the musical score information 4 is passed to the singing pattern changing unit 12.
  • the singing pattern changing unit 12 refers to the singing pattern data 13 (also called singing style data) that matches the specified singing style, compares it with the musical score information 4, and generates the singing voice information 6 by applying, to the sounds (notes) of the musical score information 4 that match the conditions described in that data, the singing pattern parameters described in the pattern data 13.
  • more specifically, for predetermined notes in the note sequence of the musical score information, parameters are set for giving an expression change including a volume change such as vibrato or expression, a pitch change such as pitch bend, and a timing change, and these parameters are stored in storage means as the singing pattern data 13 (singing style data).
  • in other words, the singing pattern changing unit 12 uses the musical score information 4 and the singing pattern data 13 to generate the singing voice information 6 modified in accordance with the singing style.
  • FIG. 5 is a diagram showing a specific example of singing pattern data 13 (singing style data) corresponding to each singing style.
  • the singing pattern data 13 is divided into two parts, a condition part and an execution part.
  • the items of the condition part include the singing style, such as "popular", "classic", or "enka", together with the pitch, length, strength, strength increase/decrease pattern, tempo of the music, and the like, which are the conditions for selecting the sound (note) to which an expression change is to be given.
  • the execution part contains, as parameters of the expression change to be applied to the sounds (notes) that match the conditions described in the condition part, vibrato, expression (dynamics of the sound, expression of performance dynamics), timing, pitch bend (bend at the beginning of a phrase, bend at the end of a phrase), pitch adjustment, and the like.
  • for expression, the volume is specified at several characteristic points, such as the beginning, the end, and points of large change, with the time from the beginning to the end of the sound taken as 100.
  • for timing, a parameter indicating the degree of delay or advance relative to the beat is specified.
  • for pitch bend, a parameter expressed in cents is specified indicating the degree to which the pitch is raised or lowered for the sound at the beginning or end of a phrase; it does not apply to sounds in the middle of a phrase.
  • for pitch adjustment, a parameter giving the number of cents by which the pitch as a whole is raised or lowered is specified. Here, a cent is a unit of pitch width such that 100 cents correspond to one semitone.
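  • To summarize the structure of FIG. 5, the sketch below represents one entry of the singing pattern data 13 as a condition part and an execution part, together with a matcher that decides whether a given note qualifies. The field names, the exact condition fields, and the matching rules are illustrative assumptions; the patent only specifies which kinds of items belong to each part.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Condition:                      # condition part of one pattern entry
    style: str                        # "popular", "classic", "enka", ...
    min_length_ticks: int = 0         # note length condition
    min_strength: int = 0             # note strength condition
    pitch_range: Optional[tuple] = None    # (low, high) MIDI note numbers, if any
    max_tempo_bpm: Optional[float] = None  # tempo condition of the piece

@dataclass
class Execution:                      # execution part: expression-change parameters
    vibrato: Optional[dict] = None    # e.g. {"delay": 0.3, "depth": 3, "cycle": 4000}
    expression: Optional[list] = None # (time%, volume%) points over the note, time 0-100
    timing_shift: float = 0.0         # delay (+) or advance (-) relative to the beat
    pitch_bend_cents: int = 0         # bend at phrase start/end, in cents
    pitch_adjust_cents: int = 0       # overall raise/lower of the pitch, in cents

@dataclass
class PatternEntry:
    condition: Condition
    execution: Execution

def matches(cond: Condition, style: str, note) -> bool:
    """note is assumed to expose length_ticks, strength, midi_pitch and tempo_bpm."""
    if cond.style != style:
        return False
    if note.length_ticks < cond.min_length_ticks or note.strength < cond.min_strength:
        return False
    if cond.pitch_range and not (cond.pitch_range[0] <= note.midi_pitch <= cond.pitch_range[1]):
        return False
    if cond.max_tempo_bpm is not None and note.tempo_bpm > cond.max_tempo_bpm:
        return False
    return True

# Example "enka"-style entry: long, fairly strong notes get a deep, slow vibrato.
enka_entry = PatternEntry(
    Condition(style="enka", min_length_ticks=480, min_strength=64, max_tempo_bpm=100),
    Execution(vibrato={"delay": 0.3, "depth": 3, "cycle": 4000},
              expression=[(0, 70), (50, 100), (100, 60)],
              pitch_bend_cents=-100))
```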
  • FIGS. 6 and 7 show examples of application of this singing style (examples of giving singing pattern data parameters).
  • FIG. 6 shows the singing voice information before the singing style is applied. For the part ptA surrounded by the broken line in FIG. 6, for example, the singing voice information after each parameter of the singing pattern data of the "enka" singing style has been applied is indicated by the part ptB enclosed by the broken line in FIG. 7.
  • comparing FIGS. 6 and 7, as shown in FIG. 7, an expression change by parameters such as pitch bend, end-of-phrase pitch bend, and a change of expression is added to the sound (note) "E4, T144231" of the lyric "hi" in the singing voice information of FIG. 6, and the singing voice information is changed to that of the "enka" singing style.
  • the change of the singing voice information according to the singing style is realized by the singing pattern changing unit 12 of FIG. 1 using the musical score information 4 and the singing pattern data 13.
  • alternatively, the singing voice information 6A (before the singing style is applied) from the lyric providing unit 5 may be sent to the singing pattern changing unit 12.
  • in that case, for the sounds (notes) of the singing voice information 6A that match the conditions of the singing pattern data (singing style data) of FIG. 5, the parameters are changed according to the singing pattern, and the style-applied singing voice information 6B is output and sent to the singing voice generating unit 7.
  • the other configuration is the same as that of FIG. 1 described above, and is not shown and will not be described.
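  • Continuing the previous sketch (and reusing its PatternEntry and matches definitions), the processing that turns the pre-application singing voice information 6A into the style-applied singing voice information 6B might look roughly as follows. The note representation, the "effects" dictionary, and the first-match rule are illustrative assumptions, not taken from the patent.

```python
def apply_singing_style(notes, style, pattern_entries):
    """
    notes:           mutable note objects of the singing voice information 6A
                     (each assumed to expose length_ticks, strength, midi_pitch,
                     tempo_bpm and an 'effects' dict for attached expression changes)
    style:           the singing style selected by the operator, e.g. "enka"
    pattern_entries: list of PatternEntry objects (see the previous sketch)
    Returns the notes with expression-change parameters attached (information 6B).
    """
    for note in notes:
        for entry in pattern_entries:
            if matches(entry.condition, style, note):   # condition part check
                ex = entry.execution                     # execution part parameters
                if ex.vibrato:
                    note.effects["vibrato"] = ex.vibrato
                if ex.expression:
                    note.effects["expression"] = ex.expression
                if ex.timing_shift:
                    note.effects["timing_shift"] = ex.timing_shift
                if ex.pitch_bend_cents:
                    note.effects["pitch_bend_cents"] = ex.pitch_bend_cents
                if ex.pitch_adjust_cents:
                    note.effects["pitch_adjust_cents"] = ex.pitch_adjust_cents
                break                                    # assumption: first matching entry wins
    return notes
```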
  • the singing style can be instructed by the operator in advance as described above.
  • alternatively, the singing style is stored in the MIDI data, and the singing pattern changing unit 12 can also determine it from attached information such as a general song name, track name, or marker. For example, the song name or track name may be the style name itself or may contain the style name, or the style may be able to be estimated from the song name or track name.
  • styles can also be applied to musical tones, not only to singing voices. In this method, for example, the performance pattern of musical sounds such as saxophone or violin is changed in accordance with a specified performance style. Specifically, for a desired musical tone (the musical sound of a saxophone, violin, or the like) in the musical score information, performance pattern data (performance pattern data 16, described below) is prepared, for example.
  • like FIG. 5 described above, the performance pattern data has a condition part and an execution part. The items of the condition part include styles such as "popular", "classic", and "enka", together with the pitch, length, strength, tempo of the song, and the like, which are the conditions for selecting the target sound (note) to which an expression change is to be given. The execution part contains, as parameters of the expression change to be applied to the sounds (notes) that match the conditions of the condition part, vibrato, expression, timing, pitch bend (bend at the beginning and end of a phrase), pitch adjustment, and the like.
  • the note sequence information is sent to the performance pattern changing unit 15, and, based on the performance pattern data 16 described above, the expression change parameters are applied to the sounds (notes) that satisfy the predetermined conditions in accordance with the specified performance style. The performance data 14 to which the performance style has been applied is then sent to the MIDI sound source 9, and the MIDI sound source 9 generates a musical tone to which the performance style is applied based on that performance data.
  • FIG. 9 is a flowchart for explaining the overall operation of the singing voice synthesizing apparatus shown in FIG. 1 (or partially shown in FIG. 8).
  • performance data 1 of a MIDI file is input (step S1).
  • next, the performance data 1 is analyzed, and the musical score information 4 is created (steps S2, S3).
  • next, the operator is asked to perform setting processing, such as selecting the singing or performance style, selecting the lyrics, selecting the track or channel to which the lyrics are to be assigned, and selecting the MIDI tracks or channels to be muted. For the portions not set by the operator, selections can be made based on attached information such as the song name, track name, or markers of the performance data 1, or predetermined default information is used in the subsequent processing.
  • in the next step S5, singing voice information 6 is created from the lyrics, using the musical score information 4 of the channel in the track to which the lyrics are assigned.
  • in step S6, it is checked whether all tracks have been processed; if not, the process moves on to the next track and returns to step S5. Therefore, when lyrics are added to a plurality of tracks, the lyrics are added to each track independently and singing voice information 6 is created for each.
  • in step S7, it is determined whether or not a change of the singing style (or performance style) has been designated. If Yes (the style is to be changed), the process proceeds to step S8; if No (no change), the process proceeds to step S11.
  • in step S8, it is determined whether or not a sound (note) of the musical score information satisfies the conditions indicated in the condition part of the singing pattern data 13 (or the performance pattern data 16).
  • in step S9, for the sounds (notes) that satisfy those conditions, the expression change parameters indicated in the execution part of the singing pattern data 13 (or the performance pattern data 16) are applied to the singing voice information (or the performance data).
  • in the next step S10, it is determined whether or not the condition check has been completed for all the sounds (notes). If No, the process returns to step S8; if Yes, the process proceeds to the next step S11.
  • in step S11, the singing voice generating unit 7 generates a singing voice waveform 8 from the singing voice information 6.
  • in step S12, the MIDI data is reproduced by the MIDI sound source 9 to create an accompaniment waveform 10.
  • by the processing so far, the singing voice waveform 8 and the accompaniment waveform 10 have been obtained. The mixing unit 11 therefore synchronizes the singing voice waveform 8 and the accompaniment waveform 10, superimposes them on each other, and reproduces them as the output waveform 3 (steps S13, S14).
  • This output waveform 3 is output as an acoustic signal via a sound system (not shown).
  • as described above, according to the singing voice synthesizing apparatus of the present embodiment, pattern data in which parameters are set for giving the notes of music information an expression change including at least one of a volume change, a pitch change, and a timing change in accordance with the singing style is prepared in advance; the input performance data is analyzed as music information of pitch, length, and lyrics; lyrics are added to the note sequence based on the lyric information of the analyzed music information to produce singing voice information; the singing pattern of the singing voice information is changed by giving, in correspondence with the notes of the note sequence of the analyzed music information, an expression change including at least one of a volume change, a pitch change, and a timing change based on the pattern data prepared in advance; and a singing voice is generated based on the pattern-changed note sequence of the music information. An appropriate expression change corresponding to the singing style can thus be given to the singing voice, and the musical expression can be expanded. Whereas conventionally only a fixed singing style with poor expressive power was possible, arbitrarily selecting a singing style improves the expressiveness: a singing style matched to the music can achieve a natural singing voice, while a mismatched style can express humor, which can further enhance the entertainment value.
  • the performance style can be applied not only to the singing voice but also to the musical tone.
  • that is, the performance data is analyzed as music information of pitch, length, and lyrics.
  • the singing or performance pattern is changed by giving the notes in the note sequence of the analyzed music information an expression change including at least one of a volume change, a pitch change, and a timing change in accordance with the singing or performance style, and a singing voice or a musical tone is preferably generated based on the note sequence of the music information whose pattern has been changed. This makes it possible to give the singing voice when singing, or the musical tone when playing, an expression change corresponding to the style of the singing or performance, thereby significantly improving the musical expression.
  • the singing voice synthesis function described above is mounted on, for example, a robot device.
  • the bipedal walking robot apparatus shown as a configuration example is a practical robot that supports human activities in various situations of daily life, such as the living environment, and is an entertainment robot that can act in accordance with its internal state (anger, sadness, joy, pleasure, and so on) and can display the basic actions performed by humans.
  • the robot device 60 includes a head unit 63 connected to a predetermined position of the trunk unit 62, left and right arm units 64R/L, and left and right leg units 65R/L, which are connected to the trunk unit (R and L are suffixes indicating right and left, respectively; the same applies hereinafter).
  • FIG. 11 schematically shows the configuration of the degrees of freedom of the joints provided in the robot apparatus 1.
  • the neck joint supporting the head unit 63 has three degrees of freedom: a neck joint axis 101, a neck pitch axis 102, and a neck joint roll axis 103.
  • each arm unit 64R/L constituting the upper limb includes a shoulder joint pitch axis 107, a shoulder joint roll axis 108, an upper arm joint axis 109, an elbow joint pitch axis 110, and a forearm joint axis 111.
  • the hand 114 is actually a multi-joint, multi-degree-of-freedom structure including a plurality of fingers.
  • the movement of the hand 114 has little contribution or influence to the posture control and the walking control of the robot device 60, and therefore, it is assumed herein that the degree of freedom is zero. Therefore, each arm has seven degrees of freedom.
  • the trunk unit 62 has three degrees of freedom: a trunk pitch axis 104, a trunk roll axis 105, and a trunk axis 106.
  • each leg unit 65R/L constituting the lower limb includes a hip joint axis 115, a hip joint pitch axis 116, a hip joint roll axis 117, a knee joint pitch axis 118, an ankle joint pitch axis 119, an ankle joint roll axis 120, and a foot 121.
  • the intersection of the hip joint pitch axis 116 and the hip joint roll axis 117 defines the hip joint position of the robot device 1.
  • although the human foot is actually a structure including a multi-joint, multi-degree-of-freedom sole, the sole of the robot device 60 is assumed to have zero degrees of freedom. Therefore, each leg has six degrees of freedom.
  • summing up the above, the robot device 60 as a whole has 3 + 7 × 2 + 3 + 6 × 2 = 32 degrees of freedom. However, the entertainment robot apparatus is not necessarily limited to 32 degrees of freedom. It goes without saying that the number of degrees of freedom, that is, the number of joints, can be increased or decreased as appropriate in accordance with design and production constraints and required specifications.
  • Each degree of freedom of the robot device 60 as described above is actually implemented using an actuator.
  • the actuators are preferably small and lightweight, due to requirements such as eliminating extra bulges in the external appearance to approximate the human body shape and controlling the posture of an unstable structure that walks on two legs. It is more preferable that each actuator be a small AC servo actuator of a type that is directly coupled to a gear and in which the servo control system is integrated into and housed in the motor unit.
  • FIG. 12 schematically shows a control system configuration of the robot device 60.
  • the control system includes a thought control module 200 that dynamically responds to user input and the like to determine and express emotions, and a motion control module 300 that controls the whole-body cooperative motion of the robot apparatus, such as driving the actuators 350.
  • the thought control module 200 includes a CPU (Central Processing Unit) 211 that executes arithmetic processing relating to emotion determination and emotion expression, a RAM (Random Access Memory), a ROM (Read Only Memory) 213, and an external storage device (such as a hard disk drive).
  • the thought control module 200 determines the current emotion and intention of the robot device 60 in accordance with external stimuli, such as image data input from the image input device 251 and voice data input from the voice input device 252.
  • the image input device 251 includes, for example, a plurality of charge coupled device (CCD) cameras
  • the audio input device 252 includes, for example, a plurality of microphones.
  • the thought control module 200 issues a command to the motion control module 300 to execute a motion or action sequence based on its decision, that is, a movement of the limbs.
  • the motion control module 300 includes a CPU 311 that controls the whole-body cooperative motion of the robot device 60, a RAM 312, a ROM 313, and an external storage device (such as a hard disk drive) 314, and is an independently driven information processing device capable of self-contained processing.
  • in the external storage device 314, for example, walking patterns calculated offline, target ZMP trajectories, and other action plans can be stored.
  • the ZMP (zero moment point) is a point on the floor at which the moment due to the floor reaction force during walking becomes zero, and the ZMP trajectory is, for example, the trajectory along which the ZMP moves during the walking operation of the robot device.
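  • As general background not taken from the patent, for a system of point masses m_i at positions (x_i, y_i, z_i), the x coordinate of the ZMP is often approximated, when changes of angular momentum are neglected, by the following commonly used expression (the y coordinate is analogous, and g is the gravitational acceleration):

```latex
x_{\mathrm{ZMP}} \;=\;
\frac{\sum_i m_i\,(\ddot{z}_i + g)\, x_i \;-\; \sum_i m_i\,\ddot{x}_i\, z_i}
     {\sum_i m_i\,(\ddot{z}_i + g)}
```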
  • to the motion control module 300, various devices are connected via a bus interface (I/F) 301: the actuators 350 that realize the degrees of freedom of the joints distributed over the whole body of the robot device 60 shown in FIG. 11, a posture sensor 351 that measures the posture and inclination of the trunk unit 62, grounding confirmation sensors 352 and 353 that detect the lifting and landing of the left and right soles, and a power supply control device 354 that manages a power supply such as a battery.
  • the posture sensor 351 is constituted by, for example, a combination of an acceleration sensor and a gyro sensor, and the grounding confirmation sensors 352 and 353 are constituted by proximity sensors, micro switches, or the like.
  • the thinking control module 200 and the motion control module 300 are constructed on a common platform, and are interconnected via bus interfaces 201 and 301.
  • the motion control module 300 controls the whole-body cooperative motion by means of the actuators 350 so as to embody the action specified by the thought control module 200. That is, the CPU 311 retrieves an operation pattern corresponding to the action instructed by the thought control module 200 from the external storage device 314, or internally generates an operation pattern. Then, in accordance with the specified operation pattern, the CPU 311 sets the foot motion, ZMP trajectory, trunk motion, upper limb motion, horizontal position and height of the waist, and so on, and transfers command values instructing motions in accordance with these settings to the respective actuators 350.
  • further, the CPU 311 detects the posture and inclination of the trunk unit 62 of the robot device 60 based on the output signal of the posture sensor 351, and detects whether each leg unit 65R/L is in the free-leg state or the standing state based on the output signals of the grounding confirmation sensors 352 and 353, so that the whole-body cooperative motion of the robot device 60 can be adaptively controlled.
  • the CPU 311 also controls the posture and motion of the robot device 60 so that the ZMP position is always directed toward the center of the ZMP stable region.
  • the motion control module 300 returns to the thought control module 200 the extent to which the action according to the intention determined by the thought control module 200 has been expressed, that is, the processing status.
  • the robot device 60 can determine its own and surrounding conditions based on the control program, and can act autonomously.
  • a program (including data) implementing the above-described singing voice synthesizing function is stored in, for example, the ROM 213 of the thinking control module 200.
  • the execution of the singing voice synthesis program is performed by the CPU 211 of the thinking control module 200.
  • for the singing voice generating unit 7, for example, a unit corresponding to the singing voice synthesizing unit and waveform generating unit used in the voice synthesizing method and apparatus described in the specification and drawings of Japanese Patent Application No. 2002-73385, previously proposed by the present applicant, can be used.
  • although singing voice information usable by such a singing voice generating unit has been illustrated, various other singing voice generating units can also be used; in that case, singing voice information including the information required for singing voice generation by the respective singing voice generating unit may be generated from the performance data.
  • the performance data is not limited to MIDI data, and performance data of various standards can be used.

Abstract

Inputted MIDI file performance data is analyzed as music information including pitch, length, and lyrics (S2, S3). When a change of singing style is designated, the singing voice information is altered so that an expression change is given to the musical notes that match the conditions (S7, S8, S9). A singing voice is generated based on the singing voice information whose singing pattern has been altered (S11). This makes it possible to synthesize a singing voice using performance data such as MIDI data and to alter the singing pattern according to the singing style.

Description

Specification
Signal synthesizing method and apparatus, singing voice synthesizing method and apparatus, program and recording medium, and robot apparatus
Technical field
[0001] The present invention relates to a signal synthesizing method and apparatus for synthesizing a signal such as a singing voice or a musical tone from performance data, a singing voice synthesizing method and apparatus, a program and a recording medium, and a robot apparatus.
[0002] This application claims priority based on Japanese Patent Application No. 2003-170000, filed in Japan on June 13, 2003, which is incorporated herein by reference.
Background art
[0003] A technique for generating a singing voice from given singing data by means of a computer or the like is already known, as represented by Japanese Patent No. 3233036.
[0004] MIDI (musical instrument digital interface) data is representative performance data and is a de facto industry standard. Typically, MIDI data is used to generate musical tones by controlling a digital sound source called a MIDI sound source (a sound source operated by MIDI data, such as a computer sound source or an electronic musical instrument sound source). A MIDI file (e.g., an SMF (standard MIDI file)) can contain lyric data and is used for the automatic creation of a musical score with lyrics.
[0005] Attempts to use MIDI data as a parameter expression (special data expression) of a singing voice or of the phoneme segments constituting a singing voice have also been proposed, as represented by Japanese Patent Application Laid-Open No. 11-95798.
[0006] However, in these conventional technologies, although an attempt is made to express a singing voice within the data format of MIDI data, the control is merely control as if controlling a musical instrument, and the lyric data that MIDI inherently carries is not utilized.
[0007] Furthermore, MIDI data created for other musical instruments could not be turned into a singing voice without modification.
[0008] In addition, voice synthesis software that reads out e-mails and web pages is sold by many manufacturers, including "Simple Speech" from Sony Corporation, but the reading style is the same tone as reading out ordinary text.
[0009] Incidentally, a mechanical device that performs motions similar to those of a human (living organism) using electric or magnetic action is called a "robot". Robots began to spread in Japan at the end of the 1960s, but most of them were industrial robots, such as manipulators and transfer robots, aimed at automating and unmanning production work in factories.
[0010] Recently, the development of practical robots that support life as human partners, that is, that support human activities in various situations of daily life such as the living environment, has been progressing. Unlike industrial robots, such practical robots have the ability to learn, by themselves, how to adapt to humans with individually different personalities and to various environments in various aspects of the human living environment. For example, robot devices such as "pet-type" robots that model the body mechanism and motions of four-legged animals such as dogs and cats, and "humanoid" robots that model the body mechanism and motions of humans walking upright on two legs, are already being put into practical use.
[0011] Since these robot devices can perform various operations emphasizing entertainment properties compared with industrial robots, they are sometimes called entertainment robots. Some such robot devices operate autonomously in response to external information and internal states.
[0012] The artificial intelligence (AI) used in such autonomously operating robot devices artificially realizes intellectual functions such as inference and judgment, and attempts are further being made to artificially realize functions such as emotion and instinct. Among the means for expressing such artificial intelligence to the outside, such as visual expression means and natural-language expression means, the use of speech is one example of a natural-language expression function.
[0013] As described above, conventional singing voice synthesis uses data of a special format, and even when MIDI data is used, the lyric data embedded in it cannot be used effectively, nor can MIDI data created for other instruments be sung in a casual, humming-like manner.
[0014] Furthermore, the singing style and the like are not particularly taken into consideration, and the expressive power inevitably remains poor.
Disclosure of the invention
Problems to be solved by the invention
[0015] The present invention has been proposed in view of this conventional situation, and it is an object of the present invention to provide a method and apparatus for synthesizing signals such as singing voices and musical tones, and a singing voice synthesizing method and apparatus, which make it possible to synthesize a singing voice using performance data such as MIDI data, and which enable expression that takes the style into account not only for singing voices but also for musical tones.
[0016] It is a further object of the present invention to provide a program and a recording medium that cause a computer to perform such a singing voice synthesis function.
[0017] It is a further object of the present invention to provide a robot apparatus that realizes such a singing voice synthesis function.
Means for solving the problems
[0018] In order to achieve the above object, the method and apparatus for synthesizing signals such as singing voices and musical tones according to the present invention analyze performance data as music information of pitch, length, and lyrics, change the singing or performance pattern by giving the notes of the note sequence of the analyzed music information an expression change including at least one of a volume change, a pitch change, and a timing change in accordance with the singing or performance style, and generate a singing voice or a musical tone based on the pattern-changed note sequence of the music information.
[0019] In order to achieve the above object, the singing voice synthesizing method and apparatus according to the present invention prepare in advance pattern data in which parameters are set for giving the notes of music information an expression change including at least one of a volume change, a pitch change, and a timing change in accordance with the singing style, analyze the input performance data as music information of pitch, length, and lyrics, add lyrics to the note sequence based on the lyric information of the analyzed music information to produce singing voice information, change the singing pattern of the singing voice information by giving, in correspondence with the notes of the note sequence of the analyzed music information, an expression change including at least one of a volume change, a pitch change, and a timing change based on the pattern data prepared in advance, and generate a singing voice based on the pattern-changed note sequence of the music information.
[0020] According to this configuration, when generating a singing voice, an expression change including at least one of a volume change, a pitch change, and a timing change is given in accordance with the specified singing style, so that the way of singing can be changed.
[0021] The performance data is preferably performance data of a MIDI file. The parameters for giving the expression change are set in accordance with the singing style and at least one of the note's length, strength, increase/decrease state of strength, pitch, and the tempo of the music. The expression change includes giving at least one of vibrato, pitch bend, and expression to the sound of the target note. The parameters for giving vibrato include at least one of information on the delay of the start of the amplitude, information on the amplitude, information on the cycle, information on the increase/decrease of the amplitude, and information on the increase/decrease of the cycle, and the parameters for giving expression include at least one of time information expressed as a ratio to the note length and strength information at characteristic arbitrary points on that time axis. The singing style is selected by a user setting, or by the track name, song name, or marker of the performance data, or the like.
[0022] The program according to the present invention causes a computer to execute the singing voice synthesizing function of the present invention, and the recording medium according to the present invention is a computer-readable medium on which this program is recorded.
[0023] Further, in order to achieve the above object, the robot apparatus according to the present invention is an autonomous robot apparatus that operates based on supplied input information, and includes: storage means storing pattern data in which parameters are set for giving the notes of music information an expression change including at least one of a volume change, a pitch change, and a timing change in accordance with the singing style; analyzing means for analyzing performance data as music information of pitch, length, and lyrics; lyric providing means for adding lyrics to the note sequence based on the lyric information of the analyzed music information to produce singing voice information; pattern changing means for changing the singing pattern of the singing voice information by giving, in correspondence with the notes of the note sequence of the music information analyzed by the analyzing means, an expression change including at least one of a volume change, a pitch change, and a timing change read out from the storage means; and singing voice generating means for generating a singing voice based on the pattern-changed note sequence of the music information. This makes it possible to significantly improve the entertainment properties of the robot.
[0024] Still other objects of the present invention and the specific advantages obtained by the present invention will become more apparent from the description of the embodiments given below with reference to the drawings.
EFFECTS OF THE INVENTION
[0025] According to the signal synthesizing method and apparatus of the present invention for synthesizing signals such as singing voices and musical tones, performance data is analyzed as music information of pitch, length, and lyrics; the singing or performance pattern is changed by imparting, to the notes of the note sequence of the analyzed music information, an expression change including at least one of a volume change, a pitch change, and a timing change according to the style of singing or performance; and a singing voice or musical tone is generated on the basis of the note sequence of the pattern-changed music information. An expression change according to the style of singing or performance can thereby be imparted to the singing voice when singing or to the musical tone when playing, and the musical expression is markedly improved.
[0026] According to the singing voice synthesizing method and apparatus of the present invention, pattern data in which parameters for imparting an expression change including at least one of a volume change, a pitch change, and a timing change to the notes of music information are set according to the singing style is prepared in advance; input performance data is analyzed as music information of pitch, length, and lyrics; lyrics are imparted to the note sequence on the basis of the lyric information of the analyzed music information to produce singing voice information; the singing pattern of the singing voice information is changed by imparting, to the notes of the note sequence of the analyzed music information, an expression change including at least one of a volume change, a pitch change, and a timing change on the basis of the prepared pattern data; and a singing voice is generated on the basis of the note sequence of the pattern-changed music information. An expression change according to the singing style can thus be imparted to the singing voice, and the musical expression is markedly improved. Whereas conventional systems could only sing in a fixed singing style with poor expressiveness, arbitrarily selecting the singing style improves the expressiveness; a singing style matched to the piece can realize a more natural singing voice, and a mismatched style can express humor, further enhancing the entertainment quality.
[0027] The program according to the present invention causes a computer to execute the singing voice synthesizing function of the present invention, and the recording medium according to the present invention is a computer-readable medium on which this program is recorded.
[0028] The robot apparatus according to the present invention realizes the singing voice synthesizing function of the present invention. That is, according to the robot apparatus of the present invention, pattern data in which parameters for imparting an expression change including at least one of a volume change, a pitch change, and a timing change to the notes of music information are set according to the singing style is prepared in advance; input performance data is analyzed as music information of pitch, length, and lyrics; lyrics are imparted to the note sequence on the basis of the lyric information of the analyzed music information to produce singing voice information; the singing pattern of the singing voice information is changed by imparting, to the notes of the note sequence of the analyzed music information, an expression change including at least one of a volume change, a pitch change, and a timing change on the basis of the prepared pattern data; and a singing voice is generated on the basis of the note sequence of the pattern-changed music information. An expression change according to the singing style can thus be imparted to the singing voice, the range of musical expression is expanded, a singing style matched to the piece can realize a more natural singing voice, and a mismatched style can express humor, further enhancing the entertainment quality. The expressive capability of the robot apparatus is therefore improved, its entertainment quality is raised, and its intimacy with humans is deepened.
BRIEF DESCRIPTION OF THE DRAWINGS
[0029] [FIG. 1] FIG. 1 is a block diagram illustrating the system configuration of a singing voice synthesizing apparatus according to the present embodiment.
[FIG. 2] FIG. 2 is a diagram showing an example of musical score information obtained as an analysis result.
[FIG. 3] FIG. 3 is a diagram showing an example of singing voice information.
[FIG. 4] FIG. 4 is a block diagram illustrating a configuration example of a singing voice generation unit.
[FIG. 5] FIG. 5 is a diagram showing an example of singing pattern data.
[FIG. 6] FIG. 6 is a diagram showing an example of singing voice information before a singing style is applied.
[FIG. 7] FIG. 7 is a diagram showing the singing voice information after the "enka" singing style has been applied to the singing voice information of FIG. 6.
[FIG. 8] FIG. 8 is a block diagram showing the main part of another configuration example of the singing voice synthesizing apparatus according to the present embodiment.
[FIG. 9] FIG. 9 is a flowchart for explaining the operation of the singing voice synthesizing apparatus according to the present embodiment.
[FIG. 10] FIG. 10 is a perspective view showing the external configuration of a robot apparatus according to the present embodiment.
[FIG. 11] FIG. 11 is a diagram schematically showing a model of the degree-of-freedom configuration of the robot apparatus.
[FIG. 12] FIG. 12 is a block diagram showing the system configuration of the robot apparatus.
BEST MODE FOR CARRYING OUT THE INVENTION
[0030] Hereinafter, specific embodiments to which the present invention is applied will be described in detail with reference to the drawings.
[0031] The embodiment of the present invention shows an example of a singing voice synthesizing apparatus that mainly synthesizes singing voices and additionally has a function of synthesizing musical tones; however, the present invention can of course also be easily applied to a singing voice synthesizing apparatus that synthesizes only singing voices, to a musical tone synthesizing apparatus that synthesizes musical tones, or to a signal synthesizing apparatus that synthesizes audio signals such as singing voices and musical tones.
[0032] FIG. 1 is a block diagram showing the schematic system configuration of the singing voice synthesizing apparatus with a musical tone synthesizing function according to the present embodiment. The singing voice synthesizing apparatus shown in FIG. 1 is assumed to be applied to, for example, a robot apparatus having at least an emotion model, speech synthesis means, and sound output means, but is not limited to this; it can of course also be applied to various other robot apparatuses and to various computer AI (artificial intelligence) systems other than robots.
[0033] In FIG. 1, a performance data analysis unit 2, which analyzes performance data 1 typified by MIDI data, analyzes the input performance data 1 and converts it into musical score information 4 representing the pitch, length, and strength of the sounds of the tracks and channels contained in the performance data.
[0034] FIG. 2 shows an example of performance data (MIDI data) converted into the musical score information 4. In FIG. 2, events are written for each track and each channel. The events include note events and control events. A note event has information on its time of occurrence (the time column in the figure), pitch, length, and strength (velocity), so a note sequence, or sound sequence, is defined by a sequence of note events. A control event has a time of occurrence, control type data (for example vibrato or expression, i.e. performance dynamics), and data indicating the contents of the control. For vibrato, for example, the control contents include a "depth" item indicating the magnitude of the pitch swing, a "width" item indicating the period of the swing, and a "delay" item indicating the start timing of the swing (the delay from the sounding timing). A control event for a particular track and channel is applied to the musical tone reproduction of the note sequence of that track and channel until a new control event (control change) of the same control type occurs. Furthermore, lyrics can be entered on a per-track basis in the performance data of a MIDI file. In FIG. 2, the "あるうひ" ("one day") shown in the upper part is a portion of the lyrics entered on track 1, and the "あるうひ" shown in the lower part is a portion of the lyrics entered on track 2. That is, the example of FIG. 2 is one in which lyrics are embedded in the analyzed music information (musical score information).
[0035] In FIG. 2, time is expressed as "bar:beat:tick count", length is expressed as a tick count, strength is expressed as a numerical value from 0 to 127, and pitch is expressed so that 440 Hz corresponds to "A4". For vibrato, the depth, width, and delay are each expressed as a numerical value on a 0-64-127 scale.
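By way of illustration only, the score information described above can be pictured as a small set of record types; this is an assumption made for the sketch, not the patent's data format, and the names NoteEvent, ControlEvent, and ChannelScore are hypothetical.

from dataclasses import dataclass, field
from typing import Dict, List, Optional

@dataclass
class NoteEvent:
    time: str                      # "bar:beat:tick", e.g. "1:2:000"
    pitch: str                     # note name, e.g. "A4" (= 440 Hz)
    length: int                    # duration in ticks
    velocity: int                  # strength, 0-127
    lyric: Optional[str] = None    # lyric syllable embedded in the track, if any

@dataclass
class ControlEvent:
    time: str
    control_type: str              # e.g. "vibrato", "expression"
    params: Dict[str, int] = field(default_factory=dict)   # e.g. depth/width/delay on the 0-64-127 scale

@dataclass
class ChannelScore:
    notes: List[NoteEvent] = field(default_factory=list)
    controls: List[ControlEvent] = field(default_factory=list)

# A control event remains in force for the following notes of its track/channel
# until a control change of the same type occurs, as described above.
track1 = ChannelScore(
    notes=[NoteEvent("1:1:000", "G4", 480, 100, lyric="あ")],
    controls=[ControlEvent("1:1:000", "vibrato", {"depth": 64, "width": 64, "delay": 50})],
)
print(track1)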
[0036] Returning to FIG. 1, the converted musical score information 4 is passed to a lyric imparting unit 5. On the basis of the musical score information 4, the lyric imparting unit 5 generates singing voice information 6 in which the lyrics for each sound are attached together with information such as the length, pitch, strength, and expression of the sound corresponding to the note.
[0037] FIG. 3 shows an example of the singing voice information 6. In FIG. 3, "¥song¥" is a tag indicating the start of the lyric information. The tag "¥PP, T10673075¥" indicates a rest of 10673075 μsec, the tag "¥tdyna 110 649075¥" indicates the overall strength over 10673075 μsec from the beginning, the tag "¥fine-100¥" indicates a fine adjustment of pitch corresponding to MIDI fine tuning, and the tags "¥vibrato NRPN_dep=64¥", "¥vibrato NRPN_del=50¥", and "¥vibrato NRPN_rat=64¥" indicate the depth, delay, and width of the vibrato, respectively. The tag "¥dyna 100¥" indicates the relative strength of each sound, and the tag "¥G4, T288461¥あ" indicates the lyric "あ" with a pitch of G4 and a length of 288461 μsec. The singing voice information in FIG. 3 was obtained from the musical score information (the MIDI data analysis result) shown in FIG. 2.
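As a rough sketch, and only under assumptions, one note of the score information could be turned into tag strings of the kind shown in FIG. 3 as follows; the function name note_to_singing_tags and the exact tag spelling are hypothetical, not taken from the patent.

def note_to_singing_tags(pitch, length_usec, lyric, dyna=100, vibrato=None):
    # per-note relative strength
    tags = [f"¥dyna {dyna}¥"]
    # vibrato depth / delay / width on the NRPN-style scale used in FIG. 3
    if vibrato:
        tags += [f"¥vibrato NRPN_dep={vibrato['depth']}¥",
                 f"¥vibrato NRPN_del={vibrato['delay']}¥",
                 f"¥vibrato NRPN_rat={vibrato['width']}¥"]
    # pitch, duration in microseconds, and the lyric syllable for this note
    tags.append(f"¥{pitch}, T{length_usec}¥{lyric}")
    return tags

print(note_to_singing_tags("G4", 288461, "あ",
                           vibrato={"depth": 64, "delay": 50, "width": 64}))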
[0038] As can be seen from a comparison of FIG. 2 and FIG. 3, the performance data for musical instrument control (for example, the note information) is fully utilized in generating the singing voice information. For example, for the component "あ" of the lyrics "あるうひ", the singing attributes of the sound "あ" other than the lyric itself, such as its time of occurrence, length, pitch, and strength, directly use the time of occurrence, length, pitch, strength, and so on contained in the control information and note event information of the musical score information (FIG. 2); for the next lyric element "る", the next note event information in the same track and channel of the musical score information is likewise used directly, and so on.
[0039] Returning to FIG. 1, the singing voice information 6 is passed to a singing voice generation unit 7, and the singing voice generation unit 7 generates a singing voice waveform 8 on the basis of the singing voice information 6. The singing voice generation unit 7, which generates the singing voice waveform 8 from the singing voice information 6, is configured, for example, as shown in FIG. 4.
[0040] In FIG. 4, a singing voice prosody generation unit 7-1 converts the singing voice information 6 into singing voice prosody data, and a waveform generation unit 7-2 converts the singing voice prosody data into the singing voice waveform 8.
[0041] As a specific example, a case will be described in which the lyric element "ら" at the pitch "A4" is sustained for a certain time. The singing voice prosody data for the case where no vibrato is applied is as shown in the following table.
[0042] Table 1
(Table 1, listing the [LABEL], [PITCH], and [VOLUME] columns of the singing voice prosody data without vibrato, appears in the source only as an image.)
[0043] In this table, [LABEL] represents the duration of each phoneme. That is, the phoneme (phoneme segment) "ra" has a duration of 1000 samples, from sample 0 to sample 1000, and the first phoneme "aa" following "ra" has a duration of 38600 samples, from sample 1000 to sample 39600. [PITCH] represents the pitch period as a point pitch; the pitch period at sample point 0 is 50 samples, and since the pitch of "ら" is not changed here, a pitch period of 50 samples is applied over all samples. [VOLUME] represents the relative volume at each sample point; with the default value taken as 100%, the volume is 66% at sample point 0 and 57% at sample point 39600, and in the same way it continues with 48% at sample point 40100 and so on, falling to 3% at sample point 42600. In this way the sound "ら" decays with the passage of time.
[0044] In contrast, when vibrato is applied, singing voice prosody data such as the following, for example, is created.
[0045] Table 2
[LABEL]          [PITCH]          [VOLUME]
0      ra        0      50        0      66
1000   aa        1000   50        39600  57
11000  aa        2000   53        40100  48
21000  aa        4009   47        40600  39
31000  aa        6009   53        41100  30
39600  aa        8010   47        41600  21
40100  aa        10010  53        42100  12
40600  aa        12011  47        42600  3
41100  aa        14011  53
41600  aa        16022  47
42100  aa        18022  53
42600  aa        20031  47
43100            22031  53
                 24042  47
                 26042  53
                 28045  47
                 30045  53
                 32051  47
                 34051  53
                 36062  47
                 38062  53
                 40074  47
                 42074  53
                 43100  50
[0046] As shown in the [PITCH] column of Table 2, the pitch period at sample points 0 and 1000 is the same, 50 samples, and the pitch of the voice does not change during this interval; thereafter, however, the pitch period swings up and down (50 ± 3) with a period (width) of about 4000 samples, for example a pitch period of 53 samples at sample point 2000, 47 samples at sample point 4009, and 53 at sample point 6009. This realizes vibrato, a fluctuation in the pitch of the voice. The data in the [PITCH] column is generated on the basis of the information on the corresponding singing voice element (for example "ら") in the singing voice information 6, in particular the note number (for example A4) and the vibrato control data (for example, the tags "¥vibrato NRPN_dep=64¥", "¥vibrato NRPN_del=50¥", and "¥vibrato NRPN_rat=64¥").
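The following is a minimal sketch of how a [PITCH] contour of the kind shown in Table 2 could be computed. The mapping from the NRPN-style vibrato values to sample counts and to the pitch-period deviation is not specified in the text; the numbers used here are assumptions chosen only to reproduce the behaviour described above.

import math

def pitch_contour(base_period, total_samples, depth_periods, delay_samples,
                  cycle_samples, step=2000):
    """Return (sample_point, pitch_period) pairs: a constant pitch period until
    delay_samples, then base_period +/- depth_periods with the given cycle."""
    points = []
    for t in range(0, total_samples + 1, step):
        if t <= delay_samples:
            period = float(base_period)
        else:
            phase = 2.0 * math.pi * (t - delay_samples) / cycle_samples
            period = base_period + depth_periods * math.sin(phase)
        points.append((t, round(period, 1)))
    return points

# 50-sample base period, +/- 3 samples, ~4000-sample cycle, onset after 1000 samples:
# the printed points roughly follow the 50 / 53 / 47 alternation of Table 2.
for t, p in pitch_contour(50, 43100, 3, 1000, 4000):
    print(t, p)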
[0047] The waveform generation unit 7-2 reads out samples from an internal waveform memory (not shown) on the basis of such singing voice phoneme and prosody data, and generates the singing voice waveform 8. The singing voice generation unit 7 that generates the singing voice waveform 8 from the singing voice information 6 is not limited to the above example; any appropriate known singing voice generator can be used.
[0048] Returning to FIG. 1, the performance data 1 is also passed to a MIDI sound source 9, and the MIDI sound source 9 generates musical tones on the basis of the performance data. These musical tones form an accompaniment waveform 10.
[0049] The singing voice waveform 8 and the accompaniment waveform 10 are both passed to a mixing unit 11, which synchronizes and mixes them.
[0050] The mixing unit 11 synchronizes the singing voice waveform 8 and the accompaniment waveform 10, superimposes them, and reproduces the result as an output waveform 3, thereby reproducing music as a singing voice with accompaniment on the basis of the performance data 1.
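A minimal sketch of the synchronize-and-sum operation performed by the mixing unit 11 is given below, under assumptions: the offset, the gain values, and the function name mix are illustrative only, since the patent does not describe the mixing at this level of detail.

def mix(singing, accompaniment, singing_offset=0, gain_voice=0.7, gain_acc=0.5):
    length = max(singing_offset + len(singing), len(accompaniment))
    out = [0.0] * length
    for i, s in enumerate(accompaniment):
        out[i] += gain_acc * s
    for i, s in enumerate(singing):
        out[singing_offset + i] += gain_voice * s   # the offset realizes the synchronization
    # clip to the valid range before playback
    return [max(-1.0, min(1.0, s)) for s in out]

output_waveform = mix([0.2, 0.4, 0.1], [0.1, 0.1, 0.1, 0.1], singing_offset=1)
print(output_waveform)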
[0051] At the stage where the lyric imparting unit 5 converts the musical score information 4 into the singing voice information 6, if lyric information is present in the musical score information 4, the lyrics present as information are used preferentially when the singing voice information 6 is generated. As described above, FIG. 2 is an example of the musical score information 4 with lyrics added, and FIG. 3 is an example of the singing voice information 6 generated from the musical score information 4 of FIG. 2.
[0052] When the operator designates a singing style for generating the singing voice, the music information described in the musical score information 4 is passed to a singing pattern changing unit 12 at the time the musical score information 4 is converted into the singing voice information 6.
[0053] The singing pattern changing unit 12 checks the musical score information 4 against singing pattern data 13 (also called singing style data), refers to the singing pattern data 13 that matches the designated singing style, and generates the singing voice information 6 by imparting the singing pattern parameters described in the singing pattern data 13 to those sounds (notes) of the musical score information 4 that satisfy the conditions described there. Specifically, parameters for imparting expression changes including volume changes, pitch changes, and timing changes, such as vibrato, expression, timing, and pitch bend, are set for predetermined sounds (notes) of the note sequence of the musical score information according to the singing style; these parameters are stored in storage means as the singing pattern data 13 (singing style data), and the singing pattern changing unit 12 uses the musical score information 4 and the singing pattern data 13 to generate singing voice information 6 to which changes according to the singing style have been applied.
[0054] FIG. 5 shows a specific example of the singing pattern data 13 (singing style data) for each singing style. In the example of FIG. 5, the singing pattern data 13 is divided into two parts, a condition part and an execution part. The items of the condition part include singing styles such as "popular", "classical", and "enka", together with the pitch, length, strength, strength increase/decrease pattern, tempo of the piece, and so on that serve as conditions for selecting the sounds (notes) to which an expression change is to be imparted. The execution part includes, as parameters of the expression change to be imparted to the sounds (notes) that satisfy the conditions described in the condition part, vibrato, expression (changes in loudness, i.e. performance dynamics), timing, pitch bend (pitch bend at the beginning and end of a phrase), pitch adjustment, and so on.
[0055] For the vibrato in the execution part, parameters are specified for the delay until the vibrato starts, the period, the amplitude, the increase/decrease of the period, and the increase/decrease of the amplitude. For expression, volume parameters are specified at several characteristic points, such as the beginning, the end, and major change points, where the time from the beginning to the end of the sound is taken as 100. For timing, a parameter is specified indicating how far the sound lags behind or leads the beat. For pitch bend, which slides the pitch up or down at the beginning or end of a phrase, the degree of the pitch shift is specified as a parameter in cents at characteristic times, where the length of the sound is taken as 100; it is not applied to sounds inside a phrase. For pitch adjustment, a parameter is specified as the number of cents by which the overall pitch is raised or lowered. Here, a cent is a unit of pitch interval such that 100 cents correspond to a semitone.
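Purely as an illustration, the condition part / execution part organization of FIG. 5 could be held as records like the following; the field names, the threshold values, and the apply_style helper are assumptions made for this sketch rather than the patent's actual file format.

PATTERN_DATA = [
    {   # condition part: style plus the note properties that must match
        "style": "enka", "min_length_ticks": 240, "min_velocity": 60,
        # execution part: expression-change parameters applied to matching notes
        "vibrato": {"delay": 50, "period": 64, "amplitude": 64,
                    "period_change": 0, "amplitude_change": 10},
        "expression": [(0, 60), (30, 100), (100, 70)],        # (% of note length, volume)
        "timing": -5,                                          # lag/lead relative to the beat
        "phrase_start_bend": -200, "phrase_end_bend": -300,    # in cents
        "pitch_adjust": 0,                                     # in cents
    },
]

def apply_style(note, style, patterns=PATTERN_DATA):
    for rule in patterns:
        if (rule["style"] == style
                and note["length"] >= rule["min_length_ticks"]
                and note["velocity"] >= rule["min_velocity"]):
            changed = dict(note)
            for key in ("vibrato", "expression", "timing",
                        "phrase_start_bend", "phrase_end_bend", "pitch_adjust"):
                changed[key] = rule[key]
            return changed
    return note   # notes that match no condition are left unchanged

print(apply_style({"pitch": "E4", "length": 480, "velocity": 90}, "enka"))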
[0056] FIGS. 6 and 7 show an example of applying such a singing style (an example of imparting the parameters of the singing pattern data). FIG. 6 shows the singing voice information before the singing style is applied; for the portion ptA enclosed by the broken line in FIG. 6, the singing voice information after each parameter of the singing pattern data of, for example, the "enka" singing style has been applied is shown in the portion ptB enclosed by the broken line in FIG. 7. In FIGS. 6 and 7, for example, for the sound (note) "E4, T144231" of the lyric "ひ" in the singing voice information of FIG. 6, expression changes are imparted by parameters such as a timing correction, a phrase-start pitch bend, a phrase-end pitch bend, and a change of expression, as shown in FIG. 7, and the singing voice information is changed to that of the "enka" singing style.
[0057] Such a change of the singing voice information according to the singing style is realized in the singing pattern changing unit 12 of FIG. 1 using the musical score information 4 and the singing pattern data 13. As another example, as shown in FIG. 8, the singing voice information 6A (before the singing style is applied) from the lyric imparting unit 5 may be sent to the singing pattern changing unit 12; the singing pattern changing unit 12 then applies parameter changes according to the singing pattern to those sounds (notes) in the pre-application singing voice information 6A that match the conditions of the singing pattern data (singing style data) of FIG. 5, and outputs singing voice information 6B to which the singing style has been applied to the singing voice generation unit 7. The rest of the configuration is the same as in FIG. 1 above, and is therefore not illustrated or described.
[0058] The singing style can be designated in advance by the operator as described above, or can be determined by the singing pattern changing unit 12 from attached information such as a general song title, track name, or marker that is stored in the MIDI data and specified by SMF (Standard MIDI File). Examples include cases where the song title or track name contains the style name itself or an annotation including the style name, where the style can be estimated from the song title or track name, or where the style name is written in attached information such as a marker.
[0059] In the examples above, the case of a singing voice has mainly been described, but a style (performance style) can likewise be applied to musical tones. In that case, the performance pattern of musical tones such as a saxophone or violin is changed in accordance with the designated performance style. Specifically, for the desired musical tones (saxophone, violin, and so on) in the musical score information, performance pattern data similar to that of FIG. 5, for example, may be prepared. Like FIG. 5, this performance pattern data has a condition part and an execution part; the items of the condition part include styles such as "popular", "classical", and "enka", together with the pitch, length, strength, tempo of the piece, and so on that serve as conditions for selecting the sounds (notes) to which an expression change is to be imparted, and the execution part includes vibrato, expression, timing, pitch bend (pitch bend at the beginning and end of a phrase), pitch adjustment, and so on as parameters of the expression change to be imparted to the sounds (notes) that satisfy the conditions of the condition part.
[0060] In the example of FIG. 1, information on the desired musical tones (saxophone, violin, and so on) of the musical score information 4 (for example, note sequence information) is sent to a performance pattern changing unit 15, which, on the basis of the performance pattern data 16 described above, imparts expression change parameters such as vibrato, expression, timing, pitch bend (pitch bend at the beginning and end of a phrase), and pitch adjustment to the sounds (notes) that satisfy the predetermined conditions according to the designated performance style, thereby obtaining performance data 14 to which the performance style has been applied. The performance data 14 to which the performance style has been applied is sent to the MIDI sound source 9, and the MIDI sound source 9 generates musical tones to which the performance style has been applied on the basis of this performance data.
[0061] Next, FIG. 9 is a flowchart for explaining the overall operation of the singing voice synthesizing apparatus shown in FIG. 1 (or partly shown in FIG. 8).
[0062] In FIG. 9, the performance data 1 of a MIDI file is first input (step S1). Next, the performance data 1 is analyzed and the musical score data 4 is created (steps S2 and S3). The operator is then queried as necessary and the operator's setting processing is performed, for example selection of the performance style, selection of the lyrics, selection of the track and channel to which the lyrics apply, and selection of the MIDI tracks and channels to be muted. For items that the operator does not set, selections are made on the basis of attached information such as the song title, track name, and markers of the performance data 1, or predetermined default information is used in the subsequent processing.
[0063] In the following step S5, the singing voice information 6 is created from the lyrics using the musical score information 4 of the channel of the track to which the lyrics are assigned. Next, it is checked whether the processing has been completed for all tracks (step S6); if not, the process advances to the next track and returns to step S5. [0064] Therefore, when lyrics are added to a plurality of tracks, the lyrics are added independently of one another and the singing voice information 6 is created.
[0065] Next, in step S7, it is determined whether a change of singing style (or performance style) has been designated. If Yes (a style change has been designated), the process proceeds to step S8; if No (no change), the process proceeds to step S11.
[0066] In step S8, it is determined for each sound (note) of the musical score information whether it satisfies the conditions indicated in the condition part of the singing pattern data 13 (or the performance pattern data 16). For sounds (notes) that satisfy the conditions, in step S9 the parameters for the expression change indicated in the execution part of the singing pattern data 13 (or the performance pattern data 16) are applied, and the singing voice data (or performance data) is changed.
[0067] In the next step S10, it is determined whether the condition check has been completed for all sounds (notes). If No, the process returns to step S8; if Yes, the process proceeds to the next step S11.
[0068] In step S11, the singing voice generation unit 7 creates the singing voice waveform 8 from the singing voice information 6. In the next step S12, the MIDI sound source 9 reproduces the MIDI data to create the accompaniment waveform 10.
[0069] Through the processing so far, the singing voice waveform 8 and the accompaniment waveform 10 have been obtained. The mixing unit 11 then synchronizes the singing voice waveform 8 and the accompaniment waveform 10, superimposes them, and reproduces the result as the output waveform 3 (steps S13 and S14). The output waveform 3 is output as an acoustic signal via a sound system (not shown).
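The control flow of FIG. 9 can be condensed into the following runnable sketch. Every helper here is a trivial placeholder standing in for the corresponding block of FIG. 1; only the ordering of steps S1 to S14 is taken from the description, and all names and dummy values are assumptions.

def load_midi(path):          return {"tracks": {1: [{"pitch": "G4", "len": 480, "vel": 90}]}}   # S1
def analyze(perf):            return perf["tracks"]                                              # S2, S3
def settings_for(perf):       return {"lyric_tracks": [1], "style": "enka"}                      # S4
def assign_lyrics(notes):     return [dict(n, lyric="あ") for n in notes]                        # S5
def matches_condition(n, s):  return n["len"] >= 240                                             # S8
def apply_parameters(n, s):   n["vibrato"] = {"depth": 64, "delay": 50, "width": 64}             # S9
def render_voice(info):       return [0.0] * 4                                                   # S11
def render_midi(perf):        return [0.0] * 4                                                   # S12
def mix(voice, acc):          return [v + a for v, a in zip(voice, acc)]                         # S13, S14

def synthesize(path):
    performance = load_midi(path)
    score = analyze(performance)
    settings = settings_for(performance)
    singing_info = []
    for track in settings["lyric_tracks"]:           # S5/S6: repeat for every lyric track
        singing_info += assign_lyrics(score[track])
    if settings["style"] is not None:                # S7: only when a style change is designated
        for note in singing_info:                    # S8-S10: condition check for every note
            if matches_condition(note, settings["style"]):
                apply_parameters(note, settings["style"])
    return mix(render_voice(singing_info), render_midi(performance))

print(synthesize("song.mid"))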
[0070] To summarize the embodiment of the present invention described above: pattern data in which parameters for imparting an expression change including at least one of a volume change, a pitch change, and a timing change to the notes of music information are set according to the singing style is prepared in advance; the input performance data is analyzed as music information of pitch, length, and lyrics; lyrics are imparted to the note sequence on the basis of the lyric information of the analyzed music information to produce singing voice information; the singing pattern of the singing voice information is changed by imparting, to the notes of the note sequence of the analyzed music information, an expression change including at least one of a volume change, a pitch change, and a timing change on the basis of the prepared pattern data; and a singing voice is generated on the basis of the note sequence of the pattern-changed music information.
[0071] According to such an embodiment of the present invention, an expression change according to the singing style can be imparted to the singing voice, and the range of musical expression is expanded. Whereas conventional systems could only sing in a fixed singing style with poor expressiveness, arbitrarily selecting the singing style improves the expressiveness; a singing style matched to the piece can realize a more natural singing voice, and a mismatched style can express humor, further enhancing the entertainment quality.
[0072] The performance style can also be applied not only to singing voices but also to musical tones. In this case, it is preferable that the performance data be analyzed as music information of pitch, length, and lyrics; that the singing or performance pattern be changed by imparting, to the notes of the note sequence of the analyzed music information, an expression change including at least one of a volume change, a pitch change, and a timing change according to the style of singing or performance; and that a singing voice or musical tone be generated on the basis of the note sequence of the pattern-changed music information. In this way, an expression change according to the style of singing or performance can be imparted to the singing voice when singing or to the musical tones when playing, and the musical expression is markedly improved.
[0073] The singing voice synthesizing function described above is installed in, for example, a robot apparatus.
[0074] The bipedal walking robot apparatus described below as one configuration example is a practical robot that supports human activities in various situations in the living environment and other everyday settings; it is an entertainment robot that can act according to its internal state (anger, sadness, joy, pleasure, and so on) and can also express the basic actions performed by humans.
[0075] As shown in FIG. 10, the robot apparatus 60 is configured by connecting a head unit 63 to a predetermined position of a trunk unit 62 and connecting two left and right arm units 64R/L and two left and right leg units 65R/L (where R and L are suffixes indicating right and left, respectively; the same applies hereinafter).
[0076] FIG. 11 schematically shows the configuration of joint degrees of freedom of this robot apparatus 60. The neck joint supporting the head unit 63 has three degrees of freedom: a neck joint yaw axis 101, a neck joint pitch axis 102, and a neck joint roll axis 103.
[0077] Each arm unit 64R/L constituting the upper limbs comprises a shoulder joint pitch axis 107, a shoulder joint roll axis 108, an upper arm yaw axis 109, an elbow joint pitch axis 110, a forearm yaw axis 111, a wrist joint pitch axis 112, a wrist joint roll axis 113, and a hand 114. The hand 114 is actually a multi-joint, multi-degree-of-freedom structure including a plurality of fingers. However, since the motion of the hand 114 contributes little to and has little influence on the posture control and walking control of the robot apparatus 60, it is assumed herein to have zero degrees of freedom. Each arm therefore has seven degrees of freedom.
[0078] The trunk unit 62 has three degrees of freedom: a trunk pitch axis 104, a trunk roll axis 105, and a trunk yaw axis 106.
[0079] Each leg unit 65R/L constituting the lower limbs comprises a hip joint yaw axis 115, a hip joint pitch axis 116, a hip joint roll axis 117, a knee joint pitch axis 118, an ankle joint pitch axis 119, an ankle joint roll axis 120, and a foot 121. In this specification, the intersection of the hip joint pitch axis 116 and the hip joint roll axis 117 defines the hip joint position of the robot apparatus 60. The human foot is actually a structure including a multi-joint, multi-degree-of-freedom sole, but the sole of the robot apparatus 60 is given zero degrees of freedom. Each leg is therefore configured with six degrees of freedom.
[0080] Summarizing the above, the robot apparatus 60 as a whole has a total of 3 + 7 × 2 + 3 + 6 × 2 = 32 degrees of freedom. However, a robot apparatus for entertainment is not necessarily limited to 32 degrees of freedom. It goes without saying that the number of degrees of freedom, that is, the number of joints, can be increased or decreased as appropriate in accordance with design and production constraints, required specifications, and the like.
[0081] Each of the degrees of freedom of the robot apparatus 60 described above is actually implemented using an actuator. Because of requirements such as eliminating excess bulges in the external appearance to approximate the natural shape of a human body and performing posture control on an unstable bipedal walking structure, the actuators are preferably small and lightweight. It is more preferable that each actuator be a small AC servo actuator of the type in which the motor is directly coupled to the gear and the servo control system is implemented as a single chip and built into the motor unit.
[0082] FIG. 12 schematically shows the control system configuration of the robot apparatus 60. As shown in FIG. 12, the control system comprises a thinking control module 200, which dynamically responds to user input and the like and governs emotion judgment and emotional expression, and a motion control module 300, which controls the whole-body coordinated motion of the robot apparatus 60, such as driving the actuators 350.
[0083] The thinking control module 200 is an independently driven information processing apparatus comprising a CPU (Central Processing Unit) 211, which executes arithmetic processing relating to emotion judgment and emotional expression, a RAM (Random Access Memory) 212, a ROM (Read Only Memory) 213, and an external storage device (such as a hard disk drive) 214, and is capable of self-contained processing within the module.
[0084] This thinking control module 200 determines the current emotion and intention of the robot apparatus 60 in accordance with stimuli from the outside world, such as image data input from an image input device 251 and audio data input from an audio input device 252. Here, the image input device 251 comprises, for example, a plurality of CCD (Charge Coupled Device) cameras, and the audio input device 252 comprises, for example, a plurality of microphones.
[0085] The thinking control module 200 also issues commands to the motion control module 300 so as to execute motion or action sequences based on its decisions, that is, movements of the limbs.
[0086] The motion control module 300, on the other hand, is an independently driven information processing apparatus comprising a CPU 311, which controls the whole-body coordinated motion of the robot apparatus 60, a RAM 312, a ROM 313, and an external storage device (such as a hard disk drive) 314, and is capable of self-contained processing within the module. The external storage device 314 can store, for example, walking patterns calculated offline, target ZMP trajectories, and other action plans. Here, the ZMP is the point on the floor surface at which the moment due to the floor reaction force during walking becomes zero, and the ZMP trajectory means the trajectory along which the ZMP moves, for example during the walking operation of the robot apparatus 60. The concept of the ZMP and its application to the stability criterion of walking robots are described in Miomir Vukobratovic, "LEGGED LOCOMOTION ROBOTS" (Japanese edition: Ichiro Kato et al., "Hokou Robotto to Jinkou no Ashi" (Walking Robots and Artificial Legs), Nikkan Kogyo Shimbunsha).
[0087] Connected to the motion control module 300 via a bus interface (I/F) 301 are various devices such as the actuators 350 that realize the joint degrees of freedom distributed throughout the body of the robot apparatus 60 shown in FIG. 11, a posture sensor 351 that measures the posture and inclination of the trunk unit 62, ground contact confirmation sensors 352 and 353 that detect whether the left and right soles have left or landed on the floor, and a power supply control device 354 that manages the power supply such as the battery. Here, the posture sensor 351 is constituted, for example, by a combination of an acceleration sensor and a gyro sensor, and the ground contact confirmation sensors 352 and 353 are constituted by proximity sensors, micro-switches, or the like.
[0088] The thinking control module 200 and the motion control module 300 are constructed on a common platform and are interconnected via bus interfaces 201 and 301.
[0089] The motion control module 300 controls the whole-body coordinated motion produced by the actuators 350 so as to embody the action instructed by the thinking control module 200. That is, the CPU 311 retrieves from the external storage device 314 a motion pattern corresponding to the action instructed by the thinking control module 200, or internally generates a motion pattern. The CPU 311 then sets the foot motion, ZMP trajectory, trunk motion, upper-limb motion, horizontal position and height of the waist, and so on in accordance with the designated motion pattern, and transfers to each actuator 350 command values instructing motion in accordance with these settings.
[0090] The CPU 311 also detects the posture and inclination of the trunk unit 62 of the robot apparatus 60 from the output signal of the posture sensor 351, and detects from the output signals of the ground contact confirmation sensors 352 and 353 whether each leg unit 65R/L is in the swing phase or the stance phase, so that the whole-body coordinated motion of the robot apparatus 60 can be controlled adaptively.
[0091] The CPU 311 also controls the posture and motion of the robot apparatus 60 so that the ZMP position is always directed toward the center of the ZMP stable region.
[0092] Furthermore, the motion control module 300 returns to the thinking control module 200 the extent to which the action decided by the thinking control module 200 has been realized as intended, that is, the status of the processing.
[0093] In this way, the robot apparatus 60 can judge its own state and the surrounding situation on the basis of the control program and act autonomously.
[0094] In this robot apparatus 60, a program (including data) implementing the singing voice synthesizing function described above is placed, for example, in the ROM 213 of the thinking control module 200. In this case, the singing voice synthesizing program is executed by the CPU 211 of the thinking control module 200.
[0095] By incorporating the singing voice synthesizing function described above into such a robot apparatus, the new capability of expressing itself as a robot that sings along with an accompaniment is acquired, its entertainment quality is broadened, and its intimacy with humans is deepened.
[0096] The present invention is of course not limited to the embodiment described above, and various modifications are possible without departing from the gist of the present invention.
[0097] For example, singing voice information usable by the singing voice generation unit 7, which corresponds to the singing voice synthesizing unit and waveform generation unit used in the speech synthesizing method and apparatus described in the specification and drawings of Japanese Patent Application No. 2002-73385 previously proposed by the present applicant, has been illustrated; however, various other singing voice generation units can also be used, in which case singing voice information containing the information required for singing voice generation by each such singing voice generation unit may of course be generated from the performance data. The performance data is also not limited to MIDI data, and performance data of various standards can be used.
[0098] The present invention is not limited to the embodiment described above with reference to the drawings, and it will be apparent to those skilled in the art that various modifications, substitutions, or equivalents thereof can be made without departing from the appended claims and the gist thereof.


CLAIMS
[1] 1.演奏データを音の高さ、長さ、歌詞の音楽情報として解析する解析工程と、 解析 された音楽情報の音符列の音符に対して、歌唱又は演奏のスタイルに応じて、音量 変化、音程変化、タイミング変化の少なくとも 1つを含む表現変化を付与することによ り歌唱又は演奏パターンを変更するパターン変更工程と、 パターン変更された音楽 情報の音符列に基づいて歌声又は楽音を生成する生成工程と を有することを特徴 とする信号合成方法。  [1] 1. Analyzing the performance data as pitch, length and lyrics music information, and analyzing the notes of the analyzed music information in the note sequence according to the style of singing or performance. A pattern changing step of changing a singing or performance pattern by giving an expression change including at least one of a volume change, a pitch change, and a timing change; and a singing voice or a musical tone based on the musical note sequence of the pattern-changed music information. And a generating step of generating the signal.
[2] 2.上記演奏データは MIDIファイルの演奏データであることを特徴とする請求の範 囲第 1項記載の信号合成方法。  [2] 2. The signal synthesizing method according to claim 1, wherein the performance data is performance data of a MIDI file.
[3] 3.上記表現変化を付与するためのパラメータは、上記歌唱又は演奏のスタイルと、 上記音符の長さ、強さ、強さの増減状態、高さ及び楽曲の速度の少なくとも 1つとに 応じて設定されること特徴とする請求の範囲第 1項記載の信号合成方法。 [3] 3. The parameters for giving the above-mentioned expression change are the singing or playing style and at least one of the note length, strength, strength increase / decrease state, pitch, and music speed. 2. The signal synthesizing method according to claim 1, wherein the signal synthesizing method is set accordingly.
[4] 4.演奏データを音の高さ、長さ、歌詞の音楽情報として解析する解析手段と、 歌唱 又は演奏のスタイルに応じて、音楽情報の音符に対して、音量変化、音程変化、タイ ミング変化の少なくとも 1つを含む表現変化を付与するためのパラメータが設定され たパターンデータが蓄積された記憶手段と、 上記解析手段により解析された音楽情 報の音符列の音符に対応して、上記記憶手段により読み出された音量変化、音程変 ィ匕、タイミング変化の少なくとも 1つを含む表現変化を付与することにより歌唱又は演 奏パターンを変更するパターン変更手段と、 パターン変更された音楽情報の音符 列に基づいて歌声又は楽音を生成する生成手段と を有することを特徴とする信号 [4] 4. Analysis means for analyzing performance data as musical information of pitch, length, lyrics, and, according to the style of singing or performance, change of volume, pitch, Storage means for storing pattern data in which a parameter for giving an expression change including at least one of the timing changes is set, and a note corresponding to a note of a note string of music information analyzed by the analysis means. Pattern changing means for changing a singing or performance pattern by giving an expression change including at least one of a volume change, a pitch change, and a timing change read by the storage means; Generating means for generating a singing voice or a musical tone based on a note sequence of information.
[5] 5. The signal synthesizing apparatus according to claim 4, wherein the performance data is performance data of a MIDI file.
[6] 6. The signal synthesizing apparatus according to claim 4, wherein the parameters for imparting the expression change are set in accordance with the singing or performance style and at least one of the length, strength, strength increase/decrease state, and pitch of the note and the tempo of the music.
[7] 7. A singing voice synthesizing method comprising: an analyzing step of analyzing performance data as music information of pitch, length, and lyrics; a lyric imparting step of imparting lyrics to a note sequence based on the lyric information of the analyzed music information to produce singing voice information; a pattern changing step of changing a singing pattern of the singing voice information by imparting, to the notes of the note sequence of the analyzed music information, an expression change including at least one of a volume change, a pitch change, and a timing change in accordance with a singing or performance style; and a singing voice generating step of generating a singing voice based on the pattern-changed singing voice information.
[8] 8. The singing voice synthesizing method according to claim 7, wherein the performance data is performance data of a MIDI file.
[9] 9. The singing voice synthesizing method according to claim 7, wherein the parameters for imparting the expression change are set in accordance with the singing style and at least one of the length, strength, and pitch of the note and the tempo of the music.
[10] 10. The singing voice synthesizing method according to claim 7, wherein the expression change imparts at least one of vibrato, pitch bend, and expression to the sound of the target note.
[11] 11. The singing voice synthesizing method according to claim 10, wherein the parameters for imparting the vibrato include at least one of information on a delay of amplitude onset, information on amplitude, information on period, information on increase/decrease of the amplitude, and information on increase/decrease of the period, and the parameters for imparting the expression include at least one of time information expressed as a ratio to the note length and intensity information at an arbitrary characteristic point on the time axis.
[12] 12. The singing voice synthesizing method according to claim 7, wherein the singing style is selected by any one of a user setting, a track name of the performance data, a music title, and a marker.
[13] 13. A singing voice synthesizing apparatus comprising: storage means in which pattern data is accumulated, the pattern data having set therein parameters for imparting, to notes of music information, an expression change including at least one of a volume change, a pitch change, and a timing change in accordance with a singing style; analyzing means for analyzing performance data as music information of pitch, length, and lyrics; lyric imparting means for imparting lyrics to a note sequence based on the lyric information of the analyzed music information to produce singing voice information; pattern changing means for changing a singing pattern of the singing voice information by imparting, to the notes of the note sequence of the music information analyzed by the analyzing means, an expression change including at least one of a volume change, a pitch change, and a timing change read out from the storage means; and singing voice generating means for generating a singing voice based on the note sequence of the pattern-changed music information.
[14] 14. The singing voice synthesizing apparatus according to claim 13, wherein the performance data is performance data of a MIDI file.
[15] 15. The singing voice synthesizing apparatus according to claim 13, wherein the parameters for imparting the expression change are set in accordance with the singing style and at least one of the length, strength, strength increase/decrease state, and pitch of the note and the tempo of the music.
[16] 16. The singing voice synthesizing apparatus according to claim 13, wherein the expression change imparts at least one of vibrato, pitch bend, and expression to the sound of the target note.
[17] 17. The singing voice synthesizing apparatus according to claim 16, wherein the parameters for imparting the vibrato include at least one of information on a delay of amplitude onset, information on amplitude, information on period, information on increase/decrease of the amplitude, and information on increase/decrease of the period, and the parameters for imparting the expression include at least one of time information expressed as a ratio to the note length and intensity information at an arbitrary characteristic point on the time axis.
[18] 18. The singing voice synthesizing apparatus according to claim 13, wherein the singing style is selected by any one of a user setting, a track name of the performance data, a music title, and a marker.
[19] 19. A program for causing a computer to execute predetermined processing, the program comprising: an analyzing step of analyzing input performance data as music information of pitch, length, and lyrics; a lyric imparting step of imparting lyrics to a note sequence based on the lyric information of the analyzed music information to produce singing voice information; a pattern changing step of changing a singing pattern of the singing voice information by imparting, to the notes of the note sequence of the analyzed music information, an expression change including at least one of a volume change, a pitch change, and a timing change in accordance with a singing or performance style; and a singing voice generating step of generating a singing voice based on the pattern-changed singing voice information.
[20] 20. A computer-readable recording medium on which is recorded a program for causing a computer to execute predetermined processing, the program comprising: an analyzing step of analyzing input performance data as music information of pitch, length, and lyrics; a lyric imparting step of imparting lyrics to a note sequence based on the lyric information of the analyzed music information to produce singing voice information; a pattern changing step of changing a singing pattern of the singing voice information by imparting, to the notes of the note sequence of the analyzed music information, an expression change including at least one of a volume change, a pitch change, and a timing change in accordance with a singing or performance style; and a singing voice generating step of generating a singing voice based on the pattern-changed singing voice information.
[21] 21. An autonomous robot apparatus that acts based on supplied input information, the robot apparatus comprising: storage means in which pattern data is accumulated, the pattern data having set therein parameters for imparting, to notes of music information, an expression change including at least one of a volume change, a pitch change, and a timing change in accordance with a singing style; analyzing means for analyzing performance data as music information of pitch, length, and lyrics; lyric imparting means for imparting lyrics to a note sequence based on the lyric information of the analyzed music information to produce singing voice information; pattern changing means for changing a singing pattern of the singing voice information by imparting, to the notes of the note sequence of the music information analyzed by the analyzing means, an expression change including at least one of a volume change, a pitch change, and a timing change read out from the storage means; and singing voice generating means for generating a singing voice based on the note sequence of the pattern-changed music information.
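The pattern changing step recited in claims 1, 7, 10, and 11 can be pictured, purely as a hedged sketch rather than the claimed implementation (the style names, numeric values, parameter units, and data layout below are all assumptions), as a style-keyed table of pattern data plus a routine that turns the vibrato and expression parameters into per-note pitch and volume modulation curves:

```python
import math
from typing import Dict, List, Tuple

# Hypothetical pattern data, one entry per singing style.  Vibrato is
# described by an amplitude-onset delay (here a ratio of the note length),
# an amplitude in cents, a period in ms, and ramp factors for how the
# amplitude and period grow or shrink over the note.  Expression is a
# volume envelope given as (time-ratio, intensity) points.
PATTERN_DATA: Dict[str, dict] = {
    "ballad": {
        "vibrato": {"delay": 0.3, "amplitude_cents": 30.0, "period_ms": 180.0,
                    "amplitude_ramp": 1.2, "period_ramp": 0.9},
        "expression": [(0.0, 0.6), (0.4, 1.0), (1.0, 0.7)],
    },
    "march": {
        "vibrato": {"delay": 0.5, "amplitude_cents": 10.0, "period_ms": 120.0,
                    "amplitude_ramp": 1.0, "period_ramp": 1.0},
        "expression": [(0.0, 1.0), (0.9, 1.0), (1.0, 0.5)],
    },
}

def vibrato_curve(note_ms: float, p: dict, step_ms: float = 10.0) -> List[float]:
    """Pitch deviation in cents, sampled every step_ms over one note."""
    curve, t, phase = [], 0.0, 0.0
    onset = p["delay"] * note_ms  # delay before the vibrato amplitude starts
    while t < note_ms:
        progress = t / note_ms
        amp = p["amplitude_cents"] * (1.0 + (p["amplitude_ramp"] - 1.0) * progress)
        period = p["period_ms"] * (1.0 + (p["period_ramp"] - 1.0) * progress)
        curve.append(0.0 if t < onset else amp * math.sin(phase))
        phase += 2.0 * math.pi * step_ms / period
        t += step_ms
    return curve

def expression_curve(note_ms: float, points: List[Tuple[float, float]],
                     step_ms: float = 10.0) -> List[float]:
    """Volume scale over one note, interpolated between the
    (time-ratio, intensity) points of the expression parameter."""
    curve, t = [], 0.0
    while t < note_ms:
        r = t / note_ms
        for (r0, v0), (r1, v1) in zip(points, points[1:]):
            if r0 <= r <= r1:
                curve.append(v0 + (v1 - v0) * (r - r0) / (r1 - r0))
                break
        else:
            curve.append(points[-1][1])
        t += step_ms
    return curve

# Example: a 600 ms note sung in the "ballad" style.
style = PATTERN_DATA["ballad"]
pitch_mod = vibrato_curve(600.0, style["vibrato"])
volume_mod = expression_curve(600.0, style["expression"])
```

In the claimed apparatus, curves of this kind would accompany the lyric-tagged note sequence to the singing voice generating means, and the choice of style itself (claim 12) could be driven by a user setting or by the track name, music title, or marker found in the performance data.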
PCT/JP2004/008333 2003-06-13 2004-06-14 Signal combination method and device, singing voice synthesizing method and device, program and recording medium, and robot device WO2004111993A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
JP2003170000A JP2005004106A (en) 2003-06-13 2003-06-13 Signal synthesis method and device, singing voice synthesis method and device, program, recording medium, and robot apparatus
JP2003-170000 2003-06-13

Publications (1)

Publication Number Publication Date
WO2004111993A1 true WO2004111993A1 (en) 2004-12-23

Family

ID=33549397

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/JP2004/008333 WO2004111993A1 (en) 2003-06-13 2004-06-14 Signal combination method and device, singing voice synthesizing method and device, program and recording medium, and robot device

Country Status (2)

Country Link
JP (1) JP2005004106A (en)
WO (1) WO2004111993A1 (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110634460A (en) * 2018-06-21 2019-12-31 卡西欧计算机株式会社 Electronic musical instrument, control method for electronic musical instrument, and storage medium
CN111402842A (en) * 2020-03-20 2020-07-10 北京字节跳动网络技术有限公司 Method, apparatus, device and medium for generating audio

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20080167739A1 (en) * 2007-01-05 2008-07-10 National Taiwan University Of Science And Technology Autonomous robot for music playing and related method
JP5218154B2 (en) * 2009-03-02 2013-06-26 ヤマハ株式会社 Music signal generator
JP6587007B1 (en) * 2018-04-16 2019-10-09 カシオ計算機株式会社 Electronic musical instrument, electronic musical instrument control method, and program
JP6587008B1 (en) * 2018-04-16 2019-10-09 カシオ計算機株式会社 Electronic musical instrument, electronic musical instrument control method, and program

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPH08152878A (en) * 1994-11-30 1996-06-11 Yamaha Corp Automatic playing device
JPH10274982A (en) * 1997-03-31 1998-10-13 Kawai Musical Instr Mfg Co Ltd Electronic instrument
JP2001042868A (en) * 1999-05-26 2001-02-16 Yamaha Corp Performance data generation and generating device, and recording medium therefor
JP2001159892A (en) * 1999-08-09 2001-06-12 Yamaha Corp Performance data preparing device and recording medium
JP2001282269A (en) * 2000-03-31 2001-10-12 Clarion Co Ltd Information providing system and utterance doll
JP2002073064A (en) * 2000-08-28 2002-03-12 Yamaha Corp Voice processor, voice processing method and information recording medium
JP2003099053A (en) * 2001-09-25 2003-04-04 Yamaha Corp Playing data processor and program
JP2003177751A (en) * 2001-10-05 2003-06-27 Yamaha Corp Playing data processing device

Also Published As

Publication number Publication date
JP2005004106A (en) 2005-01-06

Similar Documents

Publication Publication Date Title
JP4483188B2 (en) SINGING VOICE SYNTHESIS METHOD, SINGING VOICE SYNTHESIS DEVICE, PROGRAM, RECORDING MEDIUM, AND ROBOT DEVICE
JP3864918B2 (en) Singing voice synthesis method and apparatus
EP1605435B1 (en) Singing voice synthesizing method, singing voice synthesizing device, program, recording medium, and robot
JP4150198B2 (en) Speech synthesis method, speech synthesis apparatus, program and recording medium, and robot apparatus
JP3858842B2 (en) Singing voice synthesis method and apparatus
JP2003271174A (en) Speech synthesis method, speech synthesis device, program, recording medium, method and apparatus for generating constraint information and robot apparatus
US20020198717A1 (en) Method and apparatus for voice synthesis and robot apparatus
WO2002076686A1 (en) Action teaching apparatus and action teaching method for robot system, and storage medium
WO2002091356A1 (en) Obot device, character recognizing apparatus and character reading method, and control program and recording medium
JP4415573B2 (en) SINGING VOICE SYNTHESIS METHOD, SINGING VOICE SYNTHESIS DEVICE, PROGRAM, RECORDING MEDIUM, AND ROBOT DEVICE
WO2004111993A1 (en) Signal combination method and device, singing voice synthesizing method and device, program and recording medium, and robot device
WO2002086861A1 (en) Language processor
JP2003271172A (en) Method and apparatus for voice synthesis, program, recording medium and robot apparatus
Cosentino et al. Human–robot musical interaction
JP2002258886A (en) Device and method for combining voices, program and recording medium
JP2002346958A (en) Control system and control method for legged mobile robot
JP2002304187A (en) Device and method for synthesizing voice, program and recording medium
Özen et al. Cooperative dancing with an industrial manipulator: Computational cybernetics complexities
JP2001043126A (en) Robot system

Legal Events

Date Code Title Description
AK Designated states

Kind code of ref document: A1

Designated state(s): AE AG AL AM AT AU AZ BA BB BG BR BW BY BZ CA CH CN CO CR CU CZ DE DK DM DZ EC EE EG ES FI GB GD GE GH GM HR HU ID IL IN IS KE KG KP KR KZ LC LK LR LS LT LU LV MA MD MG MK MN MW MX MZ NA NI NO NZ OM PG PH PL PT RO RU SC SD SE SG SK SL SY TJ TM TN TR TT TZ UA UG US UZ VC VN YU ZA ZM ZW

AL Designated countries for regional patents

Kind code of ref document: A1

Designated state(s): BW GH GM KE LS MW MZ NA SD SL SZ TZ UG ZM ZW AM AZ BY KG KZ MD RU TJ TM AT BE BG CH CY CZ DE DK EE ES FI FR GB GR HU IE IT LU MC NL PL PT RO SE SI SK TR BF BJ CF CG CI CM GA GN GQ GW ML MR NE SN TD TG

121 Ep: The EPO has been informed by WIPO that EP was designated in this application
122 Ep: PCT application non-entry in European phase