WO2004111993A1 - Signal combination method and device, singing voice synthesizing method and device, program and recording medium, and robot device - Google Patents

Signal combination method and device, singing voice synthesizing method and device, program and recording medium, and robot device Download PDF

Info

Publication number
WO2004111993A1
WO2004111993A1 PCT/JP2004/008333
Authority
WO
WIPO (PCT)
Prior art keywords
information
singing
singing voice
change
pitch
Prior art date
Application number
PCT/JP2004/008333
Other languages
French (fr)
Japanese (ja)
Inventor
Kenichiro Kobayashi
Original Assignee
Sony Corporation
Priority date
Filing date
Publication date
Application filed by Sony Corporation filed Critical Sony Corporation
Publication of WO2004111993A1 publication Critical patent/WO2004111993A1/en

Links

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10H ELECTROPHONIC MUSICAL INSTRUMENTS; INSTRUMENTS IN WHICH THE TONES ARE GENERATED BY ELECTROMECHANICAL MEANS OR ELECTRONIC GENERATORS, OR IN WHICH THE TONES ARE SYNTHESISED FROM A DATA STORE
    • G10H1/00 Details of electrophonic musical instruments
    • G10H1/0033 Recording/reproducing or transmission of music for electrophonic musical instruments
    • G10H1/0041 Recording/reproducing or transmission of music for electrophonic musical instruments in coded form
    • G10H1/0058 Transmission between separate instruments or between individual components of a musical system
    • G10H1/0066 Transmission between separate instruments or between individual components of a musical system using a MIDI interface
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10H ELECTROPHONIC MUSICAL INSTRUMENTS; INSTRUMENTS IN WHICH THE TONES ARE GENERATED BY ELECTROMECHANICAL MEANS OR ELECTRONIC GENERATORS, OR IN WHICH THE TONES ARE SYNTHESISED FROM A DATA STORE
    • G10H2250/00 Aspects of algorithms or signal processing methods without intrinsic musical character, yet specifically adapted for or used in electrophonic musical processing
    • G10H2250/315 Sound category-dependent sound synthesis processes [Gensound] for musical use; Sound category-specific synthesis-controlling parameters or control means therefor
    • G10H2250/455 Gensound singing voices, i.e. generation of human voices for musical applications, vocal singing sounds or intelligible words at a desired pitch or with desired vocal effects, e.g. by phoneme synthesis

Definitions

  • the present invention relates to a signal synthesizing method and apparatus for synthesizing a signal such as a singing voice or a musical tone from performance data.
  • the present invention relates to a singing voice synthesizing method and apparatus, a program and a recording medium, and a robot apparatus.
  • MIDI (musical instrument digital interface) data is representative performance data and is a practical industry standard.
  • MIDI data is used to generate a musical tone by controlling a digital sound source called a MIDI sound source (a sound source operated by MIDI data, such as a computer sound source or an electronic musical instrument sound source).
  • a MIDI file (e.g., an SMF (standard MIDI file)) can contain lyric data and is used for the automatic creation of a musical score with lyrics.
  • however, in these conventional technologies, although an attempt is made to express a singing voice within the data format of MIDI data, the control is merely control as if controlling a musical instrument, and the lyric data that MIDI inherently carries is not utilized.
  • a mechanical device that performs motions similar to those of a human (living organism) using electric or magnetic action is called a "robot". Robots began to spread in Japan at the end of the 1960s, but most of them were industrial robots, such as manipulators and transfer robots, aimed at automating and unmanning production work in factories.
  • the artificial intelligence (AI) used in such autonomously operating robot apparatuses artificially realizes intellectual functions such as inference and judgment, and attempts are further being made to artificially realize functions such as emotion and instinct. Among the means for expressing such artificial intelligence to the outside, such as visual expression means and natural-language expression means, the use of speech is one example of a natural-language expression function.
  • as described above, conventional singing voice synthesis uses data of a special format, and even when MIDI data is used, the lyric data embedded in it cannot be used effectively, nor can MIDI data created for other instruments be sung in a casual, humming-like manner.
  • the present invention has been proposed in view of this conventional situation, and it is an object of the present invention to provide a method and apparatus for synthesizing signals such as singing voices and musical tones, and a singing voice synthesizing method and apparatus, which make it possible to synthesize a singing voice using performance data such as MIDI data and to give expression that takes the style into account not only for singing voices but also for musical tones.
  • the method and apparatus for synthesizing a singing voice or a musical tone according to the present invention analyze performance data as musical information of pitch, length, and lyrics, change the singing or performance pattern by giving the notes of the note sequence of the analyzed music information an expression change including at least one of a volume change, a pitch change, and a timing change in accordance with the singing or performance style, and generate a singing voice or a musical tone based on the pattern-changed note sequence of the music information.
  • the singing voice synthesizing method and apparatus according to the present invention achieve the above object as follows: pattern data in which parameters are set for giving the notes of music information an expression change including at least one of a volume change, a pitch change, and a timing change in accordance with the singing style is prepared in advance; the input performance data is analyzed as music information of pitch, length, and lyrics; lyrics are added to the note sequence based on the lyric information of the analyzed music information to produce singing voice information; the singing pattern of the singing voice information is changed by giving, in correspondence with the notes of the note sequence of the analyzed music information, an expression change based on the pattern data prepared in advance; and a singing voice is generated based on the pattern-changed note sequence of the music information.
  • according to this configuration, when generating a singing voice, an expression change including at least one of a volume change, a pitch change, and a timing change is given in accordance with the specified singing style, so that the way of singing can be changed.
  • the performance data is performance data of a MIDI file.
  • the parameters for giving the expression change are set in accordance with the singing style and at least one of the note's length, strength, increase/decrease state of strength, pitch, and the tempo of the music.
  • the above-mentioned expression change includes adding at least one of vibrato, pitch bend, and expression to the sound of the target note.
  • the parameter for giving the vibrato includes at least one of information on delay of amplitude start, information on amplitude, information on cycle, information on increase / decrease in amplitude, and information on increase / decrease in cycle.
  • the parameters for giving the expression may include at least one of time information expressed as a ratio to the note length and strength information at characteristic arbitrary points on that time axis.
  • the singing style may be selected by a user setting, or by the track name, song name, or marker of the performance data, or the like.
  • the program according to the present invention causes a computer to execute the singing voice synthesizing function of the present invention
  • the recording medium according to the present invention is a computer-readable medium on which the program is recorded.
  • the robot apparatus according to the present invention is an autonomous robot apparatus that operates based on supplied input information, and includes: storage means storing pattern data in which parameters are set for giving the notes of music information an expression change including at least one of a volume change, a pitch change, and a timing change in accordance with the singing style; analyzing means for analyzing performance data as music information of pitch, length, and lyrics; lyric providing means for adding lyrics to the note sequence based on the lyric information of the analyzed music information to produce singing voice information; pattern changing means for changing the singing pattern of the singing voice information by giving, in correspondence with the notes of the note sequence of the music information analyzed by the analyzing means, an expression change including at least one of a volume change, a pitch change, and a timing change read out from the storage means; and singing voice generating means for generating a singing voice based on the note sequence of the music information whose pattern has been changed. This makes it possible to significantly improve the entertainment properties of the robot.
  • according to the signal synthesizing method and apparatus of the present invention, the performance data is analyzed as music information of pitch, length, and lyrics; the singing or performance pattern is changed by giving the notes of the note sequence of the analyzed music information an expression change including at least one of a volume change, a pitch change, and a timing change in accordance with the singing or performance style; and a singing voice or musical tone is generated based on the pattern-changed note sequence of the music information. An expression change corresponding to the style can thus be given to the singing voice when singing or to the musical tone when playing, and the musical expression is greatly improved.
  • according to the singing voice synthesizing method and apparatus of the present invention, pattern data in which parameters are set for giving the notes of music information an expression change including at least one of a volume change, a pitch change, and a timing change in accordance with the singing style is prepared in advance; the input performance data is analyzed as music information of pitch, length, and lyrics; lyrics are added to the note sequence based on the lyric information of the analyzed music information to produce singing voice information; the singing pattern of the singing voice information is changed by giving, in correspondence with the notes of the note sequence of the analyzed music information, an expression change based on the pattern data prepared in advance; and a singing voice is generated based on the pattern-changed note sequence of the music information. An expression change corresponding to the singing style can thus be given to the singing voice, and the musical expression is greatly improved. Whereas conventionally only a fixed singing style with poor expressive power was possible, arbitrarily selecting a singing style improves the expressive power: a singing style matched to the music can realize a more natural singing voice, while a mismatched style can express humor, further improving the entertainment value.
  • the program according to the present invention causes a computer to execute the singing voice synthesizing function of the present invention, and the recording medium according to the present invention is a computer-readable recording medium on which the program is recorded.
  • the robot apparatus according to the present invention implements the singing voice synthesizing function of the present invention. That is, pattern data in which parameters are set for giving the notes of music information an expression change including at least one of a volume change, a pitch change, and a timing change in accordance with the singing style is prepared in advance; the input performance data is analyzed as music information of pitch, length, and lyrics; lyrics are added to the note sequence based on the analyzed lyric information to produce singing voice information; the singing pattern of the singing voice information is changed by giving, in correspondence with the notes of the analyzed note sequence, an expression change based on the pattern data prepared in advance; and a singing voice is generated based on the pattern-changed note sequence. An expression change corresponding to the singing style can thus be given to the singing voice, expanding the musical expression: a singing style matched to the music achieves a natural singing voice, while a mismatched style can express humor, further improving the entertainment value. The expressive ability of the robot apparatus is therefore improved, its entertainment properties are enhanced, and its intimacy with humans can be deepened.
  • FIG. 1 is a block diagram illustrating a system configuration of a singing voice synthesizing apparatus according to the present embodiment.
  • FIG. 2 is a diagram showing an example of score information as an analysis result.
  • FIG. 3 is a diagram showing an example of singing voice information.
  • FIG. 4 is a block diagram illustrating a configuration example of a singing voice generation unit.
  • FIG. 5 is a diagram showing an example of singing pattern data.
  • FIG. 6 is a diagram showing an example of singing voice information before applying a singing style.
  • FIG. 7 is a diagram showing the singing voice information after the singing style "Enka" has been applied to the singing voice information of FIG. 6.
  • FIG. 8 is a block diagram showing a main part of another configuration example of the singing voice synthesizing apparatus according to the present embodiment.
  • FIG. 9 is a flowchart illustrating the operation of the singing voice synthesizing apparatus according to the present embodiment.
  • FIG. 10 is a perspective view showing an external configuration of a robot device according to the present embodiment.
  • FIG. 11 is a diagram schematically showing a degree of freedom configuration model of the robot device.
  • FIG. 12 is a block diagram showing a system configuration of the robot device.
  • in the following embodiment, a singing voice synthesizing apparatus that mainly synthesizes a singing voice and further has a function of synthesizing musical tones is described.
  • the present invention can be easily applied to a singing voice synthesizing device for synthesizing only singing voices, a tone synthesizing device for synthesizing musical tones, or a signal synthesizing device for synthesizing audio signals such as singing voices and musical tones.
  • FIG. 1 is a block diagram showing a schematic system configuration of a singing voice synthesizing apparatus with a musical sound synthesizing function according to the present embodiment.
  • the singing voice synthesizing device shown in FIG. 1 is assumed to be applied to, for example, a robot device having at least an emotion model, a voice synthesizing unit, and a sound generating unit, but is not limited thereto.
  • a performance data analysis unit analyzes performance data 1, represented by MIDI data, and converts it into musical score information 4 representing the pitch, length, and strength of the sounds of the tracks and channels contained in the performance data.
  • FIG. 2 shows an example of performance data (MIDI data) converted into musical score information 4.
  • events are written for each track and each channel.
  • Events include note events and control events.
  • each note event has information on the time of occurrence (the time column in the figure), pitch, length, and strength. Therefore, a note sequence, that is, a sound sequence, is defined by a sequence of note events.
  • a control event has data on its time of occurrence, the control type (for example, vibrato or expression of performance dynamics), and the contents of that control.
  • in the case of vibrato, for example, the control contents include a "depth" indicating the magnitude of the pitch swing, a "width" indicating the cycle of the pitch swing, and a "delay" indicating the start timing of the swing (the delay from the sounding timing).
  • Control events for a specific track or channel are applied to the playback of the note sequence of that track or channel, unless a new control event (control change) occurs for that control type.
  • lyrics can be entered for each track in the performance data of a MIDI file.
  • "Uruhi" shown at the top is part of the lyrics written on track 1, and "Uruhi" shown at the bottom is part of the lyrics written on track 2. That is, the example of FIG. 2 is an example in which lyrics are embedded in the analyzed music information (musical score information).
  • time is represented by “measures: beats: number of ticks”
  • length is represented by “number of ticks”
  • strength is represented by numerical values of “0-127”
  • pitch is represented by a note name, for example "A4" for 440 Hz
  • the depth, width, and delay are each represented by a numerical value in the range of "0-64-127".
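  • To make the representation described above concrete, the following is a minimal sketch, in Python, of how one note event of the musical score information 4 might be held in memory. The class and field names are illustrative assumptions, not taken from the patent; the value ranges follow the conventions just described (time as measures:beats:ticks, strength 0-127, pitch as a note name such as "A4" for 440 Hz).

```python
from dataclasses import dataclass

# Hypothetical container for one note event of the musical score information 4.
@dataclass
class NoteEvent:
    track: int          # MIDI track the event belongs to
    channel: int        # MIDI channel the event belongs to
    measure: int        # time of occurrence: measure number
    beat: int           # time of occurrence: beat within the measure
    tick: int           # time of occurrence: ticks within the beat
    length_ticks: int   # note length in ticks
    strength: int       # velocity-like strength, 0-127
    pitch: str          # note name, e.g. "A4" (440 Hz)
    lyric: str = ""     # lyric syllable embedded in the track, if any

# A4 = 440 Hz; other pitches follow equal temperament.
NOTE_OFFSETS = {"C": -9, "C#": -8, "D": -7, "D#": -6, "E": -5, "F": -4,
                "F#": -3, "G": -2, "G#": -1, "A": 0, "A#": 1, "B": 2}

def pitch_to_hz(pitch: str) -> float:
    """Convert a note name such as 'A4' or 'G#3' to a frequency in Hz."""
    name, octave = pitch[:-1], int(pitch[-1])
    semitones = NOTE_OFFSETS[name] + 12 * (octave - 4)
    return 440.0 * 2 ** (semitones / 12)

note = NoteEvent(track=1, channel=1, measure=1, beat=2, tick=0,
                 length_ticks=480, strength=100, pitch="G4", lyric="a")
print(round(pitch_to_hz(note.pitch), 1))  # 392.0 Hz for G4
```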
  • the converted score information 4 is passed to the lyrics providing unit 5.
  • the lyric imparting unit 5 generates singing voice information 6 to which the lyrics for the sound are attached along with information such as the length, pitch, intensity, and expression of the sound corresponding to the note, based on the musical score information 4.
  • FIG. 3 shows an example of the singing voice information 6.
  • "\song\" is a tag indicating the start of lyric information.
  • the tag "\PP, T10673075\" indicates a rest (pause) of 10673075 μsec
  • the tag "\tdyna 110 649075\" indicates the overall strength from the beginning over 10673075 μsec
  • the tag "\dyna 100\" indicates the strength of each sound
  • the tag "\G4, T288461\" indicates the lyric "a" with a pitch of G4 and a length of 288461 μsec.
  • the singing voice information in FIG. 3 is generated from the musical score information shown in FIG. 2 (the analysis result of the MIDI data). As can be seen by comparing FIG. 2 and FIG. 3, the performance data for musical instrument control (for example, the note information) is fully utilized in generating the singing voice information.
  • for the singing attributes other than the lyric "a" itself (the time of occurrence, length, pitch, strength, and so on of the sound of "a"), the time of occurrence, length, pitch, strength, and so on contained in the control information and note event information of the musical score information (FIG. 2) are used directly.
  • for the next lyric element "ru", the next note event information in the same track and channel of the musical score information is likewise used directly, and so on.
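  • A rough sketch of how the lyric providing unit 5 might turn analyzed note events plus lyric syllables into the tagged singing voice information of FIG. 3 is shown below. The tag layout follows the example described above; the function and argument names are hypothetical assumptions for illustration only.

```python
# Minimal sketch of building singing voice information 6 in the tagged form of
# FIG. 3 from analyzed notes and their lyric syllables.

def build_singing_voice_info(notes, overall_dyna=110, overall_span_usec=10673075):
    """notes: list of (pitch, length_usec, strength, lyric); pitch=None means a rest."""
    parts = ["\\song\\",                                       # start of lyric information
             f"\\tdyna {overall_dyna} {overall_span_usec}\\"]  # overall strength
    for pitch, length_usec, strength, lyric in notes:
        if pitch is None:
            parts.append(f"\\PP,T{length_usec}\\")              # rest (pause)
        else:
            parts.append(f"\\dyna {strength}\\")                # strength of this sound
            parts.append(f"\\{pitch},T{length_usec}\\{lyric}")  # pitch, length, lyric
    return "".join(parts)

notes = [(None, 150000, 0, None),        # short rest
         ("G4", 288461, 100, "a"),       # lyric "a" at G4
         ("G4", 288461, 100, "ru")]      # lyric "ru" at G4
print(build_singing_voice_info(notes))
```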
  • the singing voice information 6 is passed to the singing voice generating unit 7, and the singing voice generating unit 7 generates a singing voice waveform 8 based on the singing voice information 6.
  • the singing voice generator 7 that generates the singing voice waveform 8 from the singing voice information 6 is configured as shown in FIG. 4, for example.
  • the singing voice prosody generation unit 7-1 converts the singing voice information 6 into singing voice prosody data.
  • the waveform generator 7-2 converts the singing voice prosody data into a singing voice waveform 8.
  • [LABEL] indicates the duration of each phoneme.
  • that is, for the phoneme segments of the lyric "ra", the phoneme "aa" following the initial phoneme of "ra" lasts from the 1000-sample point up to the 39600-sample point, a duration of 38600 samples.
  • [PITCH] is the pitch period represented by a point pitch. That is, the pitch period at the 0-sample point is 50 samples. Here, since the pitch of "ra" is not changed, the pitch period of 50 samples is applied at all sample points.
  • [VOLUME] indicates the relative volume at each sample point. That is, with a default value of 100%, the volume is 66% at the 0-sample point and 57% at the 39600-sample point; similarly, a volume of 48% continues from the 40100-sample point, and the volume becomes 3% at the 42600-sample point. This realizes an attenuation of the sound of "ra" with the passage of time.
  • further, the pitch period may fluctuate up and down (50 ± 3), for example to a pitch period of 53 samples, with a cycle (width) of about 4000 samples. This implements vibrato, which is a fluctuation in the pitch of the voice.
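  • As an illustration of the [PITCH] behaviour just described, the sketch below generates a pitch-period contour in which a nominal period of 50 samples is modulated by about ±3 samples with a cycle of roughly 4000 samples, which is how vibrato shows up in the singing voice prosody data. The function name and the sine-shaped modulation are assumptions; the patent only states that the period fluctuates up and down with that width and cycle.

```python
import math

def vibrato_pitch_contour(num_samples, base_period=50, depth=3,
                          cycle=4000, delay=0, step=100):
    """
    Sketch of a [PITCH]-style contour: (sample_point, pitch_period) pairs.
    base_period: nominal pitch period in samples (50 in the example)
    depth:       peak deviation of the period (+/- 3 in the example)
    cycle:       vibrato cycle length in samples (about 4000 in the example)
    delay:       samples to wait before the vibrato starts
    step:        spacing of the emitted control points
    """
    contour = []
    for n in range(0, num_samples, step):
        if n < delay:
            period = base_period
        else:
            period = base_period + depth * math.sin(2 * math.pi * (n - delay) / cycle)
        contour.append((n, round(period)))
    return contour

# First few control points of a 12000-sample note with vibrato starting after 2000 samples.
for point in vibrato_pitch_contour(12000, delay=2000)[:6]:
    print(point)
```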
  • the waveform generator 7-2 reads out a sample from an internal waveform memory (not shown) based on such singing voice / phonological data and generates a singing voice waveform 8.
  • the singing voice generator 7 that generates the singing voice waveform 8 from the singing voice information 6 is not limited to the above example, and any appropriate known singing voice generator can be used.
  • the performance data 1 is passed to the MIDI sound source 9, and the MIDI sound source 9 generates a musical tone based on the performance data.
  • This musical tone is the accompaniment, that is, the accompaniment waveform 10.
  • the singing voice waveform 8 and the accompaniment waveform 10 are both passed to a mixing unit 11 that performs synchronization and mixing.
  • the mixing unit 11 synchronizes the singing voice waveform 8 and the accompaniment waveform 10, superimposes them on each other, and reproduces the result as the output waveform 3, thereby performing music reproduction in which the singing voice is accompanied by the accompaniment based on the performance data 1.
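  • The synchronization and superposition performed by the mixing unit 11 can be pictured with a few lines of code. This is only a schematic sketch, assuming both waveforms are already at the same sample rate and are represented as lists of float samples; the function name, the offset parameter, and the naive clipping are assumptions, and the real unit would derive the synchronization point from the performance data.

```python
def mix_waveforms(singing, accompaniment, singing_offset=0,
                  gain_voice=1.0, gain_accomp=1.0):
    """
    Superimpose a singing voice waveform onto an accompaniment waveform.
    singing_offset: start position (in samples) of the voice within the
                    accompaniment, i.e. the synchronization point.
    """
    length = max(len(accompaniment), singing_offset + len(singing))
    out = [0.0] * length
    for i, s in enumerate(accompaniment):
        out[i] += gain_accomp * s
    for i, s in enumerate(singing):
        out[singing_offset + i] += gain_voice * s
    # naive clipping to keep the mixed result in a valid range
    return [max(-1.0, min(1.0, s)) for s in out]

output_waveform = mix_waveforms([0.2, 0.4, 0.2], [0.1, 0.1, 0.1, 0.1, 0.1],
                                singing_offset=1)
print(output_waveform)  # [0.1, 0.3, 0.5, 0.3, 0.1]
```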
  • FIG. 2 shows an example of the musical score information 4 to which lyrics are added
  • FIG. 3 shows an example of the singing voice information 6 generated from the musical score information 4 of FIG.
  • the singing style is specified by the operator when generating the singing voice
  • the musical score information 4 is converted into the singing voice information 6
  • the music information described in the musical score information 4 is passed to the singing pattern changing unit 12.
  • the singing pattern changing unit 12 refers to the singing pattern data 13 (also called singing style data) that matches the specified singing style, compares it with the musical score information 4, and generates the singing voice information 6 by applying, to the sounds (notes) of the musical score information 4 that match the conditions described in that data, the singing pattern parameters described in the pattern data 13.
  • more specifically, for predetermined notes in the note sequence of the musical score information, parameters are set for giving an expression change including a volume change such as vibrato or expression, a pitch change such as pitch bend, and a timing change, and these parameters are stored in storage means as the singing pattern data 13 (singing style data).
  • in other words, the singing pattern changing unit 12 uses the musical score information 4 and the singing pattern data 13 to generate the singing voice information 6 modified in accordance with the singing style.
  • FIG. 5 is a diagram showing a specific example of singing pattern data 13 (singing style data) corresponding to each singing style.
  • the singing pattern data 13 is divided into two parts, a condition part and an execution part.
  • the items of the condition part include the singing style, such as "popular", "classic", or "enka", together with the pitch, length, strength, strength increase/decrease pattern, tempo of the music, and the like, which are the conditions for selecting the sound (note) to which an expression change is to be given.
  • the execution part contains, as parameters of the expression change to be applied to the sounds (notes) that match the conditions described in the condition part, vibrato, expression (dynamics of the sound, expression of performance dynamics), timing, pitch bend (bend at the beginning of a phrase, bend at the end of a phrase), pitch adjustment, and the like.
  • for expression, the volume is specified at several characteristic points, such as the beginning, the end, and points of large change, with the time from the beginning to the end of the sound taken as 100.
  • for timing, a parameter indicating the degree of delay or advance relative to the beat is specified.
  • for pitch bend, a parameter expressed in cents is specified indicating the degree to which the pitch is raised or lowered for the sound at the beginning or end of a phrase; it does not apply to sounds in the middle of a phrase.
  • for pitch adjustment, a parameter giving the number of cents by which the pitch as a whole is raised or lowered is specified. Here, a cent is a unit of pitch width such that 100 cents correspond to one semitone.
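  • To summarize the structure of FIG. 5, the sketch below represents one entry of the singing pattern data 13 as a condition part and an execution part, together with a matcher that decides whether a given note qualifies. The field names, the exact condition fields, and the matching rules are illustrative assumptions; the patent only specifies which kinds of items belong to each part.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Condition:                      # condition part of one pattern entry
    style: str                        # "popular", "classic", "enka", ...
    min_length_ticks: int = 0         # note length condition
    min_strength: int = 0             # note strength condition
    pitch_range: Optional[tuple] = None    # (low, high) MIDI note numbers, if any
    max_tempo_bpm: Optional[float] = None  # tempo condition of the piece

@dataclass
class Execution:                      # execution part: expression-change parameters
    vibrato: Optional[dict] = None    # e.g. {"delay": 0.3, "depth": 3, "cycle": 4000}
    expression: Optional[list] = None # (time%, volume%) points over the note, time 0-100
    timing_shift: float = 0.0         # delay (+) or advance (-) relative to the beat
    pitch_bend_cents: int = 0         # bend at phrase start/end, in cents
    pitch_adjust_cents: int = 0       # overall raise/lower of the pitch, in cents

@dataclass
class PatternEntry:
    condition: Condition
    execution: Execution

def matches(cond: Condition, style: str, note) -> bool:
    """note is assumed to expose length_ticks, strength, midi_pitch and tempo_bpm."""
    if cond.style != style:
        return False
    if note.length_ticks < cond.min_length_ticks or note.strength < cond.min_strength:
        return False
    if cond.pitch_range and not (cond.pitch_range[0] <= note.midi_pitch <= cond.pitch_range[1]):
        return False
    if cond.max_tempo_bpm is not None and note.tempo_bpm > cond.max_tempo_bpm:
        return False
    return True

# Example "enka"-style entry: long, fairly strong notes get a deep, slow vibrato.
enka_entry = PatternEntry(
    Condition(style="enka", min_length_ticks=480, min_strength=64, max_tempo_bpm=100),
    Execution(vibrato={"delay": 0.3, "depth": 3, "cycle": 4000},
              expression=[(0, 70), (50, 100), (100, 60)],
              pitch_bend_cents=-100))
```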
  • FIGS. 6 and 7 show examples of application of this singing style (examples of giving singing pattern data parameters).
  • FIG. 6 shows the singing voice information before the singing style is applied. For the part ptA surrounded by the broken line in FIG. 6, for example, the singing voice information after each parameter of the singing pattern data of the "enka" singing style has been applied is indicated by the part ptB enclosed by the broken line in FIG. 7.
  • comparing FIGS. 6 and 7, as shown in FIG. 7, an expression change by parameters such as pitch bend, end-of-phrase pitch bend, and a change of expression is added to the sound (note) "E4, T144231" of the lyric "hi" in the singing voice information of FIG. 6, and the singing voice information is changed to that of the "enka" singing style.
  • the change of the singing voice information according to the singing style is realized by the singing pattern changing unit 12 of FIG. 1 using the musical score information 4 and the singing pattern data 13.
  • alternatively, the singing voice information 6A (before the singing style is applied) from the lyric providing unit 5 may be sent to the singing pattern changing unit 12.
  • in that case, for the sounds (notes) of the singing voice information 6A that match the conditions of the singing pattern data (singing style data) of FIG. 5, the parameters are changed according to the singing pattern, and the style-applied singing voice information 6B is output and sent to the singing voice generating unit 7.
  • the other configuration is the same as that of FIG. 1 described above, and is not shown and will not be described.
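  • Continuing the previous sketch (and reusing its PatternEntry and matches definitions), the processing that turns the pre-application singing voice information 6A into the style-applied singing voice information 6B might look roughly as follows. The note representation, the "effects" dictionary, and the first-match rule are illustrative assumptions, not taken from the patent.

```python
def apply_singing_style(notes, style, pattern_entries):
    """
    notes:           mutable note objects of the singing voice information 6A
                     (each assumed to expose length_ticks, strength, midi_pitch,
                     tempo_bpm and an 'effects' dict for attached expression changes)
    style:           the singing style selected by the operator, e.g. "enka"
    pattern_entries: list of PatternEntry objects (see the previous sketch)
    Returns the notes with expression-change parameters attached (information 6B).
    """
    for note in notes:
        for entry in pattern_entries:
            if matches(entry.condition, style, note):   # condition part check
                ex = entry.execution                     # execution part parameters
                if ex.vibrato:
                    note.effects["vibrato"] = ex.vibrato
                if ex.expression:
                    note.effects["expression"] = ex.expression
                if ex.timing_shift:
                    note.effects["timing_shift"] = ex.timing_shift
                if ex.pitch_bend_cents:
                    note.effects["pitch_bend_cents"] = ex.pitch_bend_cents
                if ex.pitch_adjust_cents:
                    note.effects["pitch_adjust_cents"] = ex.pitch_adjust_cents
                break                                    # assumption: first matching entry wins
    return notes
```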
  • the singing style can be instructed by the operator in advance as described above.
  • alternatively, the singing style is stored in the MIDI data, and the singing pattern changing unit 12 can also determine it from attached information such as a general song name, track name, or marker. For example, the song name or track name may be the style name itself or may contain the style name, or the style may be able to be estimated from the song name or track name.
  • styles can also be applied to musical tones, not only to singing voices. In this method, for example, the performance pattern of musical sounds such as saxophone or violin is changed in accordance with a specified performance style. Specifically, for a desired musical tone (the musical sound of a saxophone, violin, or the like) in the musical score information, performance pattern data (performance pattern data 16, described below) is prepared, for example.
  • like FIG. 5 described above, the performance pattern data has a condition part and an execution part. The items of the condition part include styles such as "popular", "classic", and "enka", together with the pitch, length, strength, tempo of the song, and the like, which are the conditions for selecting the target sound (note) to which an expression change is to be given. The execution part contains, as parameters of the expression change to be applied to the sounds (notes) that match the conditions of the condition part, vibrato, expression, timing, pitch bend (bend at the beginning and end of a phrase), pitch adjustment, and the like.
  • the note sequence information is sent to the performance pattern changing unit 15, and, based on the performance pattern data 16 described above, the expression change parameters are applied to the sounds (notes) that satisfy the predetermined conditions in accordance with the specified performance style. The performance data 14 to which the performance style has been applied is then sent to the MIDI sound source 9, and the MIDI sound source 9 generates a musical tone to which the performance style is applied based on that performance data.
  • FIG. 9 is a flowchart for explaining the overall operation of the singing voice synthesizing apparatus shown in FIG. 1 (or partially shown in FIG. 8).
  • performance data 1 of a MIDI file is input (step S1).
  • next, the performance data 1 is analyzed, and the musical score information 4 is created (steps S2, S3).
  • next, the operator is asked to perform setting processing, such as selecting the singing or performance style, selecting the lyrics, selecting the track or channel to which the lyrics are to be assigned, and selecting the MIDI tracks or channels to be muted. For the portions not set by the operator, selections can be made based on attached information such as the song name, track name, or markers of the performance data 1, or predetermined default information is used in the subsequent processing.
  • in the next step S5, singing voice information 6 is created from the lyrics, using the musical score information 4 of the channel in the track to which the lyrics are assigned.
  • in step S6, it is checked whether all tracks have been processed; if not, the process moves on to the next track and returns to step S5. Therefore, when lyrics are added to a plurality of tracks, the lyrics are added to each track independently and singing voice information 6 is created for each.
  • in step S7, it is determined whether or not a change of the singing style (or performance style) has been designated. If Yes (the style is to be changed), the process proceeds to step S8; if No (no change), the process proceeds to step S11.
  • in step S8, it is determined whether or not a sound (note) of the musical score information satisfies the conditions indicated in the condition part of the singing pattern data 13 (or the performance pattern data 16).
  • in step S9, for the sounds (notes) that satisfy those conditions, the expression change parameters indicated in the execution part of the singing pattern data 13 (or the performance pattern data 16) are applied to the singing voice information (or the performance data).
  • in the next step S10, it is determined whether or not the condition check has been completed for all the sounds (notes). If No, the process returns to step S8; if Yes, the process proceeds to the next step S11.
  • in step S11, the singing voice generating unit 7 generates a singing voice waveform 8 from the singing voice information 6.
  • in step S12, the MIDI data is reproduced by the MIDI sound source 9 to create an accompaniment waveform 10.
  • by the processing so far, the singing voice waveform 8 and the accompaniment waveform 10 have been obtained. The mixing unit 11 therefore synchronizes the singing voice waveform 8 and the accompaniment waveform 10, superimposes them on each other, and reproduces them as the output waveform 3 (steps S13, S14).
  • This output waveform 3 is output as an acoustic signal via a sound system (not shown).
  • as described above, according to the singing voice synthesizing apparatus of the present embodiment, pattern data in which parameters are set for giving the notes of music information an expression change including at least one of a volume change, a pitch change, and a timing change in accordance with the singing style is prepared in advance; the input performance data is analyzed as music information of pitch, length, and lyrics; lyrics are added to the note sequence based on the lyric information of the analyzed music information to produce singing voice information; the singing pattern of the singing voice information is changed by giving, in correspondence with the notes of the note sequence of the analyzed music information, an expression change including at least one of a volume change, a pitch change, and a timing change based on the pattern data prepared in advance; and a singing voice is generated based on the pattern-changed note sequence of the music information. An appropriate expression change corresponding to the singing style can thus be given to the singing voice, and the musical expression can be expanded. Whereas conventionally only a fixed singing style with poor expressive power was possible, arbitrarily selecting a singing style improves the expressiveness: a singing style matched to the music can achieve a natural singing voice, while a mismatched style can express humor, which can further enhance the entertainment value.
  • the performance style can be applied not only to the singing voice but also to the musical tone.
  • that is, the performance data is analyzed as music information of pitch, length, and lyrics.
  • the singing or performance pattern is changed by giving the notes in the note sequence of the analyzed music information an expression change including at least one of a volume change, a pitch change, and a timing change in accordance with the singing or performance style, and a singing voice or a musical tone is preferably generated based on the note sequence of the music information whose pattern has been changed. This makes it possible to give the singing voice when singing, or the musical tone when playing, an expression change corresponding to the style of the singing or performance, thereby significantly improving the musical expression.
  • the singing voice synthesis function described above is mounted on, for example, a robot device.
  • the bipedal walking robot apparatus shown as a configuration example is a practical robot that supports human activities in various situations of daily life, such as the living environment, and is an entertainment robot that can act in accordance with its internal state (anger, sadness, joy, pleasure, and so on) and can display the basic actions performed by humans.
  • the robot device 60 includes a head unit 63 connected to a predetermined position of the trunk unit 62, left and right arm units 64R/L, and left and right leg units 65R/L, which are connected to the trunk unit (R and L are suffixes indicating right and left, respectively; the same applies hereinafter).
  • FIG. 11 schematically shows the configuration of the degrees of freedom of the joints provided in the robot apparatus 1.
  • the neck joint supporting the head unit 63 has three degrees of freedom: a neck joint axis 101, a neck pitch axis 102, and a neck joint roll axis 103.
  • each arm unit 64R/L constituting the upper limb includes a shoulder joint pitch axis 107, a shoulder joint roll axis 108, an upper arm joint axis 109, an elbow joint pitch axis 110, and a forearm joint axis 111.
  • the hand 114 is actually a multi-joint, multi-degree-of-freedom structure including a plurality of fingers.
  • the movement of the hand 114 has little contribution or influence to the posture control and the walking control of the robot device 60, and therefore, it is assumed herein that the degree of freedom is zero. Therefore, each arm has seven degrees of freedom.
  • the trunk unit 62 has three degrees of freedom: a trunk pitch axis 104, a trunk roll axis 105, and a trunk axis 106.
  • each leg unit 65R/L constituting the lower limb includes a hip joint axis 115, a hip joint pitch axis 116, a hip joint roll axis 117, a knee joint pitch axis 118, an ankle joint pitch axis 119, an ankle joint roll axis 120, and a foot 121.
  • the intersection of the hip joint pitch axis 116 and the hip joint roll axis 117 defines the hip joint position of the robot device 1.
  • although the human foot is actually a structure including a multi-joint, multi-degree-of-freedom sole, the sole of the robot device 60 is assumed to have zero degrees of freedom. Therefore, each leg has six degrees of freedom.
  • summing up the above, the robot device 60 as a whole has 3 + 7 × 2 + 3 + 6 × 2 = 32 degrees of freedom. However, the entertainment robot apparatus is not necessarily limited to 32 degrees of freedom. It goes without saying that the number of degrees of freedom, that is, the number of joints, can be increased or decreased as appropriate in accordance with design and production constraints and required specifications.
  • Each degree of freedom of the robot device 60 as described above is actually implemented using an actuator.
  • the actuators are preferably small and lightweight, due to requirements such as eliminating extra bulges in the external appearance to approximate the human body shape and controlling the posture of an unstable structure that walks on two legs. It is more preferable that each actuator be a small AC servo actuator of a type that is directly coupled to a gear and in which the servo control system is integrated into and housed in the motor unit.
  • FIG. 12 schematically shows a control system configuration of the robot device 60.
  • the control system includes a thought control module 200 that dynamically responds to user input and the like to determine and express emotions, and a motion control module 300 that controls the whole-body cooperative motion of the robot apparatus, such as driving the actuators 350.
  • the thought control module 200 includes a CPU (Central Processing Unit) 211 that executes arithmetic processing relating to emotion determination and emotion expression, a RAM (Random Access Memory), a ROM (Read Only Memory) 213, and an external storage device (such as a hard disk drive).
  • the thought control module 200 determines the current emotion and intention of the robot device 60 in accordance with external stimuli, such as image data input from the image input device 251 and voice data input from the voice input device 252.
  • the image input device 251 includes, for example, a plurality of charge coupled device (CCD) cameras
  • the audio input device 252 includes, for example, a plurality of microphones.
  • the thought control module 200 issues a command to the motion control module 300 to execute a motion or action sequence based on its decision, that is, a movement of the limbs.
  • the motion control module 300 includes a CPU 311 that controls the whole-body cooperative motion of the robot device 60, a RAM 312, a ROM 313, and an external storage device (such as a hard disk drive) 314, and is an independently driven information processing device capable of self-contained processing.
  • in the external storage device 314, for example, walking patterns calculated offline, target ZMP trajectories, and other action plans can be stored.
  • the ZMP (zero moment point) is a point on the floor at which the moment due to the floor reaction force during walking becomes zero, and the ZMP trajectory is, for example, the trajectory along which the ZMP moves during the walking operation of the robot device.
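  • As general background not taken from the patent, for a system of point masses m_i at positions (x_i, y_i, z_i), the x coordinate of the ZMP is often approximated, when changes of angular momentum are neglected, by the following commonly used expression (the y coordinate is analogous, and g is the gravitational acceleration):

```latex
x_{\mathrm{ZMP}} \;=\;
\frac{\sum_i m_i\,(\ddot{z}_i + g)\, x_i \;-\; \sum_i m_i\,\ddot{x}_i\, z_i}
     {\sum_i m_i\,(\ddot{z}_i + g)}
```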
  • to the motion control module 300, various devices are connected via a bus interface (I/F) 301: the actuators 350 that realize the degrees of freedom of the joints distributed over the whole body of the robot device 60 shown in FIG. 11, a posture sensor 351 that measures the posture and inclination of the trunk unit 62, grounding confirmation sensors 352 and 353 that detect the lifting and landing of the left and right soles, and a power supply control device 354 that manages a power supply such as a battery.
  • the posture sensor 351 is constituted by, for example, a combination of an acceleration sensor and a gyro sensor, and the grounding confirmation sensors 352 and 353 are constituted by proximity sensors, micro switches, or the like.
  • the thinking control module 200 and the motion control module 300 are constructed on a common platform, and are interconnected via bus interfaces 201 and 301.
  • the motion control module 300 controls the whole-body cooperative motion by means of the actuators 350 so as to embody the action specified by the thought control module 200. That is, the CPU 311 retrieves an operation pattern corresponding to the action instructed by the thought control module 200 from the external storage device 314, or internally generates an operation pattern. Then, in accordance with the specified operation pattern, the CPU 311 sets the foot motion, ZMP trajectory, trunk motion, upper limb motion, horizontal position and height of the waist, and so on, and transfers command values instructing motions in accordance with these settings to the respective actuators 350.
  • further, the CPU 311 detects the posture and inclination of the trunk unit 62 of the robot device 60 based on the output signal of the posture sensor 351, and detects whether each leg unit 65R/L is in the free-leg state or the standing state based on the output signals of the grounding confirmation sensors 352 and 353, so that the whole-body cooperative motion of the robot device 60 can be adaptively controlled.
  • the CPU 311 also controls the posture and motion of the robot device 60 so that the ZMP position is always directed toward the center of the ZMP stable region.
  • the motion control module 300 returns to the thought control module 200 the extent to which the action according to the intention determined by the thought control module 200 has been expressed, that is, the processing status.
  • the robot device 60 can determine its own and surrounding conditions based on the control program, and can act autonomously.
  • a program (including data) implementing the above-described singing voice synthesizing function is stored in, for example, the ROM 213 of the thinking control module 200.
  • the execution of the singing voice synthesis program is performed by the CPU 211 of the thinking control module 200.
  • for the singing voice generating unit 7, for example, a unit corresponding to the singing voice synthesizing unit and waveform generating unit used in the voice synthesizing method and apparatus described in the specification and drawings of Japanese Patent Application No. 2002-73385, previously proposed by the present applicant, can be used.
  • although singing voice information usable by such a singing voice generating unit has been illustrated, various other singing voice generating units can also be used; in that case, singing voice information including the information required for singing voice generation by the respective singing voice generating unit may be generated from the performance data.
  • the performance data is not limited to MIDI data, and performance data of various standards can be used.

Abstract

Inputted MIDI file performance data is analyzed as music information including pitch, length, and lyrics (S2, S3). When a change of singing style is designated, the singing voice information is altered so that an expression change is given to the musical notes that match the conditions (S7, S8, S9). A singing voice is generated based on the singing voice information whose singing pattern has been altered (S11). This makes it possible to synthesize a singing voice using performance data such as MIDI data and to alter the singing pattern according to the singing style.

Description

Specification
Signal synthesizing method and apparatus, singing voice synthesizing method and apparatus, program and recording medium, and robot apparatus
Technical field
[0001] The present invention relates to a signal synthesizing method and apparatus for synthesizing a signal such as a singing voice or a musical tone from performance data, a singing voice synthesizing method and apparatus, a program and a recording medium, and a robot apparatus.
[0002] This application claims priority based on Japanese Patent Application No. 2003-170000, filed in Japan on June 13, 2003, which is incorporated herein by reference.
Background art
[0003] A technique for generating a singing voice from given singing data by means of a computer or the like is already known, as represented by Japanese Patent No. 3233036.
[0004] MIDI (musical instrument digital interface) data is representative performance data and is a de facto industry standard. Typically, MIDI data is used to generate musical tones by controlling a digital sound source called a MIDI sound source (a sound source operated by MIDI data, such as a computer sound source or an electronic musical instrument sound source). A MIDI file (e.g., an SMF (standard MIDI file)) can contain lyric data and is used for the automatic creation of a musical score with lyrics.
[0005] Attempts to use MIDI data as a parameter expression (special data expression) of a singing voice or of the phoneme segments constituting a singing voice have also been proposed, as represented by Japanese Patent Application Laid-Open No. 11-95798.
[0006] However, in these conventional technologies, although an attempt is made to express a singing voice within the data format of MIDI data, the control is merely control as if controlling a musical instrument, and the lyric data that MIDI inherently carries is not utilized.
[0007] Furthermore, MIDI data created for other musical instruments could not be turned into a singing voice without modification.
[0008] In addition, voice synthesis software that reads out e-mails and web pages is sold by many manufacturers, including "Simple Speech" from Sony Corporation, but the reading style is the same tone as reading out ordinary text.
[0009] Incidentally, a mechanical device that performs motions similar to those of a human (living organism) using electric or magnetic action is called a "robot". Robots began to spread in Japan at the end of the 1960s, but most of them were industrial robots, such as manipulators and transfer robots, aimed at automating and unmanning production work in factories.
[0010] Recently, the development of practical robots that support life as human partners, that is, that support human activities in various situations of daily life such as the living environment, has been progressing. Unlike industrial robots, such practical robots have the ability to learn, by themselves, how to adapt to humans with individually different personalities and to various environments in various aspects of the human living environment. For example, robot devices such as "pet-type" robots that model the body mechanism and motions of four-legged animals such as dogs and cats, and "humanoid" robots that model the body mechanism and motions of humans walking upright on two legs, are already being put into practical use.
[0011] Since these robot devices can perform various operations emphasizing entertainment properties compared with industrial robots, they are sometimes called entertainment robots. Some such robot devices operate autonomously in response to external information and internal states.
[0012] The artificial intelligence (AI) used in such autonomously operating robot devices artificially realizes intellectual functions such as inference and judgment, and attempts are further being made to artificially realize functions such as emotion and instinct. Among the means for expressing such artificial intelligence to the outside, such as visual expression means and natural-language expression means, the use of speech is one example of a natural-language expression function.
[0013] As described above, conventional singing voice synthesis uses data of a special format, and even when MIDI data is used, the lyric data embedded in it cannot be used effectively, nor can MIDI data created for other instruments be sung in a casual, humming-like manner.
[0014] Furthermore, the singing style and the like are not particularly taken into consideration, and the expressive power inevitably remains poor.
Disclosure of the invention
Problems to be solved by the invention
[0015] The present invention has been proposed in view of this conventional situation, and it is an object of the present invention to provide a method and apparatus for synthesizing signals such as singing voices and musical tones, and a singing voice synthesizing method and apparatus, which make it possible to synthesize a singing voice using performance data such as MIDI data, and which enable expression that takes the style into account not only for singing voices but also for musical tones.
[0016] It is a further object of the present invention to provide a program and a recording medium that cause a computer to perform such a singing voice synthesis function.
[0017] It is a further object of the present invention to provide a robot apparatus that realizes such a singing voice synthesis function.
Means for solving the problems
[0018] In order to achieve the above object, the method and apparatus for synthesizing signals such as singing voices and musical tones according to the present invention analyze performance data as music information of pitch, length, and lyrics, change the singing or performance pattern by giving the notes of the note sequence of the analyzed music information an expression change including at least one of a volume change, a pitch change, and a timing change in accordance with the singing or performance style, and generate a singing voice or a musical tone based on the pattern-changed note sequence of the music information.
[0019] In order to achieve the above object, the singing voice synthesizing method and apparatus according to the present invention prepare in advance pattern data in which parameters are set for giving the notes of music information an expression change including at least one of a volume change, a pitch change, and a timing change in accordance with the singing style, analyze the input performance data as music information of pitch, length, and lyrics, add lyrics to the note sequence based on the lyric information of the analyzed music information to produce singing voice information, change the singing pattern of the singing voice information by giving, in correspondence with the notes of the note sequence of the analyzed music information, an expression change including at least one of a volume change, a pitch change, and a timing change based on the pattern data prepared in advance, and generate a singing voice based on the pattern-changed note sequence of the music information.
[0020] According to this configuration, when generating a singing voice, an expression change including at least one of a volume change, a pitch change, and a timing change is given in accordance with the specified singing style, so that the way of singing can be changed.
[0021] The performance data is preferably performance data of a MIDI file. The parameters for giving the expression change are set in accordance with the singing style and at least one of the note's length, strength, increase/decrease state of strength, pitch, and the tempo of the music. The expression change includes giving at least one of vibrato, pitch bend, and expression to the sound of the target note. The parameters for giving vibrato include at least one of information on the delay of the start of the amplitude, information on the amplitude, information on the cycle, information on the increase/decrease of the amplitude, and information on the increase/decrease of the cycle, and the parameters for giving expression include at least one of time information expressed as a ratio to the note length and strength information at characteristic arbitrary points on that time axis. The singing style is selected by a user setting, or by the track name, song name, or marker of the performance data, or the like.
[0022] The program according to the present invention causes a computer to execute the singing voice synthesizing function of the present invention, and the recording medium according to the present invention is a computer-readable medium on which this program is recorded.
[0023] Further, in order to achieve the above object, the robot apparatus according to the present invention is an autonomous robot apparatus that operates based on supplied input information, and includes: storage means storing pattern data in which parameters are set for giving the notes of music information an expression change including at least one of a volume change, a pitch change, and a timing change in accordance with the singing style; analyzing means for analyzing performance data as music information of pitch, length, and lyrics; lyric providing means for adding lyrics to the note sequence based on the lyric information of the analyzed music information to produce singing voice information; pattern changing means for changing the singing pattern of the singing voice information by giving, in correspondence with the notes of the note sequence of the music information analyzed by the analyzing means, an expression change including at least one of a volume change, a pitch change, and a timing change read out from the storage means; and singing voice generating means for generating a singing voice based on the pattern-changed note sequence of the music information. This makes it possible to significantly improve the entertainment properties of the robot.
[0024] Still other objects of the present invention and the specific advantages obtained by the present invention will become more apparent from the description of the embodiments given below with reference to the drawings.
EFFECTS OF THE INVENTION
[0025] According to the signal synthesizing method and apparatus of the present invention for synthesizing signals such as singing voices and musical tones, performance data is analyzed as music information of pitch, length, and lyrics; the singing or performance pattern is changed by imparting, to the notes of the note sequence of the analyzed music information, an expression change including at least one of a volume change, a pitch change, and a timing change according to the style of singing or performance; and a singing voice or musical tone is generated on the basis of the note sequence of the pattern-changed music information. An expression change according to the style of singing or performance can thereby be imparted to the singing voice when singing or to the musical tone when playing, and the musical expression is markedly improved.
[0026] According to the singing voice synthesizing method and apparatus of the present invention, pattern data in which parameters for imparting an expression change including at least one of a volume change, a pitch change, and a timing change to the notes of music information are set according to the singing style is prepared in advance; input performance data is analyzed as music information of pitch, length, and lyrics; lyrics are imparted to the note sequence on the basis of the lyric information of the analyzed music information to produce singing voice information; the singing pattern of the singing voice information is changed by imparting, to the notes of the note sequence of the analyzed music information, an expression change including at least one of a volume change, a pitch change, and a timing change on the basis of the prepared pattern data; and a singing voice is generated on the basis of the note sequence of the pattern-changed music information. An expression change according to the singing style can thus be imparted to the singing voice, and the musical expression is markedly improved. Whereas conventional systems could only sing in a fixed singing style with poor expressiveness, arbitrarily selecting the singing style improves the expressiveness; a singing style matched to the piece can realize a more natural singing voice, and a mismatched style can express humor, further enhancing the entertainment quality.
[0027] The program according to the present invention causes a computer to execute the singing voice synthesizing function of the present invention, and the recording medium according to the present invention is a computer-readable medium on which this program is recorded.
[0028] The robot apparatus according to the present invention realizes the singing voice synthesizing function of the present invention. That is, according to the robot apparatus of the present invention, pattern data in which parameters for imparting an expression change including at least one of a volume change, a pitch change, and a timing change to the notes of music information are set according to the singing style is prepared in advance; input performance data is analyzed as music information of pitch, length, and lyrics; lyrics are imparted to the note sequence on the basis of the lyric information of the analyzed music information to produce singing voice information; the singing pattern of the singing voice information is changed by imparting, to the notes of the note sequence of the analyzed music information, an expression change including at least one of a volume change, a pitch change, and a timing change on the basis of the prepared pattern data; and a singing voice is generated on the basis of the note sequence of the pattern-changed music information. An expression change according to the singing style can thus be imparted to the singing voice, the range of musical expression is expanded, a singing style matched to the piece can realize a more natural singing voice, and a mismatched style can express humor, further enhancing the entertainment quality. The expressive capability of the robot apparatus is therefore improved, its entertainment quality is raised, and its intimacy with humans is deepened.
BRIEF DESCRIPTION OF THE DRAWINGS
[0029] [FIG. 1] FIG. 1 is a block diagram illustrating the system configuration of a singing voice synthesizing apparatus according to the present embodiment.
[FIG. 2] FIG. 2 is a diagram showing an example of musical score information obtained as an analysis result.
[FIG. 3] FIG. 3 is a diagram showing an example of singing voice information.
[FIG. 4] FIG. 4 is a block diagram illustrating a configuration example of a singing voice generation unit.
[FIG. 5] FIG. 5 is a diagram showing an example of singing pattern data.
[FIG. 6] FIG. 6 is a diagram showing an example of singing voice information before a singing style is applied.
[FIG. 7] FIG. 7 is a diagram showing the singing voice information after the "enka" singing style has been applied to the singing voice information of FIG. 6.
[FIG. 8] FIG. 8 is a block diagram showing the main part of another configuration example of the singing voice synthesizing apparatus according to the present embodiment.
[FIG. 9] FIG. 9 is a flowchart for explaining the operation of the singing voice synthesizing apparatus according to the present embodiment.
[FIG. 10] FIG. 10 is a perspective view showing the external configuration of a robot apparatus according to the present embodiment.
[FIG. 11] FIG. 11 is a diagram schematically showing a model of the degree-of-freedom configuration of the robot apparatus.
[FIG. 12] FIG. 12 is a block diagram showing the system configuration of the robot apparatus.
BEST MODE FOR CARRYING OUT THE INVENTION
[0030] Hereinafter, specific embodiments to which the present invention is applied will be described in detail with reference to the drawings.
[0031] The embodiment of the present invention shows an example of a singing voice synthesizing apparatus that mainly synthesizes singing voices and additionally has a function of synthesizing musical tones; however, the present invention can of course also be easily applied to a singing voice synthesizing apparatus that synthesizes only singing voices, to a musical tone synthesizing apparatus that synthesizes musical tones, or to a signal synthesizing apparatus that synthesizes audio signals such as singing voices and musical tones.
[0032] FIG. 1 is a block diagram showing the schematic system configuration of the singing voice synthesizing apparatus with a musical tone synthesizing function according to the present embodiment. The singing voice synthesizing apparatus shown in FIG. 1 is assumed to be applied to, for example, a robot apparatus having at least an emotion model, speech synthesis means, and sound output means, but is not limited to this; it can of course also be applied to various other robot apparatuses and to various computer AI (artificial intelligence) systems other than robots.
[0033] In FIG. 1, a performance data analysis unit 2, which analyzes performance data 1 typified by MIDI data, analyzes the input performance data 1 and converts it into musical score information 4 representing the pitch, length, and strength of the sounds of the tracks and channels contained in the performance data.
[0034] FIG. 2 shows an example of performance data (MIDI data) converted into the musical score information 4. In FIG. 2, events are written for each track and each channel. The events include note events and control events. A note event has information on its time of occurrence (the time column in the figure), pitch, length, and strength (velocity), so a note sequence, or sound sequence, is defined by a sequence of note events. A control event has a time of occurrence, control type data (for example vibrato or expression, i.e. performance dynamics), and data indicating the contents of the control. For vibrato, for example, the control contents include a "depth" item indicating the magnitude of the pitch swing, a "width" item indicating the period of the swing, and a "delay" item indicating the start timing of the swing (the delay from the sounding timing). A control event for a particular track and channel is applied to the musical tone reproduction of the note sequence of that track and channel until a new control event (control change) of the same control type occurs. Furthermore, lyrics can be entered on a per-track basis in the performance data of a MIDI file. In FIG. 2, the "あるうひ" ("one day") shown in the upper part is a portion of the lyrics entered on track 1, and the "あるうひ" shown in the lower part is a portion of the lyrics entered on track 2. That is, the example of FIG. 2 is one in which lyrics are embedded in the analyzed music information (musical score information).
[0035] In FIG. 2, time is expressed as "bar:beat:tick count", length is expressed as a tick count, strength is expressed as a numerical value from 0 to 127, and pitch is expressed so that 440 Hz corresponds to "A4". For vibrato, the depth, width, and delay are each expressed as a numerical value on a 0-64-127 scale.
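By way of illustration only, the score information described above can be pictured as a small set of record types; this is an assumption made for the sketch, not the patent's data format, and the names NoteEvent, ControlEvent, and ChannelScore are hypothetical.

from dataclasses import dataclass, field
from typing import Dict, List, Optional

@dataclass
class NoteEvent:
    time: str                      # "bar:beat:tick", e.g. "1:2:000"
    pitch: str                     # note name, e.g. "A4" (= 440 Hz)
    length: int                    # duration in ticks
    velocity: int                  # strength, 0-127
    lyric: Optional[str] = None    # lyric syllable embedded in the track, if any

@dataclass
class ControlEvent:
    time: str
    control_type: str              # e.g. "vibrato", "expression"
    params: Dict[str, int] = field(default_factory=dict)   # e.g. depth/width/delay on the 0-64-127 scale

@dataclass
class ChannelScore:
    notes: List[NoteEvent] = field(default_factory=list)
    controls: List[ControlEvent] = field(default_factory=list)

# A control event remains in force for the following notes of its track/channel
# until a control change of the same type occurs, as described above.
track1 = ChannelScore(
    notes=[NoteEvent("1:1:000", "G4", 480, 100, lyric="あ")],
    controls=[ControlEvent("1:1:000", "vibrato", {"depth": 64, "width": 64, "delay": 50})],
)
print(track1)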
[0036] Returning to FIG. 1, the converted musical score information 4 is passed to a lyric imparting unit 5. On the basis of the musical score information 4, the lyric imparting unit 5 generates singing voice information 6 in which the lyrics for each sound are attached together with information such as the length, pitch, strength, and expression of the sound corresponding to the note.
[0037] FIG. 3 shows an example of the singing voice information 6. In FIG. 3, "¥song¥" is a tag indicating the start of the lyric information. The tag "¥PP, T10673075¥" indicates a rest of 10673075 μsec, the tag "¥tdyna 110 649075¥" indicates the overall strength over 10673075 μsec from the beginning, the tag "¥fine-100¥" indicates a fine adjustment of pitch corresponding to MIDI fine tuning, and the tags "¥vibrato NRPN_dep=64¥", "¥vibrato NRPN_del=50¥", and "¥vibrato NRPN_rat=64¥" indicate the depth, delay, and width of the vibrato, respectively. The tag "¥dyna 100¥" indicates the relative strength of each sound, and the tag "¥G4, T288461¥あ" indicates the lyric "あ" with a pitch of G4 and a length of 288461 μsec. The singing voice information in FIG. 3 was obtained from the musical score information (the MIDI data analysis result) shown in FIG. 2.
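As a rough sketch, and only under assumptions, one note of the score information could be turned into tag strings of the kind shown in FIG. 3 as follows; the function name note_to_singing_tags and the exact tag spelling are hypothetical, not taken from the patent.

def note_to_singing_tags(pitch, length_usec, lyric, dyna=100, vibrato=None):
    # per-note relative strength
    tags = [f"¥dyna {dyna}¥"]
    # vibrato depth / delay / width on the NRPN-style scale used in FIG. 3
    if vibrato:
        tags += [f"¥vibrato NRPN_dep={vibrato['depth']}¥",
                 f"¥vibrato NRPN_del={vibrato['delay']}¥",
                 f"¥vibrato NRPN_rat={vibrato['width']}¥"]
    # pitch, duration in microseconds, and the lyric syllable for this note
    tags.append(f"¥{pitch}, T{length_usec}¥{lyric}")
    return tags

print(note_to_singing_tags("G4", 288461, "あ",
                           vibrato={"depth": 64, "delay": 50, "width": 64}))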
[0038] As can be seen from a comparison of FIG. 2 and FIG. 3, the performance data for musical instrument control (for example, the note information) is fully utilized in generating the singing voice information. For example, for the component "あ" of the lyrics "あるうひ", the singing attributes of the sound "あ" other than the lyric itself, such as its time of occurrence, length, pitch, and strength, directly use the time of occurrence, length, pitch, strength, and so on contained in the control information and note event information of the musical score information (FIG. 2); for the next lyric element "る", the next note event information in the same track and channel of the musical score information is likewise used directly, and so on.
[0039] Returning to FIG. 1, the singing voice information 6 is passed to a singing voice generation unit 7, and the singing voice generation unit 7 generates a singing voice waveform 8 on the basis of the singing voice information 6. The singing voice generation unit 7, which generates the singing voice waveform 8 from the singing voice information 6, is configured, for example, as shown in FIG. 4.
[0040] In FIG. 4, a singing voice prosody generation unit 7-1 converts the singing voice information 6 into singing voice prosody data, and a waveform generation unit 7-2 converts the singing voice prosody data into the singing voice waveform 8.
[0041] As a specific example, a case will be described in which the lyric element "ら" at the pitch "A4" is sustained for a certain time. The singing voice prosody data for the case where no vibrato is applied is as shown in the following table.
[0042] Table 1
(Table 1, listing the [LABEL], [PITCH], and [VOLUME] columns of the singing voice prosody data without vibrato, appears in the source only as an image.)
[0043] In this table, [LABEL] represents the duration of each phoneme. That is, the phoneme (phoneme segment) "ra" has a duration of 1000 samples, from sample 0 to sample 1000, and the first phoneme "aa" following "ra" has a duration of 38600 samples, from sample 1000 to sample 39600. [PITCH] represents the pitch period as a point pitch; the pitch period at sample point 0 is 50 samples, and since the pitch of "ら" is not changed here, a pitch period of 50 samples is applied over all samples. [VOLUME] represents the relative volume at each sample point; with the default value taken as 100%, the volume is 66% at sample point 0 and 57% at sample point 39600, and in the same way it continues with 48% at sample point 40100 and so on, falling to 3% at sample point 42600. In this way the sound "ら" decays with the passage of time.
[0044] In contrast, when vibrato is applied, singing voice prosody data such as the following, for example, is created.
[0045] Table 2
[LABEL]          [PITCH]          [VOLUME]
0      ra        0      50        0      66
1000   aa        1000   50        39600  57
11000  aa        2000   53        40100  48
21000  aa        4009   47        40600  39
31000  aa        6009   53        41100  30
39600  aa        8010   47        41600  21
40100  aa        10010  53        42100  12
40600  aa        12011  47        42600  3
41100  aa        14011  53
41600  aa        16022  47
42100  aa        18022  53
42600  aa        20031  47
43100            22031  53
                 24042  47
                 26042  53
                 28045  47
                 30045  53
                 32051  47
                 34051  53
                 36062  47
                 38062  53
                 40074  47
                 42074  53
                 43100  50
[0046] As shown in the [PITCH] column of Table 2, the pitch period at sample points 0 and 1000 is the same, 50 samples, and the pitch of the voice does not change during this interval; thereafter, however, the pitch period swings up and down (50 ± 3) with a period (width) of about 4000 samples, for example a pitch period of 53 samples at sample point 2000, 47 samples at sample point 4009, and 53 at sample point 6009. This realizes vibrato, a fluctuation in the pitch of the voice. The data in the [PITCH] column is generated on the basis of the information on the corresponding singing voice element (for example "ら") in the singing voice information 6, in particular the note number (for example A4) and the vibrato control data (for example, the tags "¥vibrato NRPN_dep=64¥", "¥vibrato NRPN_del=50¥", and "¥vibrato NRPN_rat=64¥").
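The following is a minimal sketch of how a [PITCH] contour of the kind shown in Table 2 could be computed. The mapping from the NRPN-style vibrato values to sample counts and to the pitch-period deviation is not specified in the text; the numbers used here are assumptions chosen only to reproduce the behaviour described above.

import math

def pitch_contour(base_period, total_samples, depth_periods, delay_samples,
                  cycle_samples, step=2000):
    """Return (sample_point, pitch_period) pairs: a constant pitch period until
    delay_samples, then base_period +/- depth_periods with the given cycle."""
    points = []
    for t in range(0, total_samples + 1, step):
        if t <= delay_samples:
            period = float(base_period)
        else:
            phase = 2.0 * math.pi * (t - delay_samples) / cycle_samples
            period = base_period + depth_periods * math.sin(phase)
        points.append((t, round(period, 1)))
    return points

# 50-sample base period, +/- 3 samples, ~4000-sample cycle, onset after 1000 samples:
# the printed points roughly follow the 50 / 53 / 47 alternation of Table 2.
for t, p in pitch_contour(50, 43100, 3, 1000, 4000):
    print(t, p)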
[0047] The waveform generation unit 7-2 reads out samples from an internal waveform memory (not shown) on the basis of such singing voice phoneme and prosody data, and generates the singing voice waveform 8. The singing voice generation unit 7 that generates the singing voice waveform 8 from the singing voice information 6 is not limited to the above example; any appropriate known singing voice generator can be used.
[0048] Returning to FIG. 1, the performance data 1 is also passed to a MIDI sound source 9, and the MIDI sound source 9 generates musical tones on the basis of the performance data. These musical tones form an accompaniment waveform 10.
[0049] The singing voice waveform 8 and the accompaniment waveform 10 are both passed to a mixing unit 11, which synchronizes and mixes them.
[0050] The mixing unit 11 synchronizes the singing voice waveform 8 and the accompaniment waveform 10, superimposes them, and reproduces the result as an output waveform 3, thereby reproducing music as a singing voice with accompaniment on the basis of the performance data 1.
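A minimal sketch of the synchronize-and-sum operation performed by the mixing unit 11 is given below, under assumptions: the offset, the gain values, and the function name mix are illustrative only, since the patent does not describe the mixing at this level of detail.

def mix(singing, accompaniment, singing_offset=0, gain_voice=0.7, gain_acc=0.5):
    length = max(singing_offset + len(singing), len(accompaniment))
    out = [0.0] * length
    for i, s in enumerate(accompaniment):
        out[i] += gain_acc * s
    for i, s in enumerate(singing):
        out[singing_offset + i] += gain_voice * s   # the offset realizes the synchronization
    # clip to the valid range before playback
    return [max(-1.0, min(1.0, s)) for s in out]

output_waveform = mix([0.2, 0.4, 0.1], [0.1, 0.1, 0.1, 0.1], singing_offset=1)
print(output_waveform)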
[0051] At the stage where the lyric imparting unit 5 converts the musical score information 4 into the singing voice information 6, if lyric information is present in the musical score information 4, the lyrics present as information are used preferentially when the singing voice information 6 is generated. As described above, FIG. 2 is an example of the musical score information 4 with lyrics added, and FIG. 3 is an example of the singing voice information 6 generated from the musical score information 4 of FIG. 2.
[0052] When the operator designates a singing style for generating the singing voice, the music information described in the musical score information 4 is passed to a singing pattern changing unit 12 at the time the musical score information 4 is converted into the singing voice information 6.
[0053] The singing pattern changing unit 12 checks the musical score information 4 against singing pattern data 13 (also called singing style data), refers to the singing pattern data 13 that matches the designated singing style, and generates the singing voice information 6 by imparting the singing pattern parameters described in the singing pattern data 13 to those sounds (notes) of the musical score information 4 that satisfy the conditions described there. Specifically, parameters for imparting expression changes including volume changes, pitch changes, and timing changes, such as vibrato, expression, timing, and pitch bend, are set for predetermined sounds (notes) of the note sequence of the musical score information according to the singing style; these parameters are stored in storage means as the singing pattern data 13 (singing style data), and the singing pattern changing unit 12 uses the musical score information 4 and the singing pattern data 13 to generate singing voice information 6 to which changes according to the singing style have been applied.
[0054] FIG. 5 shows a specific example of the singing pattern data 13 (singing style data) for each singing style. In the example of FIG. 5, the singing pattern data 13 is divided into two parts, a condition part and an execution part. The items of the condition part include singing styles such as "popular", "classical", and "enka", together with the pitch, length, strength, strength increase/decrease pattern, tempo of the piece, and so on that serve as conditions for selecting the sounds (notes) to which an expression change is to be imparted. The execution part includes, as parameters of the expression change to be imparted to the sounds (notes) that satisfy the conditions described in the condition part, vibrato, expression (changes in loudness, i.e. performance dynamics), timing, pitch bend (pitch bend at the beginning and end of a phrase), pitch adjustment, and so on.
[0055] For the vibrato in the execution part, parameters are specified for the delay until the vibrato starts, the period, the amplitude, the increase/decrease of the period, and the increase/decrease of the amplitude. For expression, volume parameters are specified at several characteristic points, such as the beginning, the end, and major change points, where the time from the beginning to the end of the sound is taken as 100. For timing, a parameter is specified indicating how far the sound lags behind or leads the beat. For pitch bend, which slides the pitch up or down at the beginning or end of a phrase, the degree of the pitch shift is specified as a parameter in cents at characteristic times, where the length of the sound is taken as 100; it is not applied to sounds inside a phrase. For pitch adjustment, a parameter is specified as the number of cents by which the overall pitch is raised or lowered. Here, a cent is a unit of pitch interval such that 100 cents correspond to a semitone.
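Purely as an illustration, the condition part / execution part organization of FIG. 5 could be held as records like the following; the field names, the threshold values, and the apply_style helper are assumptions made for this sketch rather than the patent's actual file format.

PATTERN_DATA = [
    {   # condition part: style plus the note properties that must match
        "style": "enka", "min_length_ticks": 240, "min_velocity": 60,
        # execution part: expression-change parameters applied to matching notes
        "vibrato": {"delay": 50, "period": 64, "amplitude": 64,
                    "period_change": 0, "amplitude_change": 10},
        "expression": [(0, 60), (30, 100), (100, 70)],        # (% of note length, volume)
        "timing": -5,                                          # lag/lead relative to the beat
        "phrase_start_bend": -200, "phrase_end_bend": -300,    # in cents
        "pitch_adjust": 0,                                     # in cents
    },
]

def apply_style(note, style, patterns=PATTERN_DATA):
    for rule in patterns:
        if (rule["style"] == style
                and note["length"] >= rule["min_length_ticks"]
                and note["velocity"] >= rule["min_velocity"]):
            changed = dict(note)
            for key in ("vibrato", "expression", "timing",
                        "phrase_start_bend", "phrase_end_bend", "pitch_adjust"):
                changed[key] = rule[key]
            return changed
    return note   # notes that match no condition are left unchanged

print(apply_style({"pitch": "E4", "length": 480, "velocity": 90}, "enka"))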
[0056] FIGS. 6 and 7 show an example of applying such a singing style (an example of imparting the parameters of the singing pattern data). FIG. 6 shows the singing voice information before the singing style is applied; for the portion ptA enclosed by the broken line in FIG. 6, the singing voice information after each parameter of the singing pattern data of, for example, the "enka" singing style has been applied is shown in the portion ptB enclosed by the broken line in FIG. 7. In FIGS. 6 and 7, for example, for the sound (note) "E4, T144231" of the lyric "ひ" in the singing voice information of FIG. 6, expression changes are imparted by parameters such as a timing correction, a phrase-start pitch bend, a phrase-end pitch bend, and a change of expression, as shown in FIG. 7, and the singing voice information is changed to that of the "enka" singing style.
[0057] Such a change of the singing voice information according to the singing style is realized in the singing pattern changing unit 12 of FIG. 1 using the musical score information 4 and the singing pattern data 13. As another example, as shown in FIG. 8, the singing voice information 6A (before the singing style is applied) from the lyric imparting unit 5 may be sent to the singing pattern changing unit 12; the singing pattern changing unit 12 then applies parameter changes according to the singing pattern to those sounds (notes) in the pre-application singing voice information 6A that match the conditions of the singing pattern data (singing style data) of FIG. 5, and outputs singing voice information 6B to which the singing style has been applied to the singing voice generation unit 7. The rest of the configuration is the same as in FIG. 1 above, and is therefore not illustrated or described.
[0058] The singing style can be designated in advance by the operator as described above, or can be determined by the singing pattern changing unit 12 from attached information such as a general song title, track name, or marker that is stored in the MIDI data and specified by SMF (Standard MIDI File). Examples include cases where the song title or track name contains the style name itself or an annotation including the style name, where the style can be estimated from the song title or track name, or where the style name is written in attached information such as a marker.
[0059] In the examples above, the case of a singing voice has mainly been described, but a style (performance style) can likewise be applied to musical tones. In that case, the performance pattern of musical tones such as a saxophone or violin is changed in accordance with the designated performance style. Specifically, for the desired musical tones (saxophone, violin, and so on) in the musical score information, performance pattern data similar to that of FIG. 5, for example, may be prepared. Like FIG. 5, this performance pattern data has a condition part and an execution part; the items of the condition part include styles such as "popular", "classical", and "enka", together with the pitch, length, strength, tempo of the piece, and so on that serve as conditions for selecting the sounds (notes) to which an expression change is to be imparted, and the execution part includes vibrato, expression, timing, pitch bend (pitch bend at the beginning and end of a phrase), pitch adjustment, and so on as parameters of the expression change to be imparted to the sounds (notes) that satisfy the conditions of the condition part.
[0060] In the example of FIG. 1, information on the desired musical tones (saxophone, violin, and so on) of the musical score information 4 (for example, note sequence information) is sent to a performance pattern changing unit 15, which, on the basis of the performance pattern data 16 described above, imparts expression change parameters such as vibrato, expression, timing, pitch bend (pitch bend at the beginning and end of a phrase), and pitch adjustment to the sounds (notes) that satisfy the predetermined conditions according to the designated performance style, thereby obtaining performance data 14 to which the performance style has been applied. The performance data 14 to which the performance style has been applied is sent to the MIDI sound source 9, and the MIDI sound source 9 generates musical tones to which the performance style has been applied on the basis of this performance data.
[0061] Next, FIG. 9 is a flowchart for explaining the overall operation of the singing voice synthesizing apparatus shown in FIG. 1 (or partly shown in FIG. 8).
[0062] In FIG. 9, the performance data 1 of a MIDI file is first input (step S1). Next, the performance data 1 is analyzed and the musical score data 4 is created (steps S2 and S3). The operator is then queried as necessary and the operator's setting processing is performed, for example selection of the performance style, selection of the lyrics, selection of the track and channel to which the lyrics apply, and selection of the MIDI tracks and channels to be muted. For items that the operator does not set, selections are made on the basis of attached information such as the song title, track name, and markers of the performance data 1, or predetermined default information is used in the subsequent processing.
[0063] In the following step S5, the singing voice information 6 is created from the lyrics using the musical score information 4 of the channel of the track to which the lyrics are assigned. Next, it is checked whether the processing has been completed for all tracks (step S6); if not, the process advances to the next track and returns to step S5. [0064] Therefore, when lyrics are added to a plurality of tracks, the lyrics are added independently of one another and the singing voice information 6 is created.
[0065] Next, in step S7, it is determined whether a change of singing style (or performance style) has been designated. If Yes (a style change has been designated), the process proceeds to step S8; if No (no change), the process proceeds to step S11.
[0066] In step S8, it is determined for each sound (note) of the musical score information whether it satisfies the conditions indicated in the condition part of the singing pattern data 13 (or the performance pattern data 16). For sounds (notes) that satisfy the conditions, in step S9 the parameters for the expression change indicated in the execution part of the singing pattern data 13 (or the performance pattern data 16) are applied, and the singing voice data (or performance data) is changed.
[0067] In the next step S10, it is determined whether the condition check has been completed for all sounds (notes). If No, the process returns to step S8; if Yes, the process proceeds to the next step S11.
[0068] In step S11, the singing voice generation unit 7 creates the singing voice waveform 8 from the singing voice information 6. In the next step S12, the MIDI sound source 9 reproduces the MIDI data to create the accompaniment waveform 10.
[0069] Through the processing so far, the singing voice waveform 8 and the accompaniment waveform 10 have been obtained. The mixing unit 11 then synchronizes the singing voice waveform 8 and the accompaniment waveform 10, superimposes them, and reproduces the result as the output waveform 3 (steps S13 and S14). The output waveform 3 is output as an acoustic signal via a sound system (not shown).
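The control flow of FIG. 9 can be condensed into the following runnable sketch. Every helper here is a trivial placeholder standing in for the corresponding block of FIG. 1; only the ordering of steps S1 to S14 is taken from the description, and all names and dummy values are assumptions.

def load_midi(path):          return {"tracks": {1: [{"pitch": "G4", "len": 480, "vel": 90}]}}   # S1
def analyze(perf):            return perf["tracks"]                                              # S2, S3
def settings_for(perf):       return {"lyric_tracks": [1], "style": "enka"}                      # S4
def assign_lyrics(notes):     return [dict(n, lyric="あ") for n in notes]                        # S5
def matches_condition(n, s):  return n["len"] >= 240                                             # S8
def apply_parameters(n, s):   n["vibrato"] = {"depth": 64, "delay": 50, "width": 64}             # S9
def render_voice(info):       return [0.0] * 4                                                   # S11
def render_midi(perf):        return [0.0] * 4                                                   # S12
def mix(voice, acc):          return [v + a for v, a in zip(voice, acc)]                         # S13, S14

def synthesize(path):
    performance = load_midi(path)
    score = analyze(performance)
    settings = settings_for(performance)
    singing_info = []
    for track in settings["lyric_tracks"]:           # S5/S6: repeat for every lyric track
        singing_info += assign_lyrics(score[track])
    if settings["style"] is not None:                # S7: only when a style change is designated
        for note in singing_info:                    # S8-S10: condition check for every note
            if matches_condition(note, settings["style"]):
                apply_parameters(note, settings["style"])
    return mix(render_voice(singing_info), render_midi(performance))

print(synthesize("song.mid"))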
[0070] To summarize the embodiment of the present invention described above: pattern data in which parameters for imparting an expression change including at least one of a volume change, a pitch change, and a timing change to the notes of music information are set according to the singing style is prepared in advance; the input performance data is analyzed as music information of pitch, length, and lyrics; lyrics are imparted to the note sequence on the basis of the lyric information of the analyzed music information to produce singing voice information; the singing pattern of the singing voice information is changed by imparting, to the notes of the note sequence of the analyzed music information, an expression change including at least one of a volume change, a pitch change, and a timing change on the basis of the prepared pattern data; and a singing voice is generated on the basis of the note sequence of the pattern-changed music information.
[0071] According to such an embodiment of the present invention, an expression change according to the singing style can be imparted to the singing voice, and the range of musical expression is expanded. Whereas conventional systems could only sing in a fixed singing style with poor expressiveness, arbitrarily selecting the singing style improves the expressiveness; a singing style matched to the piece can realize a more natural singing voice, and a mismatched style can express humor, further enhancing the entertainment quality.
[0072] The performance style can also be applied not only to singing voices but also to musical tones. In this case, it is preferable that the performance data be analyzed as music information of pitch, length, and lyrics; that the singing or performance pattern be changed by imparting, to the notes of the note sequence of the analyzed music information, an expression change including at least one of a volume change, a pitch change, and a timing change according to the style of singing or performance; and that a singing voice or musical tone be generated on the basis of the note sequence of the pattern-changed music information. In this way, an expression change according to the style of singing or performance can be imparted to the singing voice when singing or to the musical tones when playing, and the musical expression is markedly improved.
[0073] The singing voice synthesizing function described above is installed in, for example, a robot apparatus.
[0074] The bipedal walking robot apparatus described below as one configuration example is a practical robot that supports human activities in various situations in the living environment and other everyday settings; it is an entertainment robot that can act according to its internal state (anger, sadness, joy, pleasure, and so on) and can also express the basic actions performed by humans.
[0075] As shown in FIG. 10, the robot apparatus 60 is configured by connecting a head unit 63 to a predetermined position of a trunk unit 62 and connecting two left and right arm units 64R/L and two left and right leg units 65R/L (where R and L are suffixes indicating right and left, respectively; the same applies hereinafter).
[0076] FIG. 11 schematically shows the configuration of joint degrees of freedom of this robot apparatus 60. The neck joint supporting the head unit 63 has three degrees of freedom: a neck joint yaw axis 101, a neck joint pitch axis 102, and a neck joint roll axis 103.
[0077] Each arm unit 64R/L constituting the upper limbs comprises a shoulder joint pitch axis 107, a shoulder joint roll axis 108, an upper arm yaw axis 109, an elbow joint pitch axis 110, a forearm yaw axis 111, a wrist joint pitch axis 112, a wrist joint roll axis 113, and a hand 114. The hand 114 is actually a multi-joint, multi-degree-of-freedom structure including a plurality of fingers. However, since the motion of the hand 114 contributes little to and has little influence on the posture control and walking control of the robot apparatus 60, it is assumed herein to have zero degrees of freedom. Each arm therefore has seven degrees of freedom.
[0078] The trunk unit 62 has three degrees of freedom: a trunk pitch axis 104, a trunk roll axis 105, and a trunk yaw axis 106.
[0079] Each leg unit 65R/L constituting the lower limbs comprises a hip joint yaw axis 115, a hip joint pitch axis 116, a hip joint roll axis 117, a knee joint pitch axis 118, an ankle joint pitch axis 119, an ankle joint roll axis 120, and a foot 121. In this specification, the intersection of the hip joint pitch axis 116 and the hip joint roll axis 117 defines the hip joint position of the robot apparatus 60. The human foot is actually a structure including a multi-joint, multi-degree-of-freedom sole, but the sole of the robot apparatus 60 is given zero degrees of freedom. Each leg is therefore configured with six degrees of freedom.
[0080] Summarizing the above, the robot apparatus 60 as a whole has a total of 3 + 7 × 2 + 3 + 6 × 2 = 32 degrees of freedom. However, a robot apparatus for entertainment is not necessarily limited to 32 degrees of freedom. It goes without saying that the number of degrees of freedom, that is, the number of joints, can be increased or decreased as appropriate in accordance with design and production constraints, required specifications, and the like.
[0081] Each of the degrees of freedom of the robot apparatus 60 described above is actually implemented using an actuator. Because of requirements such as eliminating excess bulges in the external appearance to approximate the natural shape of a human body and performing posture control on an unstable bipedal walking structure, the actuators are preferably small and lightweight. It is more preferable that each actuator be a small AC servo actuator of the type in which the motor is directly coupled to the gear and the servo control system is implemented as a single chip and built into the motor unit.
[0082] FIG. 12 schematically shows the control system configuration of the robot apparatus 60. As shown in FIG. 12, the control system comprises a thinking control module 200, which dynamically responds to user input and the like and governs emotion judgment and emotional expression, and a motion control module 300, which controls the whole-body coordinated motion of the robot apparatus 60, such as driving the actuators 350.
[0083] The thinking control module 200 is an independently driven information processing apparatus comprising a CPU (Central Processing Unit) 211, which executes arithmetic processing relating to emotion judgment and emotional expression, a RAM (Random Access Memory) 212, a ROM (Read Only Memory) 213, and an external storage device (such as a hard disk drive) 214, and is capable of self-contained processing within the module.
[0084] This thinking control module 200 determines the current emotion and intention of the robot apparatus 60 in accordance with stimuli from the outside world, such as image data input from an image input device 251 and audio data input from an audio input device 252. Here, the image input device 251 comprises, for example, a plurality of CCD (Charge Coupled Device) cameras, and the audio input device 252 comprises, for example, a plurality of microphones.
[0085] The thinking control module 200 also issues commands to the motion control module 300 so as to execute motion or action sequences based on its decisions, that is, movements of the limbs.
[0086] The motion control module 300, on the other hand, is an independently driven information processing apparatus comprising a CPU 311, which controls the whole-body coordinated motion of the robot apparatus 60, a RAM 312, a ROM 313, and an external storage device (such as a hard disk drive) 314, and is capable of self-contained processing within the module. The external storage device 314 can store, for example, walking patterns calculated offline, target ZMP trajectories, and other action plans. Here, the ZMP is the point on the floor surface at which the moment due to the floor reaction force during walking becomes zero, and the ZMP trajectory means the trajectory along which the ZMP moves, for example during the walking operation of the robot apparatus 60. The concept of the ZMP and its application to the stability criterion of walking robots are described in Miomir Vukobratovic, "LEGGED LOCOMOTION ROBOTS" (Japanese edition: Ichiro Kato et al., "Hokou Robotto to Jinkou no Ashi" (Walking Robots and Artificial Legs), Nikkan Kogyo Shimbunsha).
[0087] Connected to the motion control module 300 via a bus interface (I/F) 301 are various devices such as the actuators 350 that realize the joint degrees of freedom distributed throughout the body of the robot apparatus 60 shown in FIG. 11, a posture sensor 351 that measures the posture and inclination of the trunk unit 62, ground contact confirmation sensors 352 and 353 that detect whether the left and right soles have left or landed on the floor, and a power supply control device 354 that manages the power supply such as the battery. Here, the posture sensor 351 is constituted, for example, by a combination of an acceleration sensor and a gyro sensor, and the ground contact confirmation sensors 352 and 353 are constituted by proximity sensors, micro-switches, or the like.
[0088] The thinking control module 200 and the motion control module 300 are constructed on a common platform and are interconnected via bus interfaces 201 and 301.
[0089] The motion control module 300 controls the whole-body coordinated motion produced by the actuators 350 so as to embody the action instructed by the thinking control module 200. That is, the CPU 311 retrieves from the external storage device 314 a motion pattern corresponding to the action instructed by the thinking control module 200, or internally generates a motion pattern. The CPU 311 then sets the foot motion, ZMP trajectory, trunk motion, upper-limb motion, horizontal position and height of the waist, and so on in accordance with the designated motion pattern, and transfers to each actuator 350 command values instructing motion in accordance with these settings.
[0090] The CPU 311 also detects the posture and inclination of the trunk unit 62 of the robot apparatus 60 from the output signal of the posture sensor 351, and detects from the output signals of the ground contact confirmation sensors 352 and 353 whether each leg unit 65R/L is in the swing phase or the stance phase, so that the whole-body coordinated motion of the robot apparatus 60 can be controlled adaptively.
[0091] The CPU 311 also controls the posture and motion of the robot apparatus 60 so that the ZMP position is always directed toward the center of the ZMP stable region.
[0092] Furthermore, the motion control module 300 returns to the thinking control module 200 the extent to which the action decided by the thinking control module 200 has been realized as intended, that is, the status of the processing.
[0093] In this way, the robot apparatus 60 can judge its own state and the surrounding situation on the basis of the control program and act autonomously.
[0094] In this robot apparatus 60, a program (including data) implementing the singing voice synthesizing function described above is placed, for example, in the ROM 213 of the thinking control module 200. In this case, the singing voice synthesizing program is executed by the CPU 211 of the thinking control module 200.
[0095] By incorporating the singing voice synthesizing function described above into such a robot apparatus, the new capability of expressing itself as a robot that sings along with an accompaniment is acquired, its entertainment quality is broadened, and its intimacy with humans is deepened.
[0096] The present invention is of course not limited to the embodiment described above, and various modifications are possible without departing from the gist of the present invention.
[0097] For example, singing voice information usable by the singing voice generation unit 7, which corresponds to the singing voice synthesizing unit and waveform generation unit used in the speech synthesizing method and apparatus described in the specification and drawings of Japanese Patent Application No. 2002-73385 previously proposed by the present applicant, has been illustrated; however, various other singing voice generation units can also be used, in which case singing voice information containing the information required for singing voice generation by each such singing voice generation unit may of course be generated from the performance data. The performance data is also not limited to MIDI data, and performance data of various standards can be used.
[0098] The present invention is not limited to the embodiment described above with reference to the drawings, and it will be apparent to those skilled in the art that various modifications, substitutions, or equivalents thereof can be made without departing from the appended claims and the gist thereof.


CLAIMS
[1] 1.演奏データを音の高さ、長さ、歌詞の音楽情報として解析する解析工程と、 解析 された音楽情報の音符列の音符に対して、歌唱又は演奏のスタイルに応じて、音量 変化、音程変化、タイミング変化の少なくとも 1つを含む表現変化を付与することによ り歌唱又は演奏パターンを変更するパターン変更工程と、 パターン変更された音楽 情報の音符列に基づいて歌声又は楽音を生成する生成工程と を有することを特徴 とする信号合成方法。  [1] 1. Analyzing the performance data as pitch, length and lyrics music information, and analyzing the notes of the analyzed music information in the note sequence according to the style of singing or performance. A pattern changing step of changing a singing or performance pattern by giving an expression change including at least one of a volume change, a pitch change, and a timing change; and a singing voice or a musical tone based on the musical note sequence of the pattern-changed music information. And a generating step of generating the signal.
[2] 2.上記演奏データは MIDIファイルの演奏データであることを特徴とする請求の範 囲第 1項記載の信号合成方法。  [2] 2. The signal synthesizing method according to claim 1, wherein the performance data is performance data of a MIDI file.
[3] 3.上記表現変化を付与するためのパラメータは、上記歌唱又は演奏のスタイルと、 上記音符の長さ、強さ、強さの増減状態、高さ及び楽曲の速度の少なくとも 1つとに 応じて設定されること特徴とする請求の範囲第 1項記載の信号合成方法。 [3] 3. The parameters for giving the above-mentioned expression change are the singing or playing style and at least one of the note length, strength, strength increase / decrease state, pitch, and music speed. 2. The signal synthesizing method according to claim 1, wherein the signal synthesizing method is set accordingly.
[4] 4.演奏データを音の高さ、長さ、歌詞の音楽情報として解析する解析手段と、 歌唱 又は演奏のスタイルに応じて、音楽情報の音符に対して、音量変化、音程変化、タイ ミング変化の少なくとも 1つを含む表現変化を付与するためのパラメータが設定され たパターンデータが蓄積された記憶手段と、 上記解析手段により解析された音楽情 報の音符列の音符に対応して、上記記憶手段により読み出された音量変化、音程変 ィ匕、タイミング変化の少なくとも 1つを含む表現変化を付与することにより歌唱又は演 奏パターンを変更するパターン変更手段と、 パターン変更された音楽情報の音符 列に基づいて歌声又は楽音を生成する生成手段と を有することを特徴とする信号 [4] 4. Analysis means for analyzing performance data as musical information of pitch, length, lyrics, and, according to the style of singing or performance, change of volume, pitch, Storage means for storing pattern data in which a parameter for giving an expression change including at least one of the timing changes is set, and a note corresponding to a note of a note string of music information analyzed by the analysis means. Pattern changing means for changing a singing or performance pattern by giving an expression change including at least one of a volume change, a pitch change, and a timing change read by the storage means; Generating means for generating a singing voice or a musical tone based on a note sequence of information.
[5] 5. The signal synthesizing apparatus according to claim 4, wherein the performance data is performance data of a MIDI file.
[6] 6. The signal synthesizing apparatus according to claim 4, wherein the parameters for imparting the expression change are set in accordance with the singing or performance style and at least one of the length, strength, strength increase/decrease state, and pitch of the note and the tempo of the music.
[7] 7. A singing voice synthesizing method comprising: an analyzing step of analyzing performance data as music information of pitch, length, and lyrics; a lyric imparting step of imparting lyrics to a note sequence based on the lyric information of the analyzed music information to produce singing voice information; a pattern changing step of changing a singing pattern of the singing voice information by imparting, to the notes of the note sequence of the analyzed music information, an expression change including at least one of a volume change, a pitch change, and a timing change in accordance with a singing or performance style; and a singing voice generating step of generating a singing voice based on the pattern-changed singing voice information.
[8] 8. The singing voice synthesizing method according to claim 7, wherein the performance data is performance data of a MIDI file.
[9] 9. The singing voice synthesizing method according to claim 7, wherein the parameters for imparting the expression change are set in accordance with the singing style and at least one of the length, strength, and pitch of the note and the tempo of the music.
[10] 10. The singing voice synthesizing method according to claim 7, wherein the expression change imparts at least one of vibrato, pitch bend, and expression to the sound of the target note.
[11] 11. The singing voice synthesizing method according to claim 10, wherein the parameters for imparting the vibrato include at least one of information on a delay of amplitude onset, information on amplitude, information on period, information on increase/decrease of the amplitude, and information on increase/decrease of the period, and the parameters for imparting the expression include at least one of time information expressed as a ratio to the note length and intensity information at an arbitrary characteristic point on the time axis.
[12] 12. The singing voice synthesizing method according to claim 7, wherein the singing style is selected by any one of a user setting, a track name of the performance data, a music title, and a marker.
[13] 13. A singing voice synthesizing apparatus comprising: storage means in which pattern data is accumulated, the pattern data having set therein parameters for imparting, to notes of music information, an expression change including at least one of a volume change, a pitch change, and a timing change in accordance with a singing style; analyzing means for analyzing performance data as music information of pitch, length, and lyrics; lyric imparting means for imparting lyrics to a note sequence based on the lyric information of the analyzed music information to produce singing voice information; pattern changing means for changing a singing pattern of the singing voice information by imparting, to the notes of the note sequence of the music information analyzed by the analyzing means, an expression change including at least one of a volume change, a pitch change, and a timing change read out from the storage means; and singing voice generating means for generating a singing voice based on the note sequence of the pattern-changed music information.
[14] 14. The singing voice synthesizing apparatus according to claim 13, wherein the performance data is performance data of a MIDI file.
[15] 15. The singing voice synthesizing apparatus according to claim 13, wherein the parameters for imparting the expression change are set in accordance with the singing style and at least one of the length, strength, strength increase/decrease state, and pitch of the note and the tempo of the music.
[16] 16. The singing voice synthesizing apparatus according to claim 13, wherein the expression change imparts at least one of vibrato, pitch bend, and expression to the sound of the target note.
[17] 17. The singing voice synthesizing apparatus according to claim 16, wherein the parameters for imparting the vibrato include at least one of information on a delay of amplitude onset, information on amplitude, information on period, information on increase/decrease of the amplitude, and information on increase/decrease of the period, and the parameters for imparting the expression include at least one of time information expressed as a ratio to the note length and intensity information at an arbitrary characteristic point on the time axis.
[18] 18. The singing voice synthesizing apparatus according to claim 13, wherein the singing style is selected by any one of a user setting, a track name of the performance data, a music title, and a marker.
[19] 19. A program for causing a computer to execute predetermined processing, the program comprising: an analyzing step of analyzing input performance data as music information of pitch, length, and lyrics; a lyric imparting step of imparting lyrics to a note sequence based on the lyric information of the analyzed music information to produce singing voice information; a pattern changing step of changing a singing pattern of the singing voice information by imparting, to the notes of the note sequence of the analyzed music information, an expression change including at least one of a volume change, a pitch change, and a timing change in accordance with a singing or performance style; and a singing voice generating step of generating a singing voice based on the pattern-changed singing voice information.
[20] 20. A computer-readable recording medium on which is recorded a program for causing a computer to execute predetermined processing, the program comprising: an analyzing step of analyzing input performance data as music information of pitch, length, and lyrics; a lyric imparting step of imparting lyrics to a note sequence based on the lyric information of the analyzed music information to produce singing voice information; a pattern changing step of changing a singing pattern of the singing voice information by imparting, to the notes of the note sequence of the analyzed music information, an expression change including at least one of a volume change, a pitch change, and a timing change in accordance with a singing or performance style; and a singing voice generating step of generating a singing voice based on the pattern-changed singing voice information.
[21] 21. An autonomous robot apparatus that acts based on supplied input information, the robot apparatus comprising: storage means in which pattern data is accumulated, the pattern data having set therein parameters for imparting, to notes of music information, an expression change including at least one of a volume change, a pitch change, and a timing change in accordance with a singing style; analyzing means for analyzing performance data as music information of pitch, length, and lyrics; lyric imparting means for imparting lyrics to a note sequence based on the lyric information of the analyzed music information to produce singing voice information; pattern changing means for changing a singing pattern of the singing voice information by imparting, to the notes of the note sequence of the music information analyzed by the analyzing means, an expression change including at least one of a volume change, a pitch change, and a timing change read out from the storage means; and singing voice generating means for generating a singing voice based on the note sequence of the pattern-changed music information.
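The pattern changing step recited in claims 1, 7, 10, and 11 can be pictured, purely as a hedged sketch rather than the claimed implementation (the style names, numeric values, parameter units, and data layout below are all assumptions), as a style-keyed table of pattern data plus a routine that turns the vibrato and expression parameters into per-note pitch and volume modulation curves:

```python
import math
from typing import Dict, List, Tuple

# Hypothetical pattern data, one entry per singing style.  Vibrato is
# described by an amplitude-onset delay (here a ratio of the note length),
# an amplitude in cents, a period in ms, and ramp factors for how the
# amplitude and period grow or shrink over the note.  Expression is a
# volume envelope given as (time-ratio, intensity) points.
PATTERN_DATA: Dict[str, dict] = {
    "ballad": {
        "vibrato": {"delay": 0.3, "amplitude_cents": 30.0, "period_ms": 180.0,
                    "amplitude_ramp": 1.2, "period_ramp": 0.9},
        "expression": [(0.0, 0.6), (0.4, 1.0), (1.0, 0.7)],
    },
    "march": {
        "vibrato": {"delay": 0.5, "amplitude_cents": 10.0, "period_ms": 120.0,
                    "amplitude_ramp": 1.0, "period_ramp": 1.0},
        "expression": [(0.0, 1.0), (0.9, 1.0), (1.0, 0.5)],
    },
}

def vibrato_curve(note_ms: float, p: dict, step_ms: float = 10.0) -> List[float]:
    """Pitch deviation in cents, sampled every step_ms over one note."""
    curve, t, phase = [], 0.0, 0.0
    onset = p["delay"] * note_ms  # delay before the vibrato amplitude starts
    while t < note_ms:
        progress = t / note_ms
        amp = p["amplitude_cents"] * (1.0 + (p["amplitude_ramp"] - 1.0) * progress)
        period = p["period_ms"] * (1.0 + (p["period_ramp"] - 1.0) * progress)
        curve.append(0.0 if t < onset else amp * math.sin(phase))
        phase += 2.0 * math.pi * step_ms / period
        t += step_ms
    return curve

def expression_curve(note_ms: float, points: List[Tuple[float, float]],
                     step_ms: float = 10.0) -> List[float]:
    """Volume scale over one note, interpolated between the
    (time-ratio, intensity) points of the expression parameter."""
    curve, t = [], 0.0
    while t < note_ms:
        r = t / note_ms
        for (r0, v0), (r1, v1) in zip(points, points[1:]):
            if r0 <= r <= r1:
                curve.append(v0 + (v1 - v0) * (r - r0) / (r1 - r0))
                break
        else:
            curve.append(points[-1][1])
        t += step_ms
    return curve

# Example: a 600 ms note sung in the "ballad" style.
style = PATTERN_DATA["ballad"]
pitch_mod = vibrato_curve(600.0, style["vibrato"])
volume_mod = expression_curve(600.0, style["expression"])
```

In the claimed apparatus, curves of this kind would accompany the lyric-tagged note sequence to the singing voice generating means, and the choice of style itself (claim 12) could be driven by a user setting or by the track name, music title, or marker found in the performance data.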
PCT/JP2004/008333 2003-06-13 2004-06-14 Signal combination method and device, singing voice synthesizing method and device, program and recording medium, and robot device WO2004111993A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
JP2003170000A JP2005004106A (en) 2003-06-13 2003-06-13 Signal synthesis method and device, singing voice synthesis method and device, program, recording medium, and robot apparatus
JP2003-170000 2003-06-13

Publications (1)

Publication Number Publication Date
WO2004111993A1 true WO2004111993A1 (en) 2004-12-23

Family

ID=33549397

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/JP2004/008333 WO2004111993A1 (en) 2003-06-13 2004-06-14 Signal combination method and device, singing voice synthesizing method and device, program and recording medium, and robot device

Country Status (2)

Country Link
JP (1) JP2005004106A (en)
WO (1) WO2004111993A1 (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110634460A (en) * 2018-06-21 2019-12-31 卡西欧计算机株式会社 Electronic musical instrument, control method for electronic musical instrument, and storage medium
CN111402842A (en) * 2020-03-20 2020-07-10 北京字节跳动网络技术有限公司 Method, apparatus, device and medium for generating audio

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20080167739A1 (en) * 2007-01-05 2008-07-10 National Taiwan University Of Science And Technology Autonomous robot for music playing and related method
JP5218154B2 (en) * 2009-03-02 2013-06-26 ヤマハ株式会社 Music signal generator
JP6587007B1 (en) * 2018-04-16 2019-10-09 カシオ計算機株式会社 Electronic musical instrument, electronic musical instrument control method, and program
JP6587008B1 (en) * 2018-04-16 2019-10-09 カシオ計算機株式会社 Electronic musical instrument, electronic musical instrument control method, and program

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPH08152878A (en) * 1994-11-30 1996-06-11 Yamaha Corp Automatic playing device
JPH10274982A (en) * 1997-03-31 1998-10-13 Kawai Musical Instr Mfg Co Ltd Electronic instrument
JP2001042868A (en) * 1999-05-26 2001-02-16 Yamaha Corp Performance data generation and generating device, and recording medium therefor
JP2001159892A (en) * 1999-08-09 2001-06-12 Yamaha Corp Performance data preparing device and recording medium
JP2001282269A (en) * 2000-03-31 2001-10-12 Clarion Co Ltd Information providing system and utterance doll
JP2002073064A (en) * 2000-08-28 2002-03-12 Yamaha Corp Voice processor, voice processing method and information recording medium
JP2003099053A (en) * 2001-09-25 2003-04-04 Yamaha Corp Playing data processor and program
JP2003177751A (en) * 2001-10-05 2003-06-27 Yamaha Corp Playing data processing device

Also Published As

Publication number Publication date
JP2005004106A (en) 2005-01-06

Similar Documents

Publication Publication Date Title
JP4483188B2 (en) SINGING VOICE SYNTHESIS METHOD, SINGING VOICE SYNTHESIS DEVICE, PROGRAM, RECORDING MEDIUM, AND ROBOT DEVICE
JP3864918B2 (en) Singing voice synthesis method and apparatus
EP1605435B1 (en) Singing voice synthesizing method, singing voice synthesizing device, program, recording medium, and robot
JP4150198B2 (en) Speech synthesis method, speech synthesis apparatus, program and recording medium, and robot apparatus
JP3858842B2 (en) Singing voice synthesis method and apparatus
JP2003271174A (en) Speech synthesis method, speech synthesis device, program, recording medium, method and apparatus for generating constraint information and robot apparatus
US20020198717A1 (en) Method and apparatus for voice synthesis and robot apparatus
WO2002076686A1 (en) Action teaching apparatus and action teaching method for robot system, and storage medium
WO2002091356A1 (en) Obot device, character recognizing apparatus and character reading method, and control program and recording medium
JP4415573B2 (en) SINGING VOICE SYNTHESIS METHOD, SINGING VOICE SYNTHESIS DEVICE, PROGRAM, RECORDING MEDIUM, AND ROBOT DEVICE
WO2004111993A1 (en) Signal combination method and device, singing voice synthesizing method and device, program and recording medium, and robot device
WO2002086861A1 (en) Language processor
JP2003271172A (en) Method and apparatus for voice synthesis, program, recording medium and robot apparatus
Cosentino et al. Human–robot musical interaction
JP2002258886A (en) Device and method for combining voices, program and recording medium
JP2002346958A (en) Control system and control method for legged mobile robot
JP2002304187A (en) Device and method for synthesizing voice, program and recording medium
Özen et al. Cooperative dancing with an industrial manipulator: Computational cybernetics complexities
JP2001043126A (en) Robot system

Legal Events

Date Code Title Description
AK Designated states

Kind code of ref document: A1

Designated state(s): AE AG AL AM AT AU AZ BA BB BG BR BW BY BZ CA CH CN CO CR CU CZ DE DK DM DZ EC EE EG ES FI GB GD GE GH GM HR HU ID IL IN IS KE KG KP KR KZ LC LK LR LS LT LU LV MA MD MG MK MN MW MX MZ NA NI NO NZ OM PG PH PL PT RO RU SC SD SE SG SK SL SY TJ TM TN TR TT TZ UA UG US UZ VC VN YU ZA ZM ZW

AL Designated countries for regional patents

Kind code of ref document: A1

Designated state(s): BW GH GM KE LS MW MZ NA SD SL SZ TZ UG ZM ZW AM AZ BY KG KZ MD RU TJ TM AT BE BG CH CY CZ DE DK EE ES FI FR GB GR HU IE IT LU MC NL PL PT RO SE SI SK TR BF BJ CF CG CI CM GA GN GQ GW ML MR NE SN TD TG

121 Ep: The EPO has been informed by WIPO that EP was designated in this application
122 Ep: PCT application non-entry in European phase