WO2021060273A1 - Sound output control method and sound output control device - Google Patents

Sound output control method and sound output control device

Info

Publication number
WO2021060273A1
Authority
WO
WIPO (PCT)
Prior art keywords
sound
hand
phoneme
specific
pronunciation
Prior art date
Application number
PCT/JP2020/035785
Other languages
French (fr)
Japanese (ja)
Inventor
Tatsuya IRIYAMA
Keijiro SAINO
Original Assignee
Yamaha Corporation
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Yamaha Corporation
Publication of WO2021060273A1 publication Critical patent/WO2021060273A1/en

Classifications

    • G - PHYSICS
    • G10 - MUSICAL INSTRUMENTS; ACOUSTICS
    • G10H - ELECTROPHONIC MUSICAL INSTRUMENTS; INSTRUMENTS IN WHICH THE TONES ARE GENERATED BY ELECTROMECHANICAL MEANS OR ELECTRONIC GENERATORS, OR IN WHICH THE TONES ARE SYNTHESISED FROM A DATA STORE
    • G10H 1/00 - Details of electrophonic musical instruments
    • G10H 1/18 - Selecting circuits
    • G10L - SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L 13/00 - Speech synthesis; Text to speech systems
    • G10L 13/02 - Methods for producing synthetic speech; Speech synthesisers

Definitions

  • This disclosure relates to a technique for controlling pronunciation.
  • In the prior art (Patent Document 1), pronunciation is started only upon the user's contact; depending on the content to be pronounced, however, it may be desirable to start pronunciation earlier.
  • In view of this, the purpose of the present disclosure is to start pronunciation before an object such as a user's finger comes into contact with a surface such as a key.
  • To that end, the sound control method of the present disclosure detects that an object is in a specific state while the object is moving toward a surface, sounds a first sound when the specific state is detected, detects a striking event in which the object hits the surface as a result of the movement, and sounds a second sound when the striking event is detected.
  • The sound control device of the present disclosure includes a detection unit that detects that an object is in a specific state while the object is moving toward a surface and detects a striking event in which the object hits the surface as a result of the movement, and a sound control unit that sounds a first sound when the specific state is detected and a second sound when the striking event is detected.
  • FIG. 1 is a block diagram illustrating the configuration of the pronunciation control system 100 according to the embodiment of the present disclosure.
  • the pronunciation control system 100 synthesizes a virtual voice in which a specific singer sings a musical piece. Each phoneme that composes the synthesized voice is pronounced at the time instructed by the user.
  • the pronunciation control system 100 includes an operation unit 10 and a pronunciation control device 20.
  • The user indicates to the pronunciation control device 20 the time at which the pronunciation of each phoneme should start (hereinafter the "pronunciation start point") by striking the operation unit 10 with his or her hand H.
  • the pronunciation control device 20 synthesizes voice by pronouncing each phoneme according to an instruction from the user.
  • the operation unit 10 includes an operation reception unit 11, a first sensor 13, and a second sensor 15.
  • the operation reception unit 11 includes a surface (hereinafter referred to as “striking surface”) F that is hit by the user's hand H.
  • the hand H is an example of an "object” that hits the striking surface F.
  • the operation receiving unit 11 includes a housing 112 and a light transmitting unit 114.
  • the housing 112 is, for example, a hollow structure having an opening at the top.
  • The light transmitting portion 114 is a flat plate formed of a material that transmits light in the wavelength range detectable by the first sensor 13.
  • the light transmitting portion 114 is installed so as to close the opening of the housing 112.
  • the surface of the light transmitting portion 114 on the side opposite to the internal space of the housing 112 corresponds to the striking surface F.
  • the user hits the striking surface F with the hand H in order to indicate the pronunciation start point of each phoneme. Specifically, the user hits the hitting surface F by moving the hand H from above the hitting surface F toward the hitting surface F. A phoneme is pronounced according to the time when the hand H hits the striking surface F.
  • the first sensor 13 and the second sensor 15 are housed inside the housing 112.
  • the first sensor 13 is a sensor for detecting the state of the user's hand H.
  • a distance image sensor that measures the distance between the subject and the imaging surface for each pixel is used as the first sensor 13.
  • the hand H moving toward the striking surface F is imaged by the first sensor 13.
  • the first sensor 13 is installed, for example, in the central portion of the bottom surface of the housing 112, and images the hand H moving toward the striking surface F from the palm side (inside of the housing 112).
  • The first sensor 13 can detect light in a specific wavelength range; by receiving light that arrives from the hand H located above the striking surface F through the light transmitting portion 114, it generates data D1 representing an image of the hand H (hereinafter "image data").
  • the light transmitting portion 114 is formed of a member that transmits light that can be detected by the first sensor 13.
  • the image data D1 is transmitted to the sound control device 20.
  • the first sensor 13 and the sound control device 20 can communicate with each other wirelessly or by wire.
  • the image data D1 is repeatedly generated at predetermined intervals.
  • The second sensor 15 is a sensor for detecting the impact of the hand H on the striking surface F.
  • a sound collecting device that collects ambient sounds and generates a sound signal D2 representing the collected sounds is used as the second sensor 15.
  • the second sensor 15 collects the hitting sound generated when the user's hand H hits the hitting surface F.
  • the sound signal D2 is transmitted to the sound control device 20.
  • the second sensor 15 and the sound control device 20 can communicate with each other wirelessly or by wire.
  • FIG. 2 is a block diagram illustrating the configuration of the sound control device 20.
  • The sound control device 20 synthesizes voice according to the user's action of striking the striking surface F.
  • the sound control device 20 includes a control device 21, a storage device 23, and a sound emitting device 25.
  • the control device 21 is, for example, a single or a plurality of processors that control each element of the sound control device 20.
  • The control device 21 is composed of one or more types of processors, such as a CPU (Central Processing Unit), SPU (Sound Processing Unit), GPU (Graphics Processing Unit), DSP (Digital Signal Processor), FPGA (Field Programmable Gate Array), or ASIC (Application Specific Integrated Circuit).
  • The control device 21 executes a program stored in the storage device 23 to realize a plurality of functions (the phoneme specifying unit 212, the detection unit 213, and the sound control unit 214) for generating a signal V (hereinafter the "synthetic signal") representing the voice of a singer singing the musical piece.
  • the storage device 23 is a single or a plurality of memories composed of a known recording medium such as a magnetic recording medium or a semiconductor recording medium.
  • the storage device 23 stores a program executed by the control device 21 and various data used by the control device 21.
  • the storage device 23 may be configured by combining a plurality of types of recording media.
  • the storage device 23 may be a portable recording medium that can be attached to and detached from the sound control device 20, or an external recording medium (for example, online storage) that the sound control device 20 can communicate with via a communication network.
  • the storage device 23 stores data (hereinafter referred to as “synthetic data”) S representing sounds to be synthesized by the sound control device 20.
  • Synthetic data S is data that specifies the content of the music.
  • the synthetic data S is data for designating the pitch Sx and the phoneme Sy for each of the plurality of notes constituting the music.
  • the pitch Sx is any one of a plurality of pitches (for example, a note number).
  • The phoneme Sy is the pronunciation content to be uttered together with the pronunciation of the note.
  • the phoneme Sy corresponds to one syllable (pronunciation unit) constituting the lyrics of the music.
  • a typical phoneme Sy in Japanese is a combination of a consonant and a vowel immediately after it, or a single vowel.
  • the synthetic signal V is generated by voice synthesis using the synthetic data S.
  • the sounding start point of each note is controlled according to the action of striking the striking surface F by the user.
  • The order of the plurality of notes constituting the music is specified in the synthetic data S, but the pronunciation start point of each note is not.
  • The phoneme specifying unit 212 determines whether or not the phoneme Sy specified by the synthetic data S for each note is a phoneme composed of a consonant and a vowel (hereinafter a "specific phoneme"). Specifically, the phoneme specifying unit 212 judges a phoneme Sy composed of a consonant followed by a vowel to be a specific phoneme, and a phoneme Sy composed of a single vowel to be a phoneme other than a specific phoneme.
  • the user takes the rhythm of the music by hitting the hitting surface F in sequence. Specifically, the user hits the hitting surface F at each time when the pronunciation of each note in the music should be started.
  • The pronunciation start point of the vowel following the consonant is what is audibly perceived as the pronunciation start point of the specific phoneme as a whole. Therefore, in a configuration in which the consonant of a specific phoneme begins sounding at the moment the user strikes the striking surface F (hereinafter the "hit time") and the vowel follows the consonant, the specific phoneme is perceived as starting later than the point the user recognizes as the start of the note. In the present embodiment, the pronunciation of a specific phoneme therefore begins before the hit time, which reduces the perceived delay in hearing the specific phoneme.
  • FIG. 3 is a graph showing the relationship between the distance P between the hand H and the striking surface F and the time.
  • The distance P can also be described as the height of the hand H above the striking surface F. As the hand H moves toward the striking surface F, the distance P decreases over time, and when the hand H strikes the striking surface F, the distance P becomes 0.
  • The specific state means that the distance P reaches a specific distance (hereinafter the "specific distance") Pz in the process of decreasing. That is, the specific state is a state of the hand H before it comes into contact with the striking surface F.
  • the distance P may be, for example, the distance between the reference point (for example, the center point) on the striking surface F and the hand H.
  • FIG. 3 shows the time t1 at which the hand H enters the specific state (hereinafter the "arrival time") and the hit time t2.
  • The consonant of the specific phoneme is sounded at the arrival time t1 (that is, the time when the distance P reaches the specific distance Pz), and the vowel of the specific phoneme is sounded at the hit time t2 (that is, the time when the distance P becomes 0).
  • the detection unit 213 of FIG. 2 includes a first detection unit 31 and a second detection unit 32.
  • the first detection unit 31 detects that the hand H is in a specific state.
  • the first detection unit 31 specifies the distance P by using the image data D1.
  • The first detection unit 31 estimates the region of the hand H from the image data D1 by image recognition such as contour extraction, and specifies the distance P of the hand H from the distances measured by the first sensor 13 for the pixels in that region. Any known technique may be used to specify the distance P.
  • the first detection unit 31 determines whether or not the distance P has reached the specific distance Pz by comparing the distance P with the first threshold value.
  • the first threshold value is set according to, for example, a specific distance Pz.
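A minimal sketch of this first-stage detection, assuming a depth frame and a boolean hand mask obtained by image recognition; all names here are hypothetical, since the patent does not prescribe an implementation:

```python
import numpy as np

class FirstDetector:
    """Sketch of the first detection unit (31): flags the specific state when
    the hand-to-surface distance P falls to the first threshold while decreasing."""

    def __init__(self, first_threshold: float):
        self.first_threshold = first_threshold  # corresponds to the specific distance Pz
        self.prev_p = None

    def update(self, p: float) -> bool:
        """Feed the latest distance P; returns True when the specific state is detected."""
        decreasing = self.prev_p is not None and p < self.prev_p
        self.prev_p = p
        # The specific state is only checked while the hand approaches the surface.
        return decreasing and p <= self.first_threshold

def estimate_p(depth_frame: np.ndarray, hand_mask: np.ndarray, sensor_to_surface: float) -> float:
    """Hand height P above the striking surface: mean sensor-to-hand distance
    over the hand pixels, minus the fixed sensor-to-surface distance."""
    return float(depth_frame[hand_mask].mean() - sensor_to_surface)
```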
  • the second detection unit 32 detects that the hand H has hit the striking surface F due to the movement of the hand H. Specifically, the second detection unit 32 detects that the hand H has hit the striking surface F by analyzing the sound signal D2. First, the second detection unit 32 identifies the volume of the sound represented by the sound signal D2 (hereinafter referred to as “sound collection level”) by analyzing the sound signal D2. Any known sound analysis technique is used for the analysis of the sound signal D2. Next, the second detection unit 32 determines whether or not the hand H has hit the striking surface F by comparing the sound collection level with the second threshold value. For example, when the hand H hits the hitting surface F, a hitting sound is generated. The second threshold value is set assuming, for example, the hitting sound when the hand H hits the hitting surface F.
  • When the sound collection level is below the second threshold value, it is determined that the sound signal D2 does not include a striking sound, that is, that the striking surface F has not been struck.
  • When the sound collection level exceeds the second threshold value, it is determined that the sound signal D2 contains a striking sound, that is, that the hand H has struck the striking surface F.
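A similarly minimal sketch of the second-stage detection; frame-based RMS metering is an assumption, since the text only calls for a known sound-analysis technique:

```python
import numpy as np

def hit_detected(frame_d2: np.ndarray, second_threshold: float) -> bool:
    """Sketch of the second detection unit (32): report a hit when the sound
    collection level of the microphone signal D2 exceeds the second threshold."""
    level = float(np.sqrt(np.mean(frame_d2.astype(np.float64) ** 2)))  # RMS level
    return level > second_threshold
```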
  • In practice, a slight time difference inevitably occurs between the hit time t2 at which the hand H strikes the striking surface F and the time at which the hit (striking event) is detected. In the following description, the hit time t2 and the detection time are treated as substantially the same time point.
  • the pronunciation control unit 214 generates a composite signal V representing the sound specified by the composite data S.
  • The synthetic signal V is a signal representing a voice that pronounces, for each note, the phoneme Sy specified by the synthetic data S for that note at the pitch Sx specified for that note.
  • Any known technique may be adopted for the speech synthesis.
  • For example, concatenative speech synthesis, which generates the synthetic signal V by connecting a plurality of voice segments, or statistical-model speech synthesis, which generates the synthetic signal V using a statistical model such as an HMM (Hidden Markov Model) or a neural network, is used to generate the synthetic signal V.
  • The pronunciation start point of each phoneme Sy designated by the synthetic data S is controlled according to the results of detection by the first detection unit 31 and the second detection unit 32.
  • For a phoneme other than a specific phoneme, the sound control unit 214 causes the phoneme to be pronounced when the striking surface F is struck. Specifically, the sound control unit 214 causes the phoneme to be pronounced when the second detection unit 32 detects the hit; that is, a synthetic signal V is generated in which the sounding start point of the entire phoneme is set at the hit time t2.
  • When the phoneme specifying unit 212 determines that the phoneme Sy specified by the synthetic data S is a specific phoneme, the sound control unit 214 causes the specific phoneme to begin sounding before the striking surface F is struck.
  • Specifically, the sound control unit 214 causes the consonant of the specific phoneme to be pronounced when the first detection unit 31 detects the specific state, and the vowel of the specific phoneme to be pronounced when the second detection unit 32 detects the hit. That is, a synthetic signal V is generated in which the pronunciation start point of the consonant of the specific phoneme is set at the arrival time t1 and the pronunciation start point of the following vowel is set at the hit time t2. The synthetic signal V is supplied to the sound emitting device 25.
  • The sound emitting device 25 (for example, a speaker) is a reproduction device that emits the sound represented by the synthetic signal V. Sound in which the pronunciation start point of each phoneme Sy is controlled is therefore emitted for the musical piece, and the perceived delay of specific phonemes is reduced.
  • FIG. 4 is a flowchart of processing of the control device 21.
  • The user strikes the striking surface F whenever the pronunciation of a note in the musical piece should start. That is, the striking surface F is struck by the hand H once for each note.
  • The process of FIG. 4 is executed for each note of the synthetic data S. The note being processed in FIG. 4 is referred to as the "target note".
  • a process of specifying the distance P by the first detection unit 31 and a process of specifying the sound collection level by the second detection unit 32 are executed.
  • the process of specifying the distance P and the process of specifying the sound collection level are repeatedly executed in a cycle shorter than the cycle in which the process of FIG. 4 is executed.
  • The phoneme specifying unit 212 determines whether or not the phoneme Sy of the target note in the synthetic data S is a specific phoneme (Sa1).
  • When the phoneme Sy of the target note is a specific phoneme, the first detection unit 31 determines whether or not the hand H is in the specific state while moving toward the striking surface F (Sa2); that is, whether the distance P has reached the specific distance Pz in the course of decreasing. Specifically, the first detection unit 31 first determines whether the distance P is decreasing, and if so, determines whether the hand H is in the specific state by comparing the distance P with the first threshold value. While the distance P is increasing, no determination of the specific state is made.
  • When the specific state is detected, the sound control unit 214 causes the consonant of the specific phoneme to sound (Sa3). Specifically, the sound control unit 214 generates a synthetic signal V in which the sounding start point of the consonant of the specific phoneme is set at the time the specific state was detected, and supplies it to the sound emitting device 25. That is, the consonant of the specific phoneme is pronounced at the time the specific state is detected (the arrival time t1).
  • the process of step Sa2 is repeatedly executed until the hand H is in the specific state.
  • the second detection unit 32 determines whether or not the hand H has hit the striking surface F (Sa4). Specifically, by comparing the sound collection level with the second threshold value, it is determined whether or not the hand H has hit the striking surface F.
  • When the hit is detected, the sound control unit 214 causes the vowel following the consonant of the specific phoneme to sound (Sa5). Specifically, the sound control unit 214 generates a synthetic signal V in which the sounding start point of that vowel is set at the time the hit on the striking surface F was detected, and supplies it to the sound emitting device 25.
  • The process of step Sa4 is repeatedly executed until the hand H reaches and strikes the striking surface F.
  • the pronunciation of the specific phoneme is started before the hand H hits the striking surface F.
  • For a phoneme other than a specific phoneme, steps Sa2 and Sa3 are omitted and the process of step Sa4 is executed. That is, for phonemes other than specific phonemes, pronunciation of the phoneme starts at the hit time t2.
  • The continuation length of each note may be a fixed time length, or a time length specified for each note by the synthetic data S.
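Putting the two detectors together, the per-note flow of FIG. 4 (steps Sa1 to Sa5) can be sketched as below. `measure_p`, `hit_sensor.poll`, and the `synth` methods are hypothetical stand-ins for the distance measurement, hit detection, and synthetic-signal generation described above, not APIs from the patent:

```python
def process_note(note, first_detector, hit_sensor, synth, measure_p):
    """One pass of the FIG. 4 flow for a single target note."""
    if note.is_specific_phoneme:                        # Sa1: consonant + vowel?
        while not first_detector.update(measure_p()):   # Sa2: wait for the specific state
            pass
        synth.start_consonant(note)                     # Sa3: consonant at arrival time t1
        while not hit_sensor.poll():                    # Sa4: wait for the hit
            pass
        synth.start_vowel(note)                         # Sa5: vowel at hit time t2
    else:
        while not hit_sensor.poll():                    # Sa4 only
            pass
        synth.start_phoneme(note)                       # whole phoneme sounds at hit time t2
```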
  • As described above, the consonant of a specific phoneme is pronounced when the hand H is detected to be in the specific state, and the vowel of the specific phoneme is pronounced when a hit on the striking surface F is detected. The consonant of the specific phoneme can therefore be pronounced before the hand H strikes the striking surface F, which reduces the perception that the specific phoneme is delayed. Further, since the vowel of the specific phoneme is triggered by detecting the impact of the hand H on the striking surface F, the consonant can precede the vowel while preserving the feel of the striking operation that produces the specific phoneme.
  • In this embodiment, the specific state is the state in which the distance P between the hand H and the striking surface F equals the specific distance Pz; that is, a state partway through the hand H's movement toward the striking surface F is detected as the specific state. The consonant can therefore be pronounced without the user being conscious of any separate operation for pronouncing it. Further, since the impact of the hand H on the striking surface F is detected by analyzing the sound signal D2, the vowel of the specific phoneme can be pronounced at the moment the striking sound is produced.
  • The sound produced upon detecting the specific state corresponds to the "first sound", and the sound produced upon detecting the impact on the striking surface F corresponds to the "second sound". In this embodiment, the consonant of the specific phoneme is an example of the first sound, and its vowel is an example of the second sound. That is, the sound control unit 214 can be comprehensively expressed as an element that sounds the first sound when the specific state is detected and sounds the second sound when a hit on the striking surface F is detected.
  • However, the first sound is not limited to the consonant of a specific phoneme, and the second sound is not limited to the vowel of a specific phoneme.
  • For example, a sound related to a preparatory movement for pronunciation (hereinafter a "preparatory sound") may be used as the first sound, and the sound following the preparatory movement (hereinafter the "target sound") as the second sound. A target sound is a sound defined by a musical note and is the object of singing or playing; a preparatory sound is a sound produced by the preparatory movement for pronouncing the target sound.
  • For singing, a breath sound is an example of a preparatory sound, and the voice sung after the breath is an example of the target sound. For instrumental performance, the intake of breath when playing a wind instrument, the fret noise of a string instrument, or the swish of a stick when playing a percussion instrument are examples of preparatory sounds, and the performance sound of the instrument that follows is an example of the target sound. That is, the sound synthesized by the pronunciation control device 20 is not limited to a voice singing the musical piece. With this configuration, the preparatory sound can be pronounced before the target sound itself.
  • Alternatively, an entire phoneme may be used as the first sound, and the entire phoneme that follows it as the second sound.
  • In a configuration in which the first and second sounds are phonemes (for example, vowels or consonants), the embodiment above takes the first phoneme (an example of the first sound) to be a consonant and the second phoneme (an example of the second sound) to be a vowel, but each of the first and second phonemes may be either a vowel or a consonant. For example, for a phoneme composed of a consonant followed by another consonant, or of a vowel followed by another vowel, the leading phoneme is an example of the first phoneme and the following phoneme is an example of the second phoneme.
  • In the embodiment above, a distance image sensor capable of measuring distance is illustrated as the first sensor 13, but a distance-measuring function is not essential to the first sensor 13.
  • an image sensor may be used as the first sensor 13.
  • the first detection unit 31 may calculate the movement amount of the hand H by analyzing the image captured by the image sensor, and may estimate the distance P from the movement amount.
  • the function of capturing the image of the hand H is not essential in the first sensor 13.
  • For example, an infrared sensor that emits infrared light may be used as the first sensor 13. In that configuration, the first sensor 13 specifies the distance between the hand H and the first sensor 13 from the received intensity of the infrared light reflected by the hand H.
  • The first detection unit 31 determines that the hand H is in the specific state when the distance between the hand H and the first sensor 13 falls below a predetermined threshold value, and that it is not in the specific state when the distance exceeds the threshold value. That is, calculating the distance P itself is not essential to determining whether the hand H is in the specific state.
  • the distance between the hand H and the first sensor 13 corresponds to the sum of the distance P between the hand H and the striking surface F and the distance between the striking surface F and the first sensor 13.
  • When the distance P between the hand H and the striking surface F is at the specific distance Pz, the distance between the hand H and the first sensor 13 also takes a specific value, so even in this configuration the specific state can be said to be the distance P being at the specific distance Pz.
  • The function of the first detection unit 31 may also be mounted on the first sensor 13 itself. When the first sensor 13 detects the specific state, it instructs the sound control unit 214 to pronounce the consonant of the specific phoneme.
  • the impact on the striking surface F is detected by analyzing the sound signal D2, but the method for detecting the impact is not limited to the above examples.
  • For example, a vibration sensor that detects the vibration produced when the hand H strikes the striking surface F may be used as the second sensor 15. The second sensor 15 generates a signal according to, for example, the magnitude of the vibration, and the second detection unit 32 detects the hit from that signal and determines that the hand H has struck the striking surface F.
  • a pressure sensor that detects the pressure applied to the striking surface F when the hand H comes into contact with the striking surface F may be used as the second sensor 15.
  • the second sensor 15 generates a signal according to, for example, the magnitude of the pressure applied to the striking surface F.
  • the second detection unit 32 detects the impact in response to the signal.
  • the second sensor 15 may be equipped with the function of the second detection unit 32. When the second sensor 15 detects a hit on the hitting surface F, the second sensor 15 instructs the sound control unit 214 to pronounce a vowel of a specific phoneme.
  • the first sensor 13 and the second sensor 15 are housed in the internal space of the housing 112, but the positions where the first sensor 13 and the second sensor 15 are installed are arbitrary.
  • the first sensor 13 and the second sensor 15 may be installed outside the housing 112. In the configuration in which the first sensor 13 is installed outside the housing 112, it is not essential that the upper surface of the housing 112 is formed of a light-transmitting member in the operation receiving unit 11.
  • the striking surface F is hit by the hand H, but the object that hits the striking surface F is not limited to the hand H.
  • the type of the object is arbitrary as long as it is possible to hit the striking surface F.
  • a striking member such as a stick may be an object. The user moves the stick toward the striking surface F to strike the striking surface F.
  • the object includes both a part of the user's body (typically the hand H) and a striking member operated by the user.
  • the first sensor 13 or the second sensor 15 may be mounted on the member.
  • the specific state is not limited to the above examples.
  • the specific state is arbitrary as long as the hand H is in the middle of moving toward the striking surface F.
  • the change in the moving direction of the hand H may be set as a specific state.
  • For example, the moment at which the moving direction of the hand H changes from moving away from the striking surface F to approaching it, or from horizontal to perpendicular to the striking surface F, may be set as the specific state.
  • A change in the shape of the hand H (for example, from a closed fist to an open hand) may also be set as the specific state.
  • The continuation length of the consonant of a specific phoneme differs depending on the type of the consonant.
  • For example, the time required to pronounce the consonant "s" in the specific phoneme "sa" is about 250 ms, whereas the time required to pronounce the consonant "k" in the specific phoneme "ka" is about 30 ms. That is, the appropriate specific distance Pz differs depending on the type of consonant of the specific phoneme, so a configuration in which the first threshold value is set variably according to the consonant type may also be adopted.
  • Specifically, the first detection unit 31 sets the first threshold value according to the consonant type determined by the phoneme specifying unit 212, and then determines whether the hand H is in the specific state by comparing the distance P with the threshold value thus set.
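A minimal sketch of this consonant-dependent threshold, using the durations quoted above; deriving the threshold as consonant duration times approach speed is an illustrative assumption, not a rule stated in the text:

```python
# Approximate consonant durations from the text (seconds).
CONSONANT_DURATION = {"s": 0.250, "k": 0.030}

def first_threshold_for(consonant: str, approach_speed: float) -> float:
    """Distance the hand covers while the consonant sounds, used as the
    specific distance Pz; 100 ms is a hypothetical default duration."""
    return CONSONANT_DURATION.get(consonant, 0.100) * approach_speed
```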
  • the operation reception unit 11 is composed of the housing 112 and the light transmission unit 114, but the operation reception unit 11 is not limited to the above examples.
  • a flat plate-shaped member may be used as the operation reception unit 11.
  • A keyboard-type operator may also be used as the operation reception unit 11. In that configuration, it is not essential that the synthetic data S specify the pitch Sx for each note: the user may indicate the sounding start point of each note and its pitch by operating the operation reception unit 11. That is, the pitch of each note may be set according to an instruction from the user.
  • the surface of the operation reception unit 11 that the user comes into contact with when hitting corresponds to the hitting surface F.
  • the state of the user's hand H may be detected and the pronunciation may be controlled according to the detection result.
  • Conditions of a note (for example, its pitch, phoneme, or continuation length) may be set according to the detected state of the hand H.
  • the state of the user's hand H is, for example, the moving speed of the hand H, the moving direction of the hand H, the shape of the hand H, or the like.
  • the combination of the detected hand H state and the note condition is arbitrary.
  • the user can instruct the condition of the note by changing the state of the hand H.
  • a specific configuration for controlling pronunciation according to the state of the user's hand H will be illustrated.
  • the type of phoneme (that is, pronunciation content) may be set according to the movement speed of hand H.
  • the first detection unit 31 detects the moving speed of the hand H from the image data D1.
  • the moving speed is detected from the time change of the distance P specified from the image data D1.
  • the first detection unit 31 may detect the moving speed of the hand H by using, for example, the output from the speed sensor that detects the speed.
  • the phoneme specifying unit 212 sets the type of the specific phoneme according to the moving speed.
  • the phoneme specifying unit 212 sets the type of the specific phoneme before the hand H is in the specific state.
  • FIG. 5 is a schematic diagram showing the relationship between the moving speed of the hand H and the type of the specific phoneme.
  • FIG. 5 illustrates the specific phoneme set when the moving speed of the hand H is fast (H1) and the specific phoneme set when the moving speed is slow (H2).
  • When the moving speed is fast, a specific phoneme containing a consonant with a short duration (for example, "ta", whose consonant is [t]) is set; when the moving speed is slow, a specific phoneme containing a consonant with a long duration (for example, "sa") is set.
  • the consonant is started to be pronounced at the arrival time t1 when the distance P becomes the specific distance Pz, and the vowel is started to be pronounced at the hit time t2.
  • When the moving speed of the hand H is fast, the time length from the arrival time t1 to the hit time t2 is shorter than when the moving speed is slow.
  • the continuous length or pitch of the note may be set according to the moving speed of the hand H. In the above examples, the case of setting the specific phoneme type is illustrated, but the phoneme type other than the specific phoneme may be controlled according to the moving speed.
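A minimal sketch of the FIG. 5 mapping from approach speed to specific phoneme; the cutoff value is hypothetical:

```python
def phoneme_for_speed(speed_m_per_s: float) -> str:
    """Fast approach -> short consonant ("ta"); slow approach -> long consonant ("sa")."""
    FAST_CUTOFF = 0.5  # hypothetical boundary between "fast" and "slow" (m/s)
    return "ta" if speed_m_per_s >= FAST_CUTOFF else "sa"
```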
  • the type of phoneme may be set according to the moving direction of hand H.
  • the user hits the hitting surface F by moving the hand H from different directions according to the desired phoneme.
  • the user can hit the hitting surface F by moving the hand H from various directions with respect to the hitting surface F.
  • the first detection unit 31 detects the moving direction of the hand H from the image data D1
  • the phoneme specifying unit 212 sets the type of phoneme according to the moving direction.
  • the phoneme specifying unit 212 sets the phoneme type before the hand H is in a specific state.
  • the continuous length or pitch of the note may be set according to the moving direction of the hand H.
  • For example, the type of phoneme may be set according to the shape of the hand H.
  • The user strikes the striking surface F with the hand H in an arbitrary shape, formed for example by moving the fingers into a closed fist ("rock"), a two-finger "scissors", or an open hand ("paper").
  • FIG. 6 is a table showing the relationship between the shape of the hand H and the phonology.
  • the type of phoneme may be set in consideration of whether the hand H is the right hand or the left hand.
  • the state of the hand H also includes whether the user's hand H is the right hand or the left hand.
  • the first detection unit 31 detects whether the hand H is the right hand or the left hand and the shape of the hand H from the image data D1.
  • a known image analysis technique is arbitrarily adopted for detecting whether the hand H is the right hand or the left hand and the shape of the hand H.
  • the phoneme specifying unit 212 sets the phoneme type before the hand H is in a specific state.
  • The phoneme specifying unit 212 specifies the phoneme according to whether the hand is the right or left hand and according to the shape of the hand H. As illustrated in FIG. 6, for example, when the striking surface F is struck with the left hand formed into a fist, the phoneme "ta" is pronounced.
  • the continuous length or pitch of the note may be set according to the shape of the hand H.
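A minimal sketch of a FIG. 6-style lookup; only the left-hand fist to "ta" pairing is stated in the text, and the other entries are hypothetical placeholders:

```python
# (hand side, hand shape) -> phoneme.
PHONEME_TABLE = {
    ("left", "rock"): "ta",   # stated in the text
    ("left", "paper"): "sa",  # hypothetical
    ("right", "rock"): "ka",  # hypothetical
    ("right", "paper"): "na", # hypothetical
}

def phoneme_for_hand(side: str, shape: str, default: str = "a") -> str:
    return PHONEME_TABLE.get((side, shape), default)
```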
  • As described above, the state of the hand H (for example, its moving speed) is detected, and the pronunciation of the phoneme is controlled according to the result of that detection.
  • the user can control the pronunciation of the first sound and the second sound by changing the moving speed, moving direction, and shape of the object.
  • the state of the hand H is not limited to the moving speed of the hand H, the moving direction of the hand H, and the shape of the hand H.
  • the moving angle of the hand H (the angle at which the hand H moves with respect to the striking surface F) may be set as the state of the hand H.
  • The first threshold value may be changed according to the moving speed of the hand H. The first detection unit 31 detects the moving speed of the hand H from, for example, the image data D1; the speed is detected before the hand H enters the specific state. The first detection unit 31 then sets the first threshold value according to the moving speed: relatively large when the hand H moves fast, and relatively small when it moves slowly. It then compares the distance P with the threshold value thus set to determine whether the distance P has reached the specific distance Pz. With this configuration, variation in the consonant's duration due to the hand H's moving speed can be reduced.
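As a worked illustration with assumed numbers: if the consonant should sound for about 100 ms, a hand approaching at 0.5 m/s should enter the specific state at Pz ≈ 0.5 m/s × 0.1 s = 0.05 m, whereas a hand approaching at 1.0 m/s needs Pz ≈ 0.1 m. Setting the first threshold in proportion to the measured speed in this way keeps the consonant's duration roughly constant.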
  • the first threshold value may be changed according to the moving direction of the hand H.
  • the first detection unit 31 detects the moving direction of the hand H from, for example, the image data D1. The moving direction is detected before the hand H is in a specific state.
  • The first detection unit 31 sets the first threshold value according to the moving direction of the hand H. For example, when the moving direction of the hand H is a first direction, the first detection unit 31 sets the first threshold value to a first value, and when the moving direction is a second direction different from the first direction, it sets the first threshold value to a second value larger than the first value. It then compares the distance P with the threshold value thus set to determine whether the distance P has reached the specific distance Pz.
  • The continuation length of the consonant of the specific phoneme changes according to the first threshold value: it becomes longer as the first threshold value increases and shorter as it decreases.
  • the first threshold value may be set variably.
  • the time point at which the pronunciation of the phoneme is finished may be controlled according to the movement of the hand H by the user.
  • the pronunciation of the phoneme may be terminated when the hand H separates from the striking surface F after striking the striking surface F.
  • FIG. 7 is a block diagram illustrating the configuration of the detection unit 213 according to the modified example.
  • the detection unit 213 includes a third detection unit 33 in addition to the first detection unit 31 and the second detection unit 32.
  • the third detection unit 33 detects that the hand H is separated from the striking surface F.
  • For example, the third detection unit 33 detects that the hand H has separated from the striking surface F by analyzing the image data D1.
  • the third detection unit 33 may detect that the hand H is separated from the striking surface F by using the output from the pressure sensor that detects the pressure applied to the striking surface F.
  • When the third detection unit 33 detects that the hand H has separated from the striking surface F, the pronunciation control unit 214 ends the pronunciation of the phoneme.
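A minimal sketch of this release detection using a pressure reading, assuming a simple threshold with contact-state tracking (the sensor API and threshold are hypothetical):

```python
class ThirdDetector:
    """Sketch of the third detection unit (33): reports the moment the hand
    separates from the striking surface, based on a pressure sensor reading."""

    def __init__(self, contact_threshold: float):
        self.contact_threshold = contact_threshold
        self.in_contact = False

    def update(self, pressure: float) -> bool:
        """Feed the latest pressure; returns True at the moment of release."""
        released = self.in_contact and pressure < self.contact_threshold
        self.in_contact = pressure >= self.contact_threshold
        return released
```

When `update` returns True, the pronunciation control unit 214 would end the phoneme, analogously to a note-off.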
  • In the embodiment above, the striking surface F is struck by the user's physical hand H, but a configuration in which the user strikes a virtual striking surface F using haptic (tactile-feedback) technology may also be adopted.
  • For example, the user strikes a striking surface F prepared in a virtual space by operating a controller that moves a virtual hand in the virtual space shown on a display device. By mounting on the controller a vibration motor that vibrates when the striking surface F is hit in the virtual space, the user perceives the striking surface F as actually being struck.
  • When the hand in the virtual space enters the specific state, the consonant of the specific phoneme is pronounced, and when the striking surface F is struck in the virtual space, the vowel of the specific phoneme is pronounced.
  • the striking surface F may be a surface in the virtual space.
  • the hand H may be a hand in the virtual space.
  • the function of the sound control device 20 illustrated above is realized by the cooperation of one or more processors constituting the control device 21 and the program stored in the storage device 23.
  • the program according to the present disclosure may be provided and installed on a computer in a form stored in a computer-readable recording medium.
  • The recording medium is, for example, a non-transitory recording medium; an optical recording medium (optical disc) such as a CD-ROM is a good example, but known recording media of any other format, such as semiconductor or magnetic recording media, are also included.
  • A non-transitory recording medium includes any recording medium other than a transitory propagating signal; volatile recording media are not excluded. Further, in a configuration in which a distribution device distributes the program via a communication network, the storage device that stores the program in the distribution device corresponds to the non-transitory recording medium described above.
  • The sound control method according to one aspect of the present disclosure detects that an object is in a specific state while the object is moving toward a surface, sounds a first sound when the specific state is detected, detects a striking event in which the object hits the surface as a result of the movement, and sounds a second sound when the striking event is detected.
  • In the above aspect, the first sound is produced when the object reaches the specific state while moving toward the surface, and the second sound is produced when the object hits the surface. The first sound can therefore be pronounced before the object hits the surface. Further, since the second sound is triggered by detecting the object's impact on the surface, the first sound can be sounded before the second sound while preserving the feel of the operation that produces the second sound.
  • In one example, the first sound is a first phoneme, and the second sound is a second phoneme different from the first phoneme.
  • the first phoneme is a consonant and the second phoneme is a vowel.
  • a consonant is pronounced when the object is in a specific state, and a vowel is pronounced following the consonant when the object hits the surface. Therefore, it is possible to reduce the perception that the pronunciation of a phoneme composed of consonants and vowels is delayed.
  • In one example, the first sound is a sound related to a preparatory movement for pronunciation, and the second sound is the sound following the preparatory movement.
  • the specific state is that the distance between the object and the surface is at a specific distance.
  • the first sound is produced when the distance between the object and the surface becomes a specific distance. That is, the first sound is pronounced in the middle of the movement of the object toward the surface. Therefore, the first sound can be pronounced without the user being aware of the operation for pronouncing the first sound.
  • In one example, in detecting the striking event, the impact of the object is detected by analyzing a sound signal generated by a sound collecting device.
  • the impact of the object on the surface is detected by analyzing the sound signal generated by the sound collecting device. Therefore, the striking sound generated by striking the surface can be used for the pronunciation of the second sound.
  • In one example, at least one of the moving speed of the object, the moving direction of the object, and the shape of the object is detected, and the pronunciation of at least one of the first sound and the second sound is controlled according to the result of that detection. The user can therefore control the pronunciation of the first and second sounds by changing the object's moving speed, moving direction, or shape.
  • The sound control device according to one aspect of the present disclosure includes a detection unit that detects that an object is in a specific state while the object is moving toward a surface and detects a striking event in which the object hits the surface as a result of the movement, and a sound control unit that sounds a first sound when the specific state is detected and a second sound when the striking event is detected.
  • the pronunciation control method and the pronunciation control device of the present disclosure can start pronunciation before an object such as a user's finger comes into contact with a surface such as a key.

Abstract

A sound output control device (20) comprises: a detection unit (213) for detecting that an object is in a specific state while the object is moving toward a surface and detecting a striking event in which the object strikes the surface as a result of the movement; and a sound output control unit (214) for causing a first sound to be output when the specific state is detected and causing a second sound to be output when the striking event is detected.

Description

Pronunciation control method and pronunciation control device
This disclosure relates to a technique for controlling pronunciation.
Techniques for synthesizing a singing voice by operating an operator such as a keyboard have been proposed. For example, in Patent Document 1, when the user presses a key corresponding to a desired pitch, the lyrics set for that pitch are pronounced. Specifically, a consonant is sounded when the user's finger is detected touching the key, and the vowel following the consonant is sounded when the key is detected being pressed fully down.
Patent Document 1: Japanese Patent Application Laid-Open No. 2014-98801
In the technique of Patent Document 1, pronunciation is started upon contact by the user. However, depending on the content to be pronounced, it may be desirable to start pronunciation before the finger touches the key. In view of these circumstances, an object of this disclosure is to start pronunciation before an object such as a user's finger comes into contact with a surface such as a key.
To solve the above problems, the sound control method according to one aspect of the present disclosure detects that an object is in a specific state while the object is moving toward a surface, sounds a first sound when the specific state is detected, detects a striking event in which the object hits the surface as a result of the movement, and sounds a second sound when the striking event is detected.
The sound control device according to one aspect of the present disclosure includes a detection unit that detects that an object is in a specific state while the object is moving toward a surface and detects a striking event in which the object hits the surface as a result of the movement, and a sound control unit that sounds a first sound when the specific state is detected and a second sound when the striking event is detected.
FIG. 1 is a block diagram illustrating the configuration of the pronunciation control system according to the embodiment of the present disclosure.
FIG. 2 is a block diagram illustrating the functional configuration of the pronunciation control device.
FIG. 3 is a graph showing the relationship between time and the distance between the hand and the striking surface.
FIG. 4 is a flowchart of the process executed by the control device.
FIG. 5 is a schematic diagram showing the relationship between the moving speed of the hand and the type of specific phoneme.
FIG. 6 is a table showing the relationship between the shape of the hand and the phoneme.
FIG. 7 is a block diagram illustrating the configuration of the detection unit according to a modified example.
<実施形態>
 図1は、本開示の実施形態に係る発音制御システム100の構成を例示するブロック図である。発音制御システム100は、特定の歌唱者が楽曲を歌唱した仮想的な音声を合成する。合成される音声を構成する各音韻は、利用者から指示された時点で発音される。
<Embodiment>
FIG. 1 is a block diagram illustrating the configuration of the pronunciation control system 100 according to the embodiment of the present disclosure. The pronunciation control system 100 synthesizes a virtual voice in which a specific singer sings a musical piece. Each phoneme that composes the synthesized voice is pronounced at the time instructed by the user.
 発音制御システム100は、操作ユニット10と発音制御装置20とを具備する。利用者は、操作ユニット10を自身の手Hで打撃することで各音韻の発音を開始する時点(以下「発音開始点」という)を発音制御装置20に対して指示する。発音制御装置20は、利用者からの指示に応じて各音韻を発音させることで音声を合成する。 The pronunciation control system 100 includes an operation unit 10 and a pronunciation control device 20. The user instructs the pronunciation control device 20 at a time when the operation unit 10 is hit with his / her own hand H to start the pronunciation of each phoneme (hereinafter referred to as “pronunciation start point”). The pronunciation control device 20 synthesizes voice by pronouncing each phoneme according to an instruction from the user.
 操作ユニット10は、操作受付部11と第1センサ13と第2センサ15とを具備する。操作受付部11は、利用者の手Hで打撃される面(以下「打撃面」という)Fを含む。手Hは打撃面Fを打撃する「物体」の例示である。具体的には、操作受付部11は、筐体112と光透過部114とを具備する。筐体112は、例えば上方が開口した中空の構造体である。光透過部114は、第1センサ13が検出可能な波長域の光を透過する部材で形成された平板状の部材である。筐体112の開口を塞ぐように光透過部114が設置される。光透過部114のうち筐体112の内部空間とは反対側の面が打撃面Fに相当する。利用者は、各音韻の発音開始点を指示するために、手Hで打撃面Fを打撃する。具体的には、利用者は、打撃面Fの上方から当該打撃面Fに向けて手Hを移動させることで、当該打撃面Fを打撃する。手Hが打撃面Fを打撃した時点に応じて音韻が発音される。 The operation unit 10 includes an operation reception unit 11, a first sensor 13, and a second sensor 15. The operation reception unit 11 includes a surface (hereinafter referred to as “striking surface”) F that is hit by the user's hand H. The hand H is an example of an "object" that hits the striking surface F. Specifically, the operation receiving unit 11 includes a housing 112 and a light transmitting unit 114. The housing 112 is, for example, a hollow structure having an opening at the top. The light transmitting portion 114 is a flat plate-shaped member formed of a member that transmits light in a wavelength range that can be detected by the first sensor 13. The light transmitting portion 114 is installed so as to close the opening of the housing 112. The surface of the light transmitting portion 114 on the side opposite to the internal space of the housing 112 corresponds to the striking surface F. The user hits the striking surface F with the hand H in order to indicate the pronunciation start point of each phoneme. Specifically, the user hits the hitting surface F by moving the hand H from above the hitting surface F toward the hitting surface F. A phoneme is pronounced according to the time when the hand H hits the striking surface F.
 第1センサ13および第2センサ15は、筐体112の内部に収容される。第1センサ13は、利用者の手Hの状態を検出するためのセンサである。例えば、被写体と撮像面との距離を画素毎に測定する距離画像センサが第1センサ13として利用される。例えば、打撃面Fに向かって移動する手Hが第1センサ13により撮像される。第1センサ13は、例えば筐体112の底面の中心部分に設置され、打撃面Fに向かって移動する手Hを掌側(筐体112の内部側)から撮像する。具体的には、第1センサ13は、特定の波長域の光を検知可能であり、打撃面Fの上方に位置する手Hから光透過部114を介して到来する光を受光することで手Hの画像を表すデータ(以下「画像データ」という)D1を生成する。なお、光透過部114は、第1センサ13が検知可能な光を透過する部材で形成される。画像データD1は、発音制御装置20に送信される。第1センサ13と発音制御装置20とは、無線または有線により通信可能である。なお、画像データD1は所定の期間毎に反復的に生成される。 The first sensor 13 and the second sensor 15 are housed inside the housing 112. The first sensor 13 is a sensor for detecting the state of the user's hand H. For example, a distance image sensor that measures the distance between the subject and the imaging surface for each pixel is used as the first sensor 13. For example, the hand H moving toward the striking surface F is imaged by the first sensor 13. The first sensor 13 is installed, for example, in the central portion of the bottom surface of the housing 112, and images the hand H moving toward the striking surface F from the palm side (inside of the housing 112). Specifically, the first sensor 13 can detect light in a specific wavelength range, and receives light coming from the hand H located above the striking surface F via the light transmitting portion 114 to receive the light. Data representing the image of H (hereinafter referred to as "image data") D1 is generated. The light transmitting portion 114 is formed of a member that transmits light that can be detected by the first sensor 13. The image data D1 is transmitted to the sound control device 20. The first sensor 13 and the sound control device 20 can communicate with each other wirelessly or by wire. The image data D1 is repeatedly generated at predetermined intervals.
 第2センサ15は、打撃面Fに対する手Hの打撃を検出するためのセンサである。例えば周囲の音を収音し、当該収音した音を表す音信号D2を生成する収音装置が第2センサ15として利用される。具体的には、第2センサ15は、利用者の手Hが打撃面Fを打撃したときに発生する打撃音を収音する。音信号D2は、発音制御装置20に送信される。第2センサ15と発音制御装置20とは、無線または有線により通信可能である。 The second sensor 15 is a sensor for detecting the impact of the hand H on the impact surface F. For example, a sound collecting device that collects ambient sounds and generates a sound signal D2 representing the collected sounds is used as the second sensor 15. Specifically, the second sensor 15 collects the hitting sound generated when the user's hand H hits the hitting surface F. The sound signal D2 is transmitted to the sound control device 20. The second sensor 15 and the sound control device 20 can communicate with each other wirelessly or by wire.
 図2は、発音制御装置20の構成を例示するブロック図である。発音制御装置20は、利用者による打撃面Fを打撃する動作に応じて音声を合成する。具体的には、発音制御装置20は、制御装置21と記憶装置23と放音装置25とを具備する。 FIG. 2 is a block diagram illustrating the configuration of the sound control device 20. The sound generation control device 20 synthesizes voice according to the action of hitting the hitting surface F by the user. Specifically, the sound control device 20 includes a control device 21, a storage device 23, and a sound emitting device 25.
 制御装置21は、例えば発音制御装置20の各要素を制御する単数または複数のプロセッサである。例えば、制御装置21は、CPU(Central Processing Unit)、SPU(Sound Processing Unit)、GPU(Graphics Processing Unit)、DSP(Digital Signal Processor)、FPGA(Field Programmable Gate Array)、またはASIC(Application Specific Integrated Circuit)等の1種類以上のプロセッサにより構成される。具体的には、制御装置21は、記憶装置23に記憶されたプログラムを実行することで、歌唱者が楽曲を歌唱した音声を表す信号(以下「合成信号」という)Vを生成するための複数の機能(音韻特定部212、検出部213および発音制御部214)を実現する。 The control device 21 is, for example, a single or a plurality of processors that control each element of the sound control device 20. For example, the control device 21 is a CPU (Central Processing Unit), SPU (Sound Processing Unit), GPU (Graphics Processing Unit), DSP (Digital Signal Processor), FPGA (Field Programmable Gate Array), or ASIC (Application Specific Integrated Circuit). ) Etc., it is composed of one or more types of processors. Specifically, the control device 21 executes a program stored in the storage device 23 to generate a plurality of signals (hereinafter referred to as “synthetic signals”) V representing the voices of the singer singing the music. (Phonology identification unit 212, detection unit 213, and sound control unit 214) are realized.
 記憶装置23は、例えば磁気記録媒体または半導体記録媒体等の公知の記録媒体で構成された単数または複数のメモリである。記憶装置23は、制御装置21が実行するプログラムと制御装置21が使用する各種のデータとを記憶する。なお、記憶装置23は、複数種の記録媒体の組合せにより構成されてもよい。また、記憶装置23は、発音制御装置20に対して着脱可能な可搬型の記録媒体、または、発音制御装置20が通信網を介して通信可能な外部記録媒体(例えばオンラインストレージ)としてもよい。具体的には、記憶装置23は、発音制御装置20が合成すべき音を表すデータ(以下「合成データ」という)Sを記憶する。 The storage device 23 is a single or a plurality of memories composed of a known recording medium such as a magnetic recording medium or a semiconductor recording medium. The storage device 23 stores a program executed by the control device 21 and various data used by the control device 21. The storage device 23 may be configured by combining a plurality of types of recording media. Further, the storage device 23 may be a portable recording medium that can be attached to and detached from the sound control device 20, or an external recording medium (for example, online storage) that the sound control device 20 can communicate with via a communication network. Specifically, the storage device 23 stores data (hereinafter referred to as “synthetic data”) S representing sounds to be synthesized by the sound control device 20.
The synthesis data S is data that specifies the content of the musical piece. Specifically, the synthesis data S specifies a pitch Sx and a phoneme Sy for each of the plurality of notes constituting the musical piece. The pitch Sx is any one of a plurality of pitches (for example, a note number). The phoneme Sy is the pronunciation content to be uttered together with the sounding of the note. Specifically, the phoneme Sy corresponds to one syllable (pronunciation unit) of the lyrics of the musical piece. For example, a typical phoneme Sy in Japanese is a combination of a consonant and the vowel immediately following it, or a single vowel. The synthesized signal V is generated by voice synthesis using the synthesis data S. The sounding start point of each note is controlled according to the user's action of striking the striking surface F. The order of the plurality of notes constituting the musical piece is specified by the synthesis data S, but the sounding start point of each note is not specified by the synthesis data S.
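By way of illustration only, the note data described above might be represented as follows; this is a minimal sketch in Python, and the Note class, its field names, and the example values are assumptions of this illustration rather than part of the disclosure.

    from dataclasses import dataclass

    @dataclass
    class Note:
        pitch: int    # pitch Sx, e.g. a note number
        phoneme: str  # phoneme Sy, e.g. "sa" (consonant + vowel) or "a" (vowel only)

    # Synthesis data S as an ordered sequence of notes; sounding start points
    # are deliberately absent and are supplied by the user's strikes at
    # performance time.
    synthesis_data_s = [
        Note(pitch=60, phoneme="sa"),
        Note(pitch=62, phoneme="a"),
        Note(pitch=64, phoneme="ka"),
    ]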
The phoneme specifying unit 212 determines whether or not the phoneme Sy specified by the synthesis data S for each note is a phoneme composed of a consonant and a vowel (hereinafter referred to as a "specific phoneme"). Specifically, the phoneme specifying unit 212 determines that a phoneme Sy composed of a consonant and the vowel following that consonant is a specific phoneme, and that a phoneme Sy composed of a single vowel is a phoneme other than a specific phoneme.
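A minimal sketch of this determination, assuming phonemes are given as romanized strings and that the vowel inventory is the five Japanese vowels (both assumptions of this illustration):

    VOWELS = {"a", "i", "u", "e", "o"}

    def is_specific_phoneme(phoneme: str) -> bool:
        # A specific phoneme begins with a consonant and ends with a vowel
        # (e.g. "sa"); a single-vowel phoneme (e.g. "a") is not specific.
        return phoneme[0] not in VOWELS and phoneme[-1] in VOWELS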
The user keeps the rhythm of the musical piece by striking the striking surface F in sequence. Specifically, the user strikes the striking surface F at each point at which the sounding of a note in the musical piece should start. Meanwhile, the sounding start point of the vowel following a consonant is perceived aurally as the sounding start point of the specific phoneme as a whole. Therefore, in a configuration in which the consonant of a specific phoneme starts sounding at the point at which the user strikes the striking surface F (hereinafter referred to as the "striking point") and the vowel is sounded after the consonant, the sounding of the specific phoneme of a note is perceived as starting with a delay relative to the note start point the user intends. In the present embodiment, therefore, sounding of a specific phoneme is started before the striking point. This reduces the impression that the specific phoneme is heard with a delay.
FIG. 3 is a graph showing the relationship between time and the distance P between the hand H and the striking surface F. As illustrated in FIG. 3, as the hand H moves toward the striking surface F, the distance P between the hand H and the striking surface F decreases over time. In other words, the distance P is the height of the hand H above the striking surface F. When the hand H strikes the striking surface F, the distance P becomes zero. It is assumed that the user's hand H enters a specific state (hereinafter referred to as the "specific state") while moving toward the striking surface F. In the present embodiment, the specific state is the state in which the distance P, in the course of decreasing, reaches a specific distance (hereinafter the "specific distance") Pz. That is, the specific state is a state of the hand H before it contacts the striking surface F. The distance P may be, for example, the distance between the hand H and a reference point (for example, the center point) on the striking surface F.
FIG. 3 shows the point t1 at which the hand H enters the specific state (hereinafter referred to as the "arrival point") and the striking point t2. The consonant of the specific phoneme is sounded at the arrival point t1 (that is, when the distance P reaches the specific distance Pz), and the vowel of the specific phoneme is sounded at the striking point t2 (that is, when the distance P reaches zero). In other words, sounding of the consonant starts when the hand H reaches the position at the specific distance Pz, and sounding of the vowel following the consonant starts when the hand H moves further from that position and strikes the striking surface F.
The detection unit 213 of FIG. 2 includes a first detection unit 31 and a second detection unit 32. The first detection unit 31 detects that the hand H is in the specific state. First, the first detection unit 31 specifies the distance P using the image data D1. For example, the first detection unit 31 estimates the region of the hand H from the image data D1 by image recognition such as contour extraction, and specifies the distance P of the hand H from the distances measured by the first sensor 13 for the pixels in that region. Any known technique may be employed for specifying the distance P. Next, the first detection unit 31 determines whether or not the distance P has reached the specific distance Pz by comparing the distance P with a first threshold. The first threshold is set, for example, according to the specific distance Pz. When the distance P exceeds the first threshold, it is determined that the distance P has not reached the specific distance Pz. Conversely, when the distance P falls below the first threshold, it is determined that the distance P has reached the specific distance Pz. In practice, a slight time difference inevitably occurs between the arrival point t1 at which the hand H enters the specific state and the point at which that specific state is detected; in the following description, however, the arrival point t1 and the point at which the specific state is detected are treated as substantially the same point in time.
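A sketch of this determination, assuming the per-pixel depth map D1 and a hand mask obtained by some known region-estimation technique are already available (the disclosure does not fix how the per-pixel distances are aggregated; taking the nearest hand pixel is one plausible choice):

    import numpy as np

    def detect_specific_state(depth_image: np.ndarray,
                              hand_mask: np.ndarray,
                              first_threshold: float) -> bool:
        # depth_image: per-pixel distances from the first sensor 13 (image data D1)
        # hand_mask:   boolean mask of the estimated hand region
        if not hand_mask.any():
            return False  # no hand visible in this frame
        distance_p = float(depth_image[hand_mask].min())  # one choice for distance P
        return distance_p < first_threshold               # below threshold: specific state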
The second detection unit 32 detects that the hand H has struck the striking surface F as a result of the movement of the hand H. Specifically, the second detection unit 32 detects the strike by analyzing the sound signal D2. First, the second detection unit 32 analyzes the sound signal D2 to specify the volume of the sound represented by the sound signal D2 (hereinafter referred to as the "sound collection level"). Any known sound analysis technique may be employed for analyzing the sound signal D2. Next, the second detection unit 32 determines whether or not the hand H has struck the striking surface F by comparing the sound collection level with a second threshold. When the hand H strikes the striking surface F, a striking sound is generated; the second threshold is set, for example, on the assumption of the striking sound produced when the hand H strikes the striking surface F. When the sound collection level falls below the second threshold, it is determined that the sound signal D2 contains no striking sound, that is, that the striking surface F has not been struck. Conversely, when the sound collection level exceeds the second threshold, it is determined that the sound signal D2 contains a striking sound, that is, that the hand H has struck the striking surface F. In practice, a slight time difference inevitably occurs between the striking point t2 at which the hand H strikes the striking surface F and the point at which that strike (striking event) is detected; in the following description, however, the striking point t2 and the point at which the strike is detected are treated as substantially the same point in time.
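A corresponding sketch for the strike detection, assuming one analysis frame of the sound signal D2 is available as an array of samples, and using the RMS level as one plausible realization of the "sound collection level":

    import numpy as np

    def detect_strike(sound_frame: np.ndarray, second_threshold: float) -> bool:
        # Sound collection level of this frame of D2, here computed as RMS.
        level = float(np.sqrt(np.mean(np.square(sound_frame))))
        return level > second_threshold  # above threshold: striking sound present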
The pronunciation control unit 214 generates the synthesized signal V representing the sound specified by the synthesis data S. The synthesized signal V is a signal representing a voice in which the phoneme Sy specified for each note is pronounced at the pitch Sx specified for that note by the synthesis data S. Any known technique may be employed for the voice synthesis. For example, unit-concatenation voice synthesis, in which the synthesized signal V is generated by concatenating a plurality of voice units, or statistical-model voice synthesis, in which the synthesized signal V is generated using a statistical model such as an HMM (Hidden Markov Model) or a neural network, may be used to generate the synthesized signal V. The sounding start point of each phoneme Sy specified by the synthesis data S is controlled according to the results of detection by the first detection unit 31 and the second detection unit 32.
When the phoneme Sy specified by the synthesis data S is determined by the phoneme specifying unit 212 to be a phoneme other than a specific phoneme, the pronunciation control unit 214 causes that phoneme to be sounded in response to the strike on the striking surface F. Specifically, the pronunciation control unit 214 causes the phoneme to be sounded at the point at which the second detection unit 32 detects the strike. That is, a synthesized signal V is generated in which the sounding start point of the entire phoneme is set to the striking point t2. Conversely, when the phoneme Sy specified by the synthesis data S is determined by the phoneme specifying unit 212 to be a specific phoneme, the pronunciation control unit 214 causes that specific phoneme to start sounding before the striking surface F is struck. Specifically, the pronunciation control unit 214 causes the consonant of the specific phoneme to be sounded at the point at which the first detection unit 31 detects the specific state, and causes the vowel of that specific phoneme to be sounded at the point at which the second detection unit 32 detects the strike. That is, a synthesized signal V is generated in which the sounding start point of the consonant of the specific phoneme is set to the arrival point t1 and the sounding start point of the vowel following that consonant is set to the striking point t2. The synthesized signal V is supplied to the sound emitting device 25.
The sound emitting device 25 (for example, a speaker) is a playback device that emits the sound represented by the synthesized signal V. Accordingly, a voice in which the sounding start point of each phoneme Sy of the musical piece is controlled is emitted. That is, the impression that the specific phonemes of the musical piece as a whole are heard with a delay can be reduced.
FIG. 4 is a flowchart of the processing of the control device 21. The user strikes the striking surface F whenever he or she wants to start the sounding of a note of the musical piece. That is, the striking surface F is struck with the hand H once per note. The processing of FIG. 4 is executed for each note of the synthesis data S. In the following description, the note that is the target of the processing of FIG. 4 among the plurality of notes of the musical piece is referred to as the "target note". In parallel with the processing of FIG. 4, the processing by which the first detection unit 31 specifies the distance P and the processing by which the second detection unit 32 specifies the sound collection level are executed. The processing that specifies the distance P and the processing that specifies the sound collection level are executed repeatedly at a period shorter than the period at which the processing of FIG. 4 is executed.
When the processing of FIG. 4 starts, the phoneme specifying unit 212 determines whether or not the phoneme Sy of the target note in the synthesis data S is a specific phoneme (Sa1). When the phoneme Sy of the target note is determined to be a specific phoneme (Sa1: YES), the first detection unit 31 determines whether or not the hand H is in the specific state while moving toward the striking surface F (Sa2). That is, it is determined whether or not the distance P, in the course of decreasing, is at the specific distance Pz. Specifically, the first detection unit 31 determines whether or not the distance P is decreasing. When the distance P is decreasing, the first detection unit 31 determines whether or not the hand H is in the specific state by comparing the distance P with the first threshold. When the distance P is increasing, whether or not the hand H is in the specific state is not determined.
When the hand H is determined to be in the specific state (Sa2: YES), the pronunciation control unit 214 causes the consonant of the specific phoneme to be sounded (Sa3). Specifically, the pronunciation control unit 214 generates a synthesized signal V in which the sounding start point of the consonant of the specific phoneme is set to the point at which the specific state was detected, and supplies the synthesized signal V to the sound emitting device 25. That is, the consonant of the specific phoneme is sounded at the point at which the specific state is detected (that is, the arrival point t1). Conversely, when the hand H is determined not to be in the specific state (Sa2: NO), the processing of step Sa2 is executed repeatedly until the hand H enters the specific state.
The hand H moves further toward the striking surface F from the position of the specific state. The second detection unit 32 determines whether or not the hand H has struck the striking surface F (Sa4). Specifically, whether or not the hand H has struck the striking surface F is determined by comparing the sound collection level with the second threshold. When the hand H is determined to have struck the striking surface F (Sa4: YES), the pronunciation control unit 214 causes the vowel following the consonant of the specific phoneme to be sounded (Sa5). Specifically, the pronunciation control unit 214 generates a synthesized signal V in which the sounding start point of the vowel of the specific phoneme is set to the point at which the strike on the striking surface F was detected, and supplies the synthesized signal V to the sound emitting device 25. That is, the vowel of the specific phoneme is sounded at the point at which the strike on the striking surface F is detected (that is, the striking point t2). Conversely, when the hand H is determined not to have reached the striking surface F (Sa4: NO), the processing of step Sa4 is executed repeatedly until the hand H moves to the striking surface F and strikes it. Through the above processing, for a specific phoneme, sounding of that specific phoneme is started before the hand H strikes the striking surface F.
Conversely, when the phoneme Sy of the target note is determined to be a phoneme other than a specific phoneme (typically, a single-vowel phoneme) (Sa1: NO), the processing of steps Sa2 and Sa3 is omitted and the processing of step Sa4 is executed. That is, for a phoneme other than a specific phoneme, sounding of that phoneme is started at the striking point t2. The duration of each note may be a fixed length of time, or may be a length of time specified for each note by the synthesis data S.
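The per-note flow of FIG. 4 can be summarized by the following sketch, which reuses the hypothetical is_specific_phoneme helper above; the detector and sounding callables are stand-ins for the detection units and the pronunciation control unit 214, and the polling structure is an assumption of this illustration:

    import time

    def process_target_note(note,
                            specific_state_detected,  # wraps the first detection unit
                            strike_detected,          # wraps the second detection unit
                            start_consonant, start_vowel, start_phoneme,
                            poll_interval=0.001):
        if is_specific_phoneme(note.phoneme):        # Sa1
            while not specific_state_detected():     # Sa2: wait for distance P < Pz
                time.sleep(poll_interval)
            start_consonant(note)                    # Sa3: consonant at arrival point t1
            while not strike_detected():             # Sa4: wait for the striking sound
                time.sleep(poll_interval)
            start_vowel(note)                        # Sa5: vowel at striking point t2
        else:
            while not strike_detected():             # Sa4 only
                time.sleep(poll_interval)
            start_phoneme(note)                      # whole phoneme at striking point t2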
As can be understood from the above description, in the present embodiment the consonant of a specific phoneme is sounded when the hand H is detected to be in the specific state, and the vowel of that specific phoneme is sounded when the strike on the striking surface F is detected. The consonant of the specific phoneme can therefore be sounded before the hand H strikes the striking surface F. That is, the perception that the specific phoneme is delayed can be reduced. Moreover, since the vowel of the specific phoneme is sounded upon detection of the strike of the hand H on the striking surface F, the consonant can be sounded before the vowel while the feel of the operation for sounding the specific phoneme is maintained.
The state in which the distance P between the hand H and the striking surface F is at the specific distance Pz is detected as the specific state. That is, a state of the hand H partway along its approach to the striking surface F is detected as the specific state. The consonant of the specific phoneme can therefore be sounded without the user being conscious of any operation for sounding it. Furthermore, since the strike of the hand H on the striking surface F is detected by analyzing the sound signal D2, the vowel of the specific phoneme can be sounded when a striking sound is produced by the strike on the striking surface F.
<Modification examples>
Specific modifications that may be added to the embodiments exemplified above are exemplified below. Two or more aspects arbitrarily selected from the following examples may be combined as appropriate to the extent that they do not contradict each other.
(1) The sound produced upon detection of the specific state corresponds to the "first sound", and the sound produced upon detection of the strike on the striking surface F corresponds to the "second sound". In the embodiment described above, the consonant of the specific phoneme is an example of the "first sound", and the vowel of the specific phoneme is an example of the "second sound". That is, the pronunciation control unit 214 is expressed comprehensively as an element that sounds the first sound when the specific state is detected and sounds the second sound when the strike on the striking surface F is detected.
The first sound is not limited to the consonant of a specific phoneme, and the second sound is not limited to the vowel of a specific phoneme. For example, a sound related to a preparatory movement for sounding (hereinafter referred to as the "preparatory sound") may be the first sound, and the sound following the preparatory movement (hereinafter referred to as the "target sound") may be the second sound. The target sound is a sound defined by a note and is the object of the singing or the performance. The preparatory sound, on the other hand, is a sound produced as a result of the preparatory movement for sounding the target sound. When a singing voice is synthesized, a breath sound is an example of the preparatory sound, and the voice sung after the breath sound is an example of the target sound. When an instrumental performance sound is synthesized, the breathing sound produced when playing a wind instrument, the fret noise of a string instrument, or the swish of a stick when playing a percussion instrument are examples of the preparatory sound, and the performance sound of the instrument following the preparatory sound is an example of the target sound. That is, the voice synthesized by the sound control device 20 is not limited to a voice singing a musical piece. According to the configuration in which the preparatory sound is produced when the specific state is detected and the target sound is produced when the strike on the striking surface F is detected, the preparatory sound for sounding the target sound can be produced before the target sound, which is the actual objective. Note that an entire phoneme may be the first sound and the entire following phoneme may be the second sound.
Focusing on the pronunciation of speech sounds, a typical example of each of the first sound and the second sound is a phoneme (for example, a vowel or a consonant). The embodiment exemplified a configuration in which the first phoneme, an example of the first sound, is a consonant and the second phoneme, an example of the second sound, is a vowel; however, it does not matter whether each of the first phoneme and the second phoneme is a vowel or a consonant. For example, depending on the language of the lyrics of the musical piece in the voice synthesis, a phonological unit composed of a consonant followed by another consonant, or one composed of a vowel followed by another vowel, is also conceivable. The leading phoneme of the phonological unit is an example of the first phoneme, and the phoneme following the leading phoneme is an example of the second phoneme.
(2) In the embodiment described above, a distance image sensor capable of measuring distance was exemplified as the first sensor 13, but a distance-measuring function is not essential for the first sensor 13. For example, an image sensor may be used as the first sensor 13. The first detection unit 31 may calculate the amount of movement of the hand H by analyzing the images captured by the image sensor, and estimate the distance P from that amount of movement. The function of capturing an image of the hand H is also not essential for the first sensor 13. For example, an infrared sensor that emits infrared light may be used as the first sensor 13. In a configuration in which an infrared sensor is used as the first sensor 13, the first sensor 13 specifies the distance between the hand H and the first sensor 13 from the intensity of the received infrared light reflected by the hand H. The first detection unit 31 then determines that the hand H is in the specific state when the distance between the hand H and the first sensor 13 falls below a predetermined threshold, and determines that the hand H is not in the specific state when that distance exceeds the threshold. That is, calculating the distance P is not essential in the processing for determining whether or not the hand H is in the specific state. The distance between the hand H and the first sensor 13 corresponds to the sum of the distance P between the hand H and the striking surface F and the distance between the striking surface F and the first sensor 13. When the distance P between the hand H and the striking surface F is at the specific distance Pz, the distance between the hand H and the first sensor 13 is also a specific distance, so in this configuration as well, the distance P being at the specific distance Pz can be said to be the specific state. The function of the first detection unit 31 may also be incorporated in the first sensor 13; in that case, when the first sensor 13 detects the specific state, it instructs the pronunciation control unit 214 to sound the consonant of the specific phoneme.
(3) In the embodiment described above, the strike on the striking surface F was detected by analyzing the sound signal D2, but the method of detecting the strike is not limited to this example. For example, the strike of the hand H on the striking surface F may be detected by analyzing the image data D1 generated by the first sensor 13. For example, when it is estimated from the image data D1 that the hand H has contacted the striking surface F, the second detection unit 32 determines that the hand H has struck the striking surface F.
A vibration sensor that detects the vibration produced when the hand H strikes the striking surface F may be used as the second sensor 15. The second sensor 15 generates, for example, a signal corresponding to the magnitude of the vibration, and the second detection unit 32 detects the strike in accordance with that signal. Alternatively, a pressure sensor that detects the pressure applied to the striking surface F when the hand H contacts it may be used as the second sensor 15. The second sensor 15 generates, for example, a signal corresponding to the magnitude of the pressure applied to the striking surface F, and the second detection unit 32 detects the strike in accordance with that signal. The function of the second detection unit 32 may also be incorporated in the second sensor 15; in that case, when the second sensor 15 detects the strike on the striking surface F, it instructs the pronunciation control unit 214 to sound the vowel of the specific phoneme.
(4) In the embodiment described above, the first sensor 13 and the second sensor 15 were housed in the internal space of the housing 112, but the positions at which the first sensor 13 and the second sensor 15 are installed are arbitrary. For example, the first sensor 13 and the second sensor 15 may be installed outside the housing 112. In a configuration in which the first sensor 13 is installed outside the housing 112, it is not essential that the upper surface of the housing 112 of the operation reception unit 11 be formed of a light-transmitting member.
(5) In the embodiment described above, the striking surface F was struck with the hand H, but the object that strikes the striking surface F is not limited to the hand H. The type of object is arbitrary as long as it can strike the striking surface F. For example, a striking member such as a stick may serve as the object; the user moves the stick toward the striking surface F and strikes it. As understood from the above description, the object encompasses both a part of the user's body (typically the hand H) and a striking member operated by the user. In a configuration in which a striking member such as a stick serves as the object, the first sensor 13 or the second sensor 15 may be mounted on that member.
(6) In the embodiment described above, the distance P between the hand H and the striking surface F being at the specific distance Pz was exemplified as the specific state, but the specific state is not limited to this example. Any state of the hand H partway through its movement toward the striking surface F may serve as the specific state. For example, a change in the direction of movement of the hand H may serve as the specific state. Specifically, the direction of movement of the hand H changing from a direction away from the striking surface F to a direction approaching it, or changing from a direction horizontal to the striking surface F to a direction perpendicular to it, are examples of the specific state. A change in the shape of the hand H (for example, from a closed fist to an open palm) may also serve as the specific state.
(7) The duration of the consonant of a specific phoneme differs according to the type of consonant. For example, the length of time required to pronounce the consonant [s] in the specific phoneme "sa" is on the order of 250 ms, whereas the length of time required to pronounce the consonant [k] in the specific phoneme "ka" is on the order of 30 ms. That is, the appropriate specific distance Pz differs according to the type of consonant of the specific phoneme. A configuration in which the first threshold is set variably according to the type of consonant of the specific phoneme may therefore be adopted. Specifically, when the phoneme Sy is determined to be a specific phoneme, the first detection unit 31 sets the first threshold according to the type of consonant identified by the phoneme specifying unit 212. The first detection unit 31 then determines whether or not the hand H is in the specific state by comparing the distance P with the set first threshold.
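A rough sketch of such a consonant-dependent threshold; the duration table follows the [s] and [k] values above, while the expected approach speed and the fallback duration are assumptions of this illustration:

    # Nominal consonant durations in seconds ([s] and [k] from the text above).
    CONSONANT_DURATION = {"s": 0.250, "k": 0.030}
    EXPECTED_SPEED = 0.5  # m/s, an assumed typical approach speed of the hand

    def first_threshold_for(consonant: str) -> float:
        # Choose Pz so that a hand moving at the expected speed covers it in
        # roughly the consonant's duration; 0.1 s is an assumed fallback.
        return CONSONANT_DURATION.get(consonant, 0.1) * EXPECTED_SPEED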
(8) In the embodiment described above, the operation reception unit 11 was composed of the housing 112 and the light transmitting portion 114, but the operation reception unit 11 is not limited to this example. For example, in a configuration in which the first sensor 13 and the second sensor 15 are installed outside the operation reception unit 11, a flat plate-shaped member may serve as the operation reception unit 11. A keyboard-type operator may also serve as the operation reception unit 11. In a configuration in which a keyboard-type operator serves as the operation reception unit 11, the pitch Sx need not be specified for each note of the synthesis data S. By operating the operation reception unit 11, the user indicates the sounding start point of each note and also indicates the pitch of that note. That is, the pitch of each note may be set according to instructions from the user. Regardless of the shape of the operation reception unit 11, the surface of the operation reception unit 11 that the user contacts when striking corresponds to the striking surface F.
(9) In the embodiment described above, when the user strikes the striking surface F with the hand H, the state of the user's hand H may be detected and the sounding controlled according to the detection result. For example, conditions of the note (for example, its pitch, phoneme, or duration) are set according to the detection result. That is, it is not essential to set the pitch Sx and the phoneme Sy for each note of the synthesis data S. The state of the user's hand H is, for example, the movement speed of the hand H, the movement direction of the hand H, or the shape of the hand H. The combination of the detected state of the hand H and the note condition is arbitrary. In the action of striking the hand H against the striking surface F, the user can indicate the note condition by changing the state of the hand H. Specific configurations for controlling the sounding according to the state of the user's hand H are exemplified below.
A. Movement speed of the hand H
For example, the type of phoneme (that is, the pronunciation content) may be set according to the movement speed of the hand H. Specifically, the first detection unit 31 detects the movement speed of the hand H from the image data D1. The movement speed is detected from the change over time in the distance P specified from the image data D1. The first detection unit 31 may also detect the movement speed of the hand H using, for example, the output of a speed sensor. The phoneme specifying unit 212 then sets the type of specific phoneme according to the movement speed, doing so before the hand H enters the specific state. FIG. 5 is a schematic diagram showing the relationship between the movement speed of the hand H and the type of specific phoneme. FIG. 5 illustrates the specific phoneme set when the movement speed of the hand H1 is fast and the specific phoneme set when the movement speed of the hand H2 is slow. For example, when the movement speed of the hand H1 is fast, a specific phoneme (for example, "ta") containing a consonant of short duration (for example, [t]) is set, and when the movement speed of the hand H2 is slow, a specific phoneme (for example, "sa") containing a consonant of long duration (for example, [s]) is set. Regardless of the movement speed, sounding of the consonant starts at the arrival point t1, at which the distance P reaches the specific distance Pz, and sounding of the vowel starts at the striking point t2. When the movement speed of the hand H1 is fast, the length of time from the arrival point t1 to the striking point t2 is shorter than when the movement speed of the hand H2 is slow, so a specific phoneme whose consonant has a short duration is set. The duration or pitch of the note may also be set according to the movement speed of the hand H. Although the above example concerns setting the type of specific phoneme, the type of a phoneme other than a specific phoneme may also be controlled according to the movement speed.
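A sketch of such speed-dependent selection; the speed estimate from successive distance-P samples follows the text, while the 1.0 m/s boundary and the candidate phonemes are assumptions of this illustration:

    def movement_speed(distances, dt):
        # Approach speed from the two most recent distance-P samples taken
        # every dt seconds; positive while the hand approaches the surface.
        return (distances[-2] - distances[-1]) / dt

    def select_phoneme_by_speed(speed: float, fast_threshold: float = 1.0) -> str:
        # Fast approach: short time budget between t1 and t2, so a phoneme
        # with a short consonant ("ta"); slow approach: a long one ("sa").
        return "ta" if speed > fast_threshold else "sa"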
B. Movement direction of the hand H
For example, the type of phoneme may be set according to the movement direction of the hand H. The user moves the hand H from a different direction according to the desired phoneme and strikes the striking surface F. The user can strike the striking surface F by moving the hand H toward it from various directions: for example, moving the hand H from the right or from the left as seen by the user, or moving the hand H in a direction away from or toward the user. Specifically, the first detection unit 31 detects the movement direction of the hand H from the image data D1, and the phoneme specifying unit 212 sets the type of phoneme according to that movement direction, doing so before the hand H enters the specific state. The duration or pitch of the note may also be set according to the movement direction of the hand H.
C. Shape of the hand H
For example, the type of phoneme may be set according to the shape of the hand H. The user strikes the striking surface F with the hand H formed into an arbitrary shape, for example by moving the fingers into the rock, scissors, or paper shape. FIG. 6 is a table showing the relationship between the shape of the hand H and the phoneme. As illustrated in FIG. 6, the type of phoneme may be set taking into account not only the shape of the hand H but also whether the hand H is the right hand or the left hand; the state of the hand H thus also includes whether the user's hand H is the right hand or the left hand. The first detection unit 31 detects from the image data D1 whether the hand H is the right hand or the left hand, together with the shape of the hand H. Any known image analysis technique may be employed for these detections. The phoneme specifying unit 212 sets the type of phoneme before the hand H enters the specific state, specifying the phoneme according to the right/left hand and the shape of the hand H. As illustrated in FIG. 6, for example, when the striking surface F is struck with the left hand in the rock shape, the phoneme "ta" is sounded. The duration or pitch of the note may also be set according to the shape of the hand H.
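In the spirit of the FIG. 6 table, such a mapping might look as follows; only the left-hand rock entry ("ta") is taken from the text, and the remaining entries and the vowel fallback are placeholders of this illustration:

    PHONEME_BY_HAND = {
        ("left", "rock"): "ta",    # from the example in the text
        ("left", "paper"): "sa",   # placeholder
        ("right", "rock"): "ka",   # placeholder
        ("right", "paper"): "ma",  # placeholder
    }

    def phoneme_for(side: str, shape: str) -> str:
        return PHONEME_BY_HAND.get((side, shape), "a")  # assumed fallback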
As understood from the above description, at least one of the movement speed of the hand H, the movement direction of the hand H, and the shape of the hand H is detected, and the sounding of the phoneme is controlled according to the content of the detection. When the sounding of a specific phoneme is controlled, it suffices to control the sounding of at least one of the consonant (an example of the first sound) and the vowel (an example of the second sound). With the above configuration, the user can control the sounding of the first sound and the second sound by changing the movement speed, the movement direction, or the shape of the object. The state of the hand H is not limited to its movement speed, movement direction, and shape. For example, the movement angle of the hand H (the angle at which the hand H moves relative to the striking surface F) may serve as the state of the hand H.
(10) The length of time from the hand H reaching the specific distance Pz until it strikes the striking surface F becomes longer when the movement speed of the hand H is slow and shorter when it is fast. Therefore, in a configuration in which the first threshold is constant (a fixed value) regardless of the movement speed of the hand H, there is the problem that the duration of the consonant of the specific phoneme varies with the movement speed of the hand H: when the movement speed of the hand H is slow the consonant duration becomes long, and when it is fast the consonant duration becomes short. The first threshold may therefore be varied according to the movement speed of the hand H. Specifically, the first detection unit 31 detects the movement speed of the hand H from, for example, the image data D1; the movement speed is detected before the hand H enters the specific state. Next, the first detection unit 31 sets the first threshold according to the movement speed of the hand H, setting it relatively large when the movement speed of the hand H is fast and relatively small when it is slow. The first detection unit 31 then compares the distance P with the set first threshold to determine whether or not the distance P has reached the specific distance Pz. With the above configuration, variation of the consonant duration with the movement speed of the hand H can be reduced.
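A one-line sketch of such speed compensation, assuming an approximately constant approach speed between the arrival point t1 and the striking point t2:

    def speed_compensated_threshold(speed: float, consonant_duration: float) -> float:
        # A faster hand gets a proportionally larger Pz, so the time spent
        # between t1 and t2 (the consonant duration) stays roughly constant.
        return speed * consonant_duration

For example, at 0.5 m/s a consonant of 0.25 s yields Pz = 0.125 m, while at 1.0 m/s the same consonant yields Pz = 0.25 m.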
The first threshold may also be varied according to the movement direction of the hand H. Specifically, the first detection unit 31 detects the movement direction of the hand H from, for example, the image data D1; the movement direction is detected before the hand H enters the specific state. Next, the first detection unit 31 sets the first threshold according to the movement direction of the hand H. For example, the first detection unit 31 sets the first threshold to a first value when the movement direction of the hand H is a first direction, and to a second value larger than the first value when the movement direction of the hand H is a second direction different from the first direction. The first detection unit 31 then compares the distance P with the set first threshold to determine whether or not the distance P has reached the specific distance Pz. When the movement speed of the hand H is constant, the duration of the consonant of the specific phoneme varies with the first threshold: the larger the first threshold, the longer the consonant duration, and the smaller the first threshold, the shorter it is. A user who wants to lengthen the duration of the consonant of a specific phoneme strikes the striking surface F from the second direction; a user who wants to shorten it strikes the striking surface F from the first direction. As understood from the above description, the first threshold may be set variably.
(11) In the embodiment described above, the point at which the sounding of a phoneme ends may be controlled according to the movement of the user's hand H. For example, the sounding of the phoneme may be ended at the point at which the hand H leaves the striking surface F after striking it. FIG. 7 is a block diagram illustrating the configuration of the detection unit 213 according to this modification. The detection unit 213 includes a third detection unit 33 in addition to the first detection unit 31 and the second detection unit 32. The third detection unit 33 detects that the hand H has left the striking surface F; for example, the departure of the hand H from the striking surface F is detected by analyzing the image data D1. The third detection unit 33 may also detect that the hand H has left the striking surface F using the output of a pressure sensor that detects the pressure applied to the striking surface F. When the third detection unit 33 detects that the hand H has left the striking surface F, the pronunciation control unit 214 ends the sounding of the phoneme.
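A sketch of the pressure-sensor variant of this release detection; the threshold value is an assumption of this illustration:

    def detect_release(pressure: float, release_threshold: float = 0.05) -> bool:
        # The hand is judged to have left the striking surface F when the
        # measured pressure falls below a small threshold.
        return pressure < release_threshold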
(12) In the embodiment described above, the striking surface F was struck with the user's hand H, but a configuration in which the user strikes a virtual striking surface F using haptic technology based on tactile feedback may also be adopted. The user strikes a striking surface F prepared in a virtual space by operating an operator capable of manipulating a virtual hand in the virtual space displayed on a display device. By mounting in the operator a vibration motor that vibrates when the striking surface F in the virtual space is struck, the user perceives the striking surface F as actually being struck. When the hand in the virtual space is in the specific state, the consonant of the specific phoneme is sounded, and when the striking surface F in the virtual space is struck, the vowel of that specific phoneme is sounded. As understood from the above description, the striking surface F may be a surface in a virtual space. Likewise, the hand H may be a hand in a virtual space.
(13) As described above, the functions of the sound control device 20 exemplified above are realized by the cooperation of the one or more processors constituting the control device 21 and the program stored in the storage device 23. The program according to the present disclosure may be provided in a form stored on a computer-readable recording medium and installed on a computer. The recording medium is, for example, a non-transitory recording medium, a good example of which is an optical recording medium (optical disc) such as a CD-ROM, but it encompasses any known form of recording medium such as a semiconductor recording medium or a magnetic recording medium. A non-transitory recording medium includes any recording medium other than a transitory, propagating signal, and does not exclude volatile recording media. In a configuration in which a distribution device distributes the program via a communication network, the storage device 23 that stores the program in that distribution device corresponds to the non-transitory recording medium described above.
<Additional notes>
For example, the following configurations can be derived from the embodiments exemplified above.
A sound control method according to one aspect (aspect 1) of the present disclosure detects that an object is in a specific state while the object is moving toward a surface, sounds a first sound when the specific state is detected, detects a striking event in which the object strikes the surface as a result of the movement of the object, and sounds a second sound when the striking event is detected. In this aspect, the first sound is produced when the object reaches the specific state while moving toward the surface, and the second sound is produced when the object strikes the surface. The first sound can therefore be produced before the object strikes the surface. Moreover, since the second sound is produced upon detection of the strike of the object on the surface, the first sound can be produced before the second sound while the feel of the operation for producing the second sound is maintained.
In one example of aspect 1 (aspect 2), the first sound is a first phoneme, and the second sound is a second phoneme different from the first phoneme. In this aspect, the first phoneme is produced when the object reaches the specific state, and the second phoneme is produced following the first phoneme when the object strikes the surface. The first phoneme can therefore be produced before the object strikes.
In one example of aspect 2 (aspect 3), the first phoneme is a consonant and the second phoneme is a vowel. In this aspect, the consonant is produced when the object reaches the specific state, and the vowel is produced following the consonant when the object strikes the surface. The perception that the sounding of a phoneme composed of a consonant and a vowel is delayed can therefore be reduced.
In one example of aspect 1 (aspect 4), the first sound is a sound related to a preparatory movement for sounding, and the second sound is a sound following the preparatory movement. In this aspect, the sound related to the preparatory movement is produced when the object reaches the specific state, and the sound following the preparatory movement is produced when the object strikes the surface. The sound related to the preparatory movement for producing the target sound can therefore be produced before that sound.
In one example of any of aspects 1 to 4 (aspect 5), the specific state is the state in which the distance between the object and the surface is at a specific distance. In this aspect, the first sound is produced when the distance between the object and the surface reaches the specific distance. That is, the first sound is produced in a state partway through the movement of the object toward the surface. The first sound can therefore be produced without the user being conscious of any operation for producing it.
 In an example of any of aspects 1 to 5 (aspect 6), the striking event is detected by analyzing a sound signal generated by a sound collecting device. In this aspect, the strike of the object on the surface is detected from the signal picked up by the sound collecting device, so the striking sound produced by the strike on the surface can be used to trigger the second sound.
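 A rough sketch of one way such a strike could be detected from a microphone signal is given below: a percussive hit appears as a sudden jump in short-term energy. The frame size and threshold are assumptions, not values from the disclosure.

    import numpy as np

    FRAME = 256        # samples per analysis frame (assumed)
    ENERGY_JUMP = 8.0  # assumed energy ratio marking a percussive onset

    def detect_strike(samples: np.ndarray) -> bool:
        """Return True if the newest frame of microphone samples looks like a hit."""
        if samples.size < 2 * FRAME:
            return False
        prev = float(np.mean(samples[-2 * FRAME:-FRAME] ** 2)) + 1e-12
        curr = float(np.mean(samples[-FRAME:] ** 2))
        return curr / prev > ENERGY_JUMP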
 In an example of any of aspects 1 to 6 (aspect 7), at least one of the moving speed of the object, the moving direction of the object, and the shape of the object is detected, and the production of at least one of the first sound and the second sound is controlled in accordance with the result of that detection. In this aspect, the user can control how the first sound and the second sound are produced by varying the speed, direction, or shape of the object.
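 The mappings from detected features to pronunciation parameters are left open by the disclosure; the sketch below shows assumed, illustrative mappings on a hypothetical synthesizer interface.

    def apply_features(speed_mm_s, direction_deg, is_fist, synth):
        """Map detected object features to pronunciation parameters (illustrative)."""
        # Faster strokes -> louder second sound, like key velocity on a keyboard.
        synth.second_sound_velocity = min(127, int(0.1 * speed_mm_s))
        # A shallow, glancing approach could soften the consonant attack.
        synth.first_sound_attack = "soft" if direction_deg < 45 else "normal"
        # The shape of the hand could select a timbre variant.
        synth.timbre = "dark" if is_fist else "bright"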
 A pronunciation control device according to one aspect of the present disclosure includes a detection unit that detects that an object is in a specific state while the object is moving toward a surface and detects a striking event in which the object strikes the surface as a result of its movement, and a pronunciation control unit that causes a first sound to be produced when the specific state is detected and causes a second sound to be produced when the striking event is detected.
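 Structurally, this device aspect splits the logic of the earlier loop sketch into the two claimed units. The following class sketch mirrors that split; all sensor and synthesizer interfaces are again hypothetical.

    class DetectionUnit:
        """Detects the specific state and the striking event (sensors assumed)."""
        def __init__(self, distance_sensor, hit_detector, threshold_mm=25.0):
            self.distance_sensor = distance_sensor
            self.hit_detector = hit_detector
            self.threshold_mm = threshold_mm

        def specific_state(self):
            return self.distance_sensor.read() <= self.threshold_mm

        def striking_event(self):
            return self.hit_detector.poll()

    class PronunciationControlUnit:
        """Produces the first and second sounds on the detection unit's cues."""
        def __init__(self, synth):
            self.synth = synth
            self._armed = True

        def update(self, detector):
            if self._armed and detector.specific_state():
                self.synth.play_first_sound()
                self._armed = False
            if detector.striking_event():
                self.synth.play_second_sound()
                self._armed = True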
 This application is based on Japanese Patent Application No. 2019-175253, filed on September 26, 2019, the contents of which are incorporated herein by reference.
 The pronunciation control method and pronunciation control device of the present disclosure can start pronunciation before an object such as a user's finger comes into contact with a surface such as a key.
 100 … Pronunciation control system
 10 … Operation unit
 11 … Operation reception unit
 112 … Housing
 114 … Light transmitting portion
 13 … First sensor
 15 … Second sensor
 20 … Pronunciation control device
 21 … Control device
 212 … Phoneme identification unit
 213 … Detection unit
 214 … Pronunciation control unit
 23 … Storage device
 25 … Sound emitting device
 31 … First detection unit
 32 … Second detection unit
 33 … Third detection unit
 F … Striking surface

Claims (9)

  1.  A pronunciation control method realized by a computer, the method comprising:
     detecting that an object is in a specific state while the object is moving toward a surface;
     causing a first sound to be produced when the specific state is detected;
     detecting a striking event in which the object strikes the surface as a result of the movement of the object; and
     causing a second sound to be produced when the striking event is detected.
  2.  The pronunciation control method according to claim 1, wherein
     the first sound is a first phoneme, and
     the second sound is a second phoneme different from the first phoneme.
  3.  The pronunciation control method according to claim 2, wherein
     the first phoneme is a consonant, and
     the second phoneme is a vowel.
  4.  The pronunciation control method according to claim 1, wherein
     the first sound is a sound associated with a preparatory action for pronunciation, and
     the second sound is a sound following the preparatory action.
  5.  The pronunciation control method according to any one of claims 1 to 4, wherein the specific state is a state in which the distance between the object and the surface is a specific distance.
  6.  The pronunciation control method according to claim 5, wherein the specific distance is changed according to the type of the consonant of the first phoneme.
  7.  The pronunciation control method according to any one of claims 1 to 6, wherein, in detecting the striking event of the object, the striking event is detected by analyzing a sound signal representing the sound of the strike picked up by a sound collecting device.
  8.  The pronunciation control method according to any one of claims 1 to 7, further comprising:
     detecting feature data representing at least one of a moving speed of the object, a moving direction of the object, and a shape of the object; and
     controlling at least one of the production of the first sound and the production of the second sound in accordance with the detected feature data.
  9.  A pronunciation control device comprising:
     a detection unit that detects that an object is in a specific state while the object is moving toward a surface, and detects a striking event in which the object strikes the surface as a result of the movement of the object; and
     a pronunciation control unit that causes a first sound to be produced when the specific state is detected and causes a second sound to be produced when the striking event is detected.
PCT/JP2020/035785 2019-09-26 2020-09-23 Sound output control method and sound output control device WO2021060273A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
JP2019-175253 2019-09-26
JP2019175253A JP7380008B2 (en) 2019-09-26 2019-09-26 Pronunciation control method and pronunciation control device

Publications (1)

Publication Number Publication Date
WO2021060273A1 2021-04-01

Family

ID=75157779

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/JP2020/035785 WO2021060273A1 (en) 2019-09-26 2020-09-23 Sound output control method and sound output control device

Country Status (2)

Country Link
JP (1) JP7380008B2 (en)
WO (1) WO2021060273A1 (en)


Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2004061753A (en) * 2002-07-26 2004-02-26 Yamaha Corp Method and device for synthesizing singing voice
JP2014098801A (en) * 2012-11-14 2014-05-29 Yamaha Corp Voice synthesizing apparatus
JP2014186307A (en) * 2013-02-22 2014-10-02 Yamaha Corp Voice synthesis device
JP2017146555A (en) * 2016-02-19 2017-08-24 ヤマハ株式会社 Performance support device and method

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2022255052A1 (en) * 2021-06-03 2022-12-08 ヤマハ株式会社 Percussion system

Also Published As

Publication number Publication date
JP2021051249A (en) 2021-04-01
JP7380008B2 (en) 2023-11-15

Similar Documents

Publication Publication Date Title
US10490181B2 (en) Technology for responding to remarks using speech synthesis
JP6140579B2 (en) Sound processing apparatus, sound processing method, and sound processing program
JP5821824B2 (en) Speech synthesizer
JP4457983B2 (en) Performance operation assistance device and program
JP5162938B2 (en) Musical sound generator and keyboard instrument
US20020011143A1 (en) Musical score display for musical performance apparatus
JP2002268699A (en) Device and method for voice synthesis, program, and recording medium
US8785761B2 (en) Sound-generation controlling apparatus, a method of controlling the sound-generation controlling apparatus, and a program recording medium
JP7367641B2 (en) Electronic musical instruments, methods and programs
JP5040778B2 (en) Speech synthesis apparatus, method and program
JP2016080827A (en) Phoneme information synthesis device and voice synthesis device
CN112466266A (en) Control system and control method
WO2021060273A1 (en) Sound output control method and sound output control device
JP4654513B2 (en) Musical instrument
JP5151401B2 (en) Audio processing device
JP2007256412A (en) Musical sound controller
Li et al. Acoustic and articulatory analysis on Mandarin Chinese vowels in emotional speech
JP7024864B2 (en) Signal processing equipment, programs and sound sources
JP2017146555A (en) Performance support device and method
JP4244338B2 (en) SOUND OUTPUT CONTROL DEVICE, MUSIC REPRODUCTION DEVICE, SOUND OUTPUT CONTROL METHOD, PROGRAM THEREOF, AND RECORDING MEDIUM CONTAINING THE PROGRAM
JP6090043B2 (en) Information processing apparatus and program
JP2017146557A (en) Performance support device and method
JP3584585B2 (en) Electronic musical instrument
JP4544258B2 (en) Acoustic conversion device and program
JP2002304187A (en) Device and method for synthesizing voice, program and recording medium

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 20867292

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 20867292

Country of ref document: EP

Kind code of ref document: A1