WO2001097209A1 - Terminal, procede de reproduction vocale de reference et support de stockage - Google Patents


Info

Publication number
WO2001097209A1
WO2001097209A1 (PCT/JP2001/004911)
Authority
WO
WIPO (PCT)
Prior art keywords
voice
data
performance data
guide
unit
Prior art date
Application number
PCT/JP2001/004911
Other languages
English (en)
Japanese (ja)
Inventor
Akitoshi Saito
Original Assignee
Yamaha Corporation
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Yamaha Corporation filed Critical Yamaha Corporation
Priority to KR10-2002-7016964A priority Critical patent/KR100530916B1/ko
Priority to AU2001264240A priority patent/AU2001264240A1/en
Publication of WO2001097209A1 publication Critical patent/WO2001097209A1/fr
Priority to HK03106757.7A priority patent/HK1054460A1/zh

Classifications

    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04B: TRANSMISSION
    • H04B 1/00: Details of transmission systems, not covered by a single one of groups H04B 3/00 - H04B 13/00; Details of transmission systems not characterised by the medium used for transmission
    • H04B 1/38: Transceivers, i.e. devices in which transmitter and receiver form a structural unit and in which at least one part is used for functions of transmitting and receiving
    • H04B 1/40: Circuits
    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10H: ELECTROPHONIC MUSICAL INSTRUMENTS; INSTRUMENTS IN WHICH THE TONES ARE GENERATED BY ELECTROMECHANICAL MEANS OR ELECTRONIC GENERATORS, OR IN WHICH THE TONES ARE SYNTHESISED FROM A DATA STORE
    • G10H 1/00: Details of electrophonic musical instruments
    • G10H 1/0033: Recording/reproducing or transmission of music for electrophonic musical instruments
    • G10H 1/0041: Recording/reproducing or transmission of music for electrophonic musical instruments in coded form
    • G10H 1/0058: Transmission between separate instruments or between individual components of a musical system
    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10H: ELECTROPHONIC MUSICAL INSTRUMENTS; INSTRUMENTS IN WHICH THE TONES ARE GENERATED BY ELECTROMECHANICAL MEANS OR ELECTRONIC GENERATORS, OR IN WHICH THE TONES ARE SYNTHESISED FROM A DATA STORE
    • G10H 2240/00: Data organisation or data communication aspects, specifically adapted for electrophonic musical tools or instruments
    • G10H 2240/171: Transmission of musical instrument data, control or status information; Transmission, remote access or control of music data for electrophonic musical instruments
    • G10H 2240/201: Physical layer or hardware aspects of transmission to or from an electrophonic musical instrument, e.g. voltage levels, bit streams, code words or symbols over a physical link connecting network nodes or instruments
    • G10H 2240/241: Telephone transmission, i.e. using twisted pair telephone lines or any type of telephone network
    • G10H 2240/251: Mobile telephone transmission, i.e. transmitting, accessing or controlling music data wirelessly via a wireless or mobile telephone receiver, analogue or digital, e.g. DECT, GSM, UMTS

Definitions

  • The present invention relates to a terminal device and a guide voice reproduction method, suitably applicable to a karaoke device, a mobile phone, and the like, capable of performing karaoke from distributed content data, and to a storage medium storing a program for executing the method. Background art
  • In a digital cellular system, the occupied frequency bandwidth is narrow and the bit rate of the transmission path is low, so voice signals for speech communication are transmitted after being compressed and encoded with high efficiency.
  • As such a high-efficiency coding scheme, an analysis-synthesis coding method using a speech synthesis model composed of a sound source model and a vocal tract model is known; examples of this analysis-synthesis coding method include MPC (Multi-Pulse Excited LPC) and CELP (Code Excited LPC).
  • A karaoke system is also known that enables the user to perform karaoke by reproducing karaoke music from distributed karaoke data, that is, to sing along with the reproduced karaoke music.
  • Such a karaoke system is generally called communication karaoke and can also be used at home.
  • In such a karaoke system, the music data of the requested song, guide lyrics data to be displayed on the screen, and, if necessary, background image data are distributed as karaoke data. The user sings along with the reproduced musical sound while watching the guide lyrics reproduced from the distributed guide lyrics data and displayed on the screen.
  • To solve this problem, the communication karaoke system described in Japanese Patent Application Laid-Open No. 11-167392 has been proposed.
  • This communication karaoke system, when distributing karaoke data consisting of song data, background image data, and lyric display data for the guide lyrics, transmits the karaoke data together with lyric data for reading aloud.
  • The karaoke device that receives the data reproduces the karaoke music from the song data and, in accordance with the progress of the karaoke music, displays the guide lyrics on a display that shows a background image based on the background image data. At the same time, synthesized speech corresponding to the accent, sound intensity, and pitch (voice quality) information included in the lyric data for reading aloud is output according to its reading time information, so that the user can follow the karaoke by listening to the synthesized lyrics without looking at the display.
  • However, the lyric data for reading aloud must be read out before the lyrics are sung, and the speech must be synthesized at a pitch and intonation corresponding to the melody to be sung so that it is easy to sing after hearing it. For this reason, the lyric data for reading aloud must include information on the accent, sound intensity, pitch (voice quality), and reading time of the synthesized speech, and there was a problem that this information had to be prepared by analyzing the music for each song.
  • In addition, the digital cellular system has a low-bit-rate transmission path and limited transmission capacity, so transmitting the karaoke data took a long time and the communication charges were high.
  • When the user requests a song title and the karaoke data of that song is distributed, if the transmission takes a long time, a long time passes after the request before the song is ready to be played, and there is a risk that the user loses interest in the karaoke.
  • Furthermore, the mobile phone must be equipped with voice synthesis means for synthesizing voice from the lyric data for reading aloud, which makes the mobile phone expensive; moreover, the space occupied by the voice synthesis means increases, so there was a problem that the mobile phone could not be miniaturized.
  • The present invention has been made in view of such circumstances, and its primary object is to provide a terminal device and a guide voice reproduction method that do not require generation of pitch and intonation information for the guide voice, and a storage medium storing a program for executing the method.
  • It is a second object of the present invention to provide a terminal device to which karaoke data can be distributed in a short time even if the transmission speed is low and which does not require dedicated voice synthesizing means for reproducing a guide voice, a guide voice reproduction method, and a storage medium storing a program for executing the method. Disclosure of the invention
  • In order to attain the first object, the present invention provides a terminal device to which content data composed of performance data, consisting of a sequence of performance events, and voice symbol data, consisting of a voice symbol for each syllable in the lyrics attached to the performance data, is distributed.
  • The terminal device comprises: a tone synthesis unit that reproduces a musical tone from the performance data; a voice synthesis unit that synthesizes a guide voice based on the voice symbol data; and a voice synthesis control unit that pre-reads the performance data and controls the voice synthesis unit, thereby changing the properties of the guide voice synthesized by the voice synthesis unit in accordance with the performance data.
  • With this configuration, the properties of the guide voice synthesized by the voice synthesis unit can be changed according to the performance data, so there is no need to create generation timing, pitch, or intonation information for the guide voice. This eliminates the need to analyze the melody of each song and to create accent, sound intensity, pitch (voice quality), and reading time information for the synthesized speech.
  • Furthermore, since the pitch and intonation information of the guide voice need not be included in the distributed data, the amount of data to be distributed can be reduced. In addition, by pre-reading and analyzing the performance data, the utterance timing of the guide voice can be controlled, thereby further reducing the amount of data to be distributed.
  • Preferably, the performance data is MIDI-format performance data, and the voice symbol data is inserted into the performance data as an exclusive message.
  • Preferably, the terminal device further includes an analysis unit for analyzing the performance data of the vocal line out of the performance data, and the voice synthesis control unit controls the voice synthesis unit in accordance with an analysis result of the analysis unit.
  • Preferably, the voice synthesis control unit controls the synthesis timing of the voice synthesis unit in accordance with the analysis result of the analysis unit, so that the guide voice synthesized by the voice synthesis unit is uttered before the corresponding vocal line.
  • Preferably, the voice synthesis control unit provides the voice synthesis unit with voice parameters read from a voice database in accordance with the voice symbol data and the analysis result of the analysis unit.
  • Preferably, the syllables of the guide voice synthesized by the voice synthesis unit follow the voice symbol data, and the pitch and intonation of the guide voice change according to the vocal line.
  • In order to attain the second object, the present invention provides a terminal device to which content data composed of performance data, consisting of a sequence of performance events, and voice symbol data, consisting of a voice symbol for each syllable in the lyrics attached to the performance data, is distributed.
  • The terminal device comprises: a telephone function unit that enables a call; a tone synthesis unit that reproduces a musical tone from the performance data; and a voice synthesis unit that synthesizes a guide voice based on the voice symbol data and also decodes the voice data for the call.
  • With this configuration, the guide voice is synthesized using the voice synthesis unit for decoding voice data that is already provided in a mobile phone of a digital cellular system, so it is no longer necessary to provide a separate speech synthesizer. As a result, even if a guide voice is output, no additional storage space is required and the compact size can be maintained. Further, since the existing voice synthesis unit is reused, the rise in cost can be suppressed.
  • Preferably, the terminal device further comprises a voice synthesis control unit that pre-reads the performance data and controls the voice synthesis unit, thereby changing the properties of the guide voice synthesized by the voice synthesis unit in accordance with the performance data.
  • Equally preferably, the terminal device further comprises an analysis unit that analyzes the performance data of the vocal line out of the performance data, and a voice synthesis control unit that controls the voice synthesis unit in accordance with the analysis result of the analysis unit, thereby changing the pitch of the guide voice synthesized by the voice synthesis unit according to the vocal line.
  • Furthermore, the present invention provides a guide voice reproduction method using a terminal device to which content data composed of performance data, consisting of a sequence of performance events, and voice symbol data, consisting of a voice symbol for each syllable in the lyrics attached to the performance data, is distributed.
  • The guide voice reproduction method comprises: reproducing a musical tone from the performance data; synthesizing a guide voice based on the voice symbol data; and pre-reading the performance data and changing the properties of the synthesized guide voice in accordance with the performance data.
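  • As a purely illustrative sketch (not the patent's implementation; all names and data shapes are invented), the three steps of the method described above can be outlined as follows:

```python
# Hypothetical outline of the guide voice reproduction method: pre-read
# the performance data, derive guide-voice properties from the vocal
# line, and delay tone reproduction so the guide voice is heard first.
# Event/field names are illustrative assumptions, not from the patent.

def reproduce_with_guide(performance_data, voice_symbols, delay=1.0):
    """Return (guide_voice_events, delayed_tone_events)."""
    # Pre-read: collect the vocal-line notes from the performance data.
    vocal_notes = [e for e in performance_data if e["track"] == "vocal"]
    avg_pitch = sum(n["note"] for n in vocal_notes) / len(vocal_notes)

    # The guide voice's syllables follow the voice symbol data; its
    # pitch is derived from the pre-read vocal line.
    guide = [{"syllable": s, "pitch": avg_pitch} for s in voice_symbols]

    # Tone reproduction is delayed so the guide voice precedes it.
    tones = [dict(e, time=e["time"] + delay) for e in performance_data]
    return guide, tones
```

  • Note that this sketch collapses the property change to a single average pitch; the embodiments below derive pitch, tempo, accent, and intonation from a fuller vocal line analysis.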
  • Further, the present invention provides a storage medium storing a program for causing a computer to execute the above-mentioned guide voice reproduction method.
  • FIG. 1 is a diagram showing a configuration example of a mobile phone to which the terminal device according to the first embodiment of the present invention is applied, together with a base station.
  • FIG. 2 is a diagram showing a detailed configuration of a voice compression / synthesis unit and a database in the telephone function unit of the mobile phone of FIG.
  • FIG. 3 is a diagram showing the flow of processing of performance data together with the function block diagram of the processing section of the telephone function section shown in FIG.
  • Fig. 4 is a diagram showing the structure of karaoke data used in the mobile phone of Fig. 1.
  • Fig. 5 is a conceptual diagram of downloading karaoke data to the mobile phone of Fig. 1.
  • FIG. 6 is a diagram showing a configuration example of a karaoke device to which the terminal device according to the second embodiment of the present invention is applied, together with a distribution center.
  • FIG. 7 is a diagram showing a detailed configuration of a speech synthesis unit and a database in the control unit of the karaoke device of FIG. 6. BEST MODE FOR CARRYING OUT THE INVENTION
  • FIG. 1 shows a configuration example of a mobile phone to which the terminal device according to the first embodiment of the present invention is applied, together with a base station.
  • In FIG. 1, reference numeral 1 denotes a mobile phone according to the present invention, and reference numeral 2 denotes a base station that manages each wireless zone.
  • In a digital cellular system, a small-zone system is generally adopted, and a large number of wireless zones are arranged in the service area. The base station 2 manages each of these wireless zones.
  • the mobile phone 1 includes an antenna 10 which is generally retractable, and the antenna 10 is connected to the transceiver 11.
  • the transceiver unit 11 demodulates a signal received by the antenna 10 and modulates a signal to be transmitted to supply the modulated signal to the antenna 10.
  • The telephone function unit 12 is processing means for making the mobile phone 1 function as a telephone when talking with another telephone, and includes a voice compression/synthesis unit 22 having a CELP encoder function and a decoder function for highly efficient voice compression. Voice parameters read from the database 24 can be supplied to the voice compression/synthesis unit 22, and a voice can be synthesized from those parameters by using the decoder function of the voice compression/synthesis unit 22. In other words, the voice compression/synthesis unit 22 can also function as voice synthesis means.
  • the database 24 stores speech parameters from "A" to "N" and onomatopoeic sounds.
  • the voice signal input from the microphone 21 is subjected to high-efficiency compression encoding by the encoder function of the voice compression / synthesis unit 22 of the telephone function unit 12, and is modulated by the transmission / reception unit 11.
  • Conversely, the high-efficiency compression-encoded audio data received by the antenna 10 is demodulated by the transceiver unit 11, decoded back to the original audio signal by the voice compression/synthesis unit 22 of the telephone function unit 12, and output from the output unit 20 composed of a speaker and the like. In this way, during a call, signals are transmitted and received via the transceiver unit 11 and the telephone function unit 12.
  • the storage means 13 is a memory for temporarily storing distributed karaoke data as described later.
  • the karaoke data is composed of performance data composed of a sequence of performance events of the requested song and voice symbol data composed of voice symbols for each syllable in the lyrics attached to the performance data.
  • the karaoke data may include guide lyrics data for displaying lyrics on a display.
  • The karaoke data is MIDI-format data as shown in Fig. 4, and the lyric voice symbol data is inserted into the MIDI data as an exclusive message, as shown in Fig. 4.
  • This makes it possible to keep the data amount of one piece of karaoke data small, so that even in a digital cellular system with a low-bit-rate transmission path, the karaoke data for one song can be transmitted in a short time.
  • the data separation section 14 has a built-in MIDI decoder, interprets the MIDI data read from the storage means 13 and separates the data into performance data and voice symbol data.
  • the separated performance data is supplied to a tone synthesizer 16 comprising a sequencer and a MIDI sound source via a buffer memory (Buff) 15 which operates as a delay circuit.
  • the separated voice symbol data is supplied to the telephone function unit 12 together with the performance data.
  • A guide voice synthesized based on the voice symbol data is output from the voice compression/synthesis unit 22. This guide voice guides the lyrics during karaoke singing, replacing the guide lyric images displayed on a display, and accompanies the karaoke played by the tone synthesis unit 16.
  • The output timing of the guide voice is set before the timing at which the lyrics are sung, and lyrics of a predetermined phrase length are synthesized and output as the guide voice.
  • The guide voice is synthesized at a fast tempo, with a melody according to the performance data and with accents and intonation added.
  • For this purpose, the telephone function unit 12 analyzes the performance data of the vocal line (the vocal part) in the performance data. For example, by analyzing the key changes (melody) of the vocal line performance data, the manner in which the pitch of the guide voice changes is controlled, and by analyzing the vocal line's velocity information and sounding length (gate time) information, which reflect musical symbols such as slurs and staccatos, the intonation and accent of the guide voice are controlled.
  • Further, the key changes of the vocal line performance data may be analyzed to determine whether a phrase is a female part or a male part, and the pitch may be determined so that the guide voice of that phrase is a female voice or a male voice.
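  • The patent does not specify the exact rule for this female/male determination; as a hypothetical illustration only, one could classify a phrase by the register of its vocal-line notes:

```python
# Illustrative heuristic only (the patent analyzes key changes; the
# threshold and rule here are assumptions): pick a guide-voice register
# from the average MIDI note number of the phrase's vocal line.

FEMALE_THRESHOLD = 60  # middle C, an assumed boundary for illustration

def guide_voice_register(phrase_notes):
    """Return 'female' or 'male' for a list of MIDI note numbers."""
    avg = sum(phrase_notes) / len(phrase_notes)
    return "female" if avg >= FEMALE_THRESHOLD else "male"
```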
  • Meanwhile, the voice symbol data supplied to the telephone function unit 12 is given to the database 24, and voice parameters are read from the database 24 so that the voice indicated by the voice symbol data is synthesized syllable by syllable in the voice compression/synthesis unit 22. These voice parameters are supplied to the voice compression/synthesis unit 22. Since the reading of the voice parameters from the database 24 is controlled based on the analysis results of the vocal line performance data described above, the parameters reflect the melody, velocity, and sounding length of the vocal line. As a result, the changes in pitch, accent, and intonation of the guide voice synthesized by the voice compression/synthesis unit 22 can be controlled according to the vocal line.
  • The guide voice is output, after the performance data of the corresponding portion has been pre-read and analyzed, before the musical tone based on that portion of the performance data is reproduced. That is, the reproduction of the musical tone based on the performance data is performed later than the guide voice.
  • This delay is realized by the buffer memory 15, and the performance data delayed by a predetermined time by the buffer memory 15 is supplied to the tone synthesis unit 16.
  • Consequently, the guide voice synthesized by the voice compression/synthesis unit 22 is output from the output unit 20 prior to the musical sound reproduced by the tone synthesis unit 16.
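  • The pre-read delay realized by the buffer memory 15 can be pictured as a simple time-shifted queue. This is an illustrative model with invented names, not the patent's circuit:

```python
import heapq

# Events pushed into the buffer become available only after a fixed
# delay, so a guide voice synthesized immediately from the same data
# is output before the corresponding musical tone.

class DelayBuffer:
    def __init__(self, delay):
        self.delay = delay
        self._heap = []

    def push(self, time, event):
        # Schedule the event for (time + delay).
        heapq.heappush(self._heap, (time + self.delay, event))

    def pop_ready(self, now):
        # Release every event whose delayed time has arrived.
        out = []
        while self._heap and self._heap[0][0] <= now:
            out.append(heapq.heappop(self._heap)[1])
        return out
```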
  • The tone synthesis unit 16 is composed of a sequencer and a MIDI sound source, and the musical tone reproduced by the tone synthesis unit 16 is sent to the effect unit 17, where an effect is added.
  • the musical tone to which the effect has been added is synthesized in the synthesizing section 18 with the synthesized guide voice.
  • the effect is added to this guide voice by the effect unit 23 before it is synthesized with the musical sound.
  • the musical sound and the guide voice synthesized by the synthesizing unit 18 are amplified by the amplifying unit 19 and output from the output unit 20.
  • In the effect units 17 and 23, for example, localization control according to the number of speakers of the output unit 20 is performed. Further, effects such as reverb and chorus may be added.
  • The synthesized guide voice may also be corrected by an equalizer. Further, the volume of the guide voice may be made variable; in this way, the volume of the guide voice can be reduced as the singer becomes more skilled.
  • FIG. 2 shows a detailed configuration of the voice compression / synthesis unit 22 and the database 24 in the telephone function unit 12 of the mobile phone 1 in FIG.
  • The voice compression/synthesis unit 22 shown in FIG. 2 includes a typical CELP decoder that decodes voice data obtained by encoding voice information with high efficiency. Although not shown, the voice compression/synthesis unit 22 also has a CELP encoder that can compress and encode voice information with high efficiency.
  • In general, the features of speech can be represented by the pitch L and noise components of the source sound generated by the vocal cords (called "source speech feature parameters"), and by the vocal tract transfer characteristics as the sound passes through the throat and mouth together with the radiation characteristics of the lips (called "vocal tract feature parameters"). That is, a speech synthesis model can be represented by a vocal fold model that generates the source sound and a vocal tract model cascaded to it.
  • the CELP decoder in the speech compression / synthesis unit 22 shown in FIG. 2 synthesizes speech based on this speech synthesis model, thereby decoding the compressed and encoded speech data into the original speech.
  • First, the compressed voice data for each frame input to the voice compression/synthesis unit 22 is separated by the data processing unit 30 into the voice parameters of index I, pitch L, and reflection coefficient γ. The pitch L parameter is distributed to the short-term oscillation unit 32, the index I parameter to the codebook 31, and the reflection coefficient γ parameter to the throat approximation filter 34.
  • The codebook 31 has the same contents as the source sound codebook in the encoder, and its contents are recorded in ROM (Read Only Memory).
  • Based on the pitch L parameter, the short-term oscillation unit 32 generates a decoded signal of pitch L and supplies it to the source waveform reproduction unit 33. The source waveform reproduction unit 33 is also supplied with the code vector data indicated by the index I read from the codebook 31, and by combining this data with the decoded signal of pitch L, the synthesized source waveform is reproduced in the source waveform reproduction unit 33.
  • The synthesized source waveform output from the source waveform reproduction unit 33 is similar to the waveform generated by the vibration of the human vocal cords. It is filtered by the throat approximation filter 34, whose filter coefficients are controlled by the reflection coefficient γ parameter, to obtain a synthesized speech.
  • The throat approximation filter 34 reproduces the transfer function of the human throat and mouth; the reflection coefficients γ supplied from the data processing unit 30 are stored in advance and supplied to each filter stage when needed.
  • The synthesized speech output from the throat approximation filter 34 is supplied to the spectrum filter 35, and is output after unnaturalness as speech has been removed.
  • In this way, the high-efficiency compression-encoded voice data of the call is decoded and output by the voice compression/synthesis unit 22.
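  • To make the decoder's signal flow concrete, here is a heavily simplified toy rendition of the chain: codebook excitation, pitch-period repetition standing in for the short-term oscillation unit 32, and a first-order recursive filter standing in for the reflection-coefficient-controlled throat approximation filter 34. Real CELP uses adaptive and fixed codebooks and a multi-stage lattice filter, so this is an assumption-laden illustration only:

```python
# Toy CELP-style decode: excite with a code vector, repeat it at the
# pitch period, then shape it with a one-pole filter whose coefficient
# plays the role of a reflection coefficient. For illustration only.

def celp_like_decode(codebook, index, pitch_l, refl, n_samples):
    excitation = codebook[index]          # code vector chosen by index I
    # Repeat the code vector at pitch period L (short-term oscillation).
    source = [excitation[i % pitch_l] for i in range(n_samples)]
    # First-order recursive filter controlled by the reflection coefficient.
    out, prev = [], 0.0
    for s in source:
        prev = s + refl * prev
        out.append(prev)
    return out
```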
  • On the other hand, to synthesize the guide voice indicated by the voice symbol data, the voice symbol data separated by the data separation unit 14 is supplied to the voice database 40 in the database 24, and the pitch parameter, waveform selection parameter, and reflection coefficient parameter are output from the voice database 40.
  • The output pitch parameter of pitch Lg is supplied to the short-term oscillation unit 32, and a decoded signal of pitch Lg is generated by the short-term oscillation unit 32 and supplied to the source waveform reproduction unit 33.
  • The waveform selection parameter is supplied to the waveform database 41, and the waveform data giving the voice type is read from the waveform database 41 and output to the source waveform reproduction unit 33.
  • In the source waveform reproduction unit 33, the decoded signal of pitch Lg and the waveform data giving the voice type are combined, and the synthesized source waveform is reproduced.
  • The synthesized source waveform output from the source waveform reproduction unit 33 is filtered by the throat approximation filter 34, whose filter coefficients are controlled by the reflection coefficient γg parameter read from the reflection coefficient changing database 42, to which the reflection coefficient parameters are supplied from the voice database 40; in this way the guide voice is synthesized.
  • The synthesized speech output from the throat approximation filter 34 is supplied to the spectrum filter 35, where unnaturalness as speech is removed, and is then output as the guide voice.
  • a control signal is supplied to the database 24.
  • The control signal controls the pitch Lg of the guide voice and its variation, and also controls the intonation of the guide voice.
  • The control signal is information on the analysis result obtained by analyzing the vocal line performance data in the performance data by the processing unit built into the telephone function unit 12. If the oscillation frequency of the short-term oscillation unit 32 is changed by controlling the pitch parameter Lg with the control signal, the synthesized guide voice can be changed to a female voice or a male voice.
  • Also, by changing the waveform data read from the waveform database 41, the voice type of the guide voice can be changed, and the intonation of the guide voice can likewise be changed.
  • Since the control signal is created by analyzing the vocal line performance data, the pitch and intonation of the guide voice change according to the melody of the vocal line. This allows the user to understand the key and how to sing by listening to the guide voice before singing.
  • Further, the database 24 is supplied with time information Time, which indicates the utterance timing and tempo of the guide voice; according to the time information Time, a predetermined waveform is read from the waveform database 41 and a predetermined reflection coefficient γg parameter is read from the reflection coefficient changing database 42.
  • The time information Time is output, by analyzing the vocal line performance data, at a timing before the lyrics to be guided are sung. Furthermore, the length of each syllable of the guide voice and the speed at which the guide voice is output are controlled by the time information Time.
  • FIG. 3 shows a configuration in which this analysis processing is represented in hardware form.
  • The MIDI data, which is the distributed karaoke data, is read from the storage means 13, interpreted by the data separation unit 14 having a MIDI decoder function, and separated into performance data and voice symbol data.
  • In the MIDI data, the voice symbol data, which is a voice symbol string for the guide voice, is inserted as an exclusive message.
  • An exclusive message is represented by the message part between the status bytes "F0" and "F7" in the MIDI data, as shown in Figure 4.
  • This exclusive message is composed of the voice symbol string of the guide voice, set for each phrase, and may include timing information for uttering the guide voice of that phrase.
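  • For illustration, extracting such exclusive-message payloads from a raw MIDI byte stream can be sketched as below; the parser is a hypothetical simplification (real Standard MIDI Files also carry chunk headers, delta times, and running status, which are ignored here):

```python
# Hypothetical sketch: collect the payload bytes between each pair of
# system-exclusive status bytes F0 ... F7 in a flat MIDI byte stream.

def extract_sysex(midi_bytes):
    """Return the payload of every F0 ... F7 exclusive message."""
    msgs, i = [], 0
    while i < len(midi_bytes):
        if midi_bytes[i] == 0xF0:
            end = midi_bytes.index(0xF7, i + 1)   # matching end-of-sysex
            msgs.append(bytes(midi_bytes[i + 1:end]))
            i = end + 1
        else:
            i += 1
    return msgs
```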
  • The voice symbol data separated by the data separation unit 14 is supplied to the guide voice height/speed determination unit 36, where the pitch and speed (tempo) of the guide voice are determined.
  • the guide voice can be changed to female voice or male voice by the user specifying the pitch.
  • The guide voice height/speed determination unit 36 receives pitch information and tempo information from the vocal line analysis unit 38, as well as intonation information and accent information supplied via the switch SW.
  • The guide voice height/speed determination unit 36 outputs, together with the voice symbol data, a control signal corresponding to the various types of supplied information, so as to produce a guide voice of the specified pitch as well as a guide voice matching the vocal line. The control signal is used to control the pitch, speed (tempo), intonation, and accent of the guide voice.
  • When no pitch is specified, the guide voice is set to the default pitch. If tuning the melody of the guide voice to the vocal line would sound unnatural, the switch SW can be turned off; in this case, a flat guide voice is produced.
  • The performance data separated in the data separation unit 14 is supplied to the vocal line analysis unit 38 in the telephone function unit 12, and the vocal line performance data is analyzed.
  • The separated performance data is also delayed by the buffer memory 15 and supplied to the tone synthesis unit 16. That is, the musical sound based on the performance data is reproduced with a delay relative to the guide voice, while the performance data is pre-read and analyzed by the vocal line analysis unit 38.
  • the vocal line analyzer 38 analyzes velocity information and envelope information that reflects changes in vocal line keys (melody) and musical symbols such as slurs and stackers. .
  • duration information and gate time information are also analyzed.
  • the melody information obtained as a result of the analysis is supplied as pitch control information from the vocal line analysis section 38 to the guide speech height / speed determination section 36.
  • accent control information and intonation control information, obtained by analyzing the velocity information and envelope information, are also supplied to the guide voice pitch/speed determination unit 36.
  • the utterance timing information of the guide voice and the tempo information of the guide voice, obtained by analyzing the duration information and gate time information, are likewise supplied to the guide voice pitch/speed determination unit 36.
  • in the case of a duet song, the vocal line analysis section 38 also analyzes whether the vocal line being analyzed is a female part or a male part, and pitch information according to the analysis result is supplied to the guide voice pitch/speed determination unit 36.
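The core of the analysis performed by the vocal line analysis unit 38 — pairing note-on and note-off events in the performance data to recover each note's pitch, onset (utterance timing), gate time, and velocity — can be illustrated as follows. The event-tuple layout and field names are hypothetical; the patent does not specify a concrete data format.

```python
def analyze_vocal_line(events):
    """Extract per-note analysis results from vocal-line performance events.

    events: list of (time_sec, kind, pitch, velocity) tuples, where kind is
    'on' or 'off', in time order. Returns one dict per note with its pitch,
    onset time, gate time (sounding length), and note-on velocity.
    """
    notes, pending = [], {}
    for t, kind, pitch, vel in events:
        if kind == 'on' and vel > 0:
            # remember when this pitch started (overlapping same-pitch
            # notes are not handled in this simplified sketch)
            pending[pitch] = (t, vel)
        elif pitch in pending:
            # note-off (or note-on with velocity 0) closes the note
            onset, vel0 = pending.pop(pitch)
            notes.append({'pitch': pitch, 'onset': onset,
                          'gate': t - onset, 'velocity': vel0})
    return notes
```

The resulting pitch sequence would serve as pitch control information, the onsets as utterance timing information, and the gate times and velocities as raw material for tempo, accent, and intonation control.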
  • the pitch of the guide voice and the length of each syllable are thus controlled according to the melody of the vocal line: a female guide voice is output ahead of a female part, and a male guide voice is output ahead of a male part.
  • the guide voice is output at a timing corresponding to the utterance timing information supplied from the vocal line analysis unit 38, and the speed of the guide voice is adjusted according to the tempo information.
  • the utterance timing information and tempo information of the guide voice are output from the guide voice pitch/speed determination unit 36 as time information Time. If the voice symbol data itself includes timing information for producing the guide voice, the guide voice is produced based on that timing information.
  • the control signal output from the guide voice pitch/speed determination unit 36 is output to the database 24 side via the interpolator 37. The interpolator 37 prevents the pitch from changing unnaturally when the pitch of the guide voice is changed according to the melody of the vocal line.
  • the pitch change speed of the guide voice is dynamically varied according to the speed of the melody of the vocal line, so the guide voice is output as a smooth voice.
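The role of the interpolator 37 — gliding between successive guide-voice pitches at a rate tied to the melody's speed, instead of jumping — can be sketched like this. The frame rate, the glide fraction, and the function name are illustrative assumptions, not values from the patent.

```python
def interpolate_pitch(prev_pitch, next_pitch, note_duration,
                      frame_rate=100, glide_fraction=0.1):
    """Return per-frame pitch values gliding from prev_pitch to next_pitch.

    The glide time scales with the note duration, so fast passages glide
    quickly and slow ones glide gently, avoiding an unnatural pitch jump.
    """
    glide_frames = max(1, int(note_duration * glide_fraction * frame_rate))
    step = (next_pitch - prev_pitch) / glide_frames
    # linear ramp; the final frame lands exactly on next_pitch
    return [prev_pitch + step * (i + 1) for i in range(glide_frames)]
```

A real interpolator might use a smoother curve than a linear ramp; the point here is only that the transition time is derived from the melody's own timing.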
  • the buffer memory 15 is provided to synchronize the timing at which the vocal line is reproduced with the guide voice, and the utterance timing information of the guide voice described above takes the delay time of the buffer memory 15 into account.
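The timing relationship can be expressed as a small calculation: because the musical tone is held back by the buffer memory 15, a guide syllable meant to lead each note can be scheduled at the note onset plus the buffer delay, minus the desired lead time. The constants and names below are assumptions for illustration only.

```python
BUFFER_DELAY = 0.5  # assumed delay (seconds) introduced by buffer memory 15

def utterance_times(note_onsets, lead=0.5, buffer_delay=BUFFER_DELAY):
    """Compute guide-voice utterance times for the given note onsets.

    Each syllable is uttered `lead` seconds before its (delayed) note:
    utterance time = onset + buffer_delay - lead, clamped at zero so the
    very first syllable is never scheduled before playback starts.
    """
    return [max(0.0, t + buffer_delay - lead) for t in note_onsets]
```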
  • FIG. 5 is a conceptual diagram of downloading karaoke data to a mobile phone 1a and a mobile phone 1b, each having the same configuration as the mobile phone 1.
  • a cellular mobile phone system adopts a small-zone scheme in which a number of wireless zones are arranged within the service area, and a base station provided in each wireless zone manages that zone.
  • a mobile phone is connected, via a wireless line, to the base station managing the wireless zone to which the phone belongs; through that base station it is connected to a mobile switching center, which in turn is connected to the general telephone network. A mobile phone can therefore communicate with another telephone, or with another mobile phone via the base station to which that phone belongs.
  • an example of such a cellular system is shown in FIG. 5, in which the mobile phone 1a belongs to the wireless zone managed by the base station 2c among the base stations 2a to 2d, and the mobile phone 1b belongs to the zone of the base station 2a.
  • the mobile phone 1a and the base station 2c are connected by a radio line, and an up signal sent when making a call or performing location registration is received and processed by the base station 2c. The same applies to the mobile phone 1b, for which the managing base station is the base station 2a.
  • the base stations 2a to 2d each manage a different radio zone, although the peripheries of adjacent radio zones may overlap.
  • the base stations 2a to 2d are connected to a mobile switching center 3 via multiplexed lines; a plurality of mobile switching centers 3 are concentrated at a gateway switching center 4 and connected to a general telephone exchange 5a, and a plurality of gateway switching centers 4 are interconnected via relay transmission lines.
  • general telephone exchanges 5a, 5b, 5c, ... are installed in each region and interconnected by relay transmission lines, and each of the general telephone exchanges 5a, 5b, 5c, ... is connected to a large number of general telephones.
  • a distribution center 6 is connected to the general telephone exchange 5b. New songs are added to the distribution center 6 at any time, and a large amount of karaoke data is stored there.
  • the distribution center 6, connected to the general telephone network in this way, can download the karaoke data to the mobile phones 1a and 1b.
  • when the mobile phone 1a downloads karaoke data, it dials the telephone number of the distribution center 6, after which it can request and download the karaoke data of the desired song title.
  • the karaoke data includes the voice symbol data of the guide voice. Similarly, the mobile phone 1b can request and download the karaoke data of a desired song title.
  • the distribution center 6 may also be connected to the Internet, and the karaoke data may be downloaded from the distribution center 6 via the Internet.
  • the mobile phone 1 may support hands-free calling; when performing karaoke hands-free, the output sound from the output unit 20 may enter the microphone 21 and cause howling, so if the mobile phone 1 supports hands-free calls, an echo canceller circuit should be provided to prevent howling. Further, the output of the output unit 20 may be transmitted as a weak radio wave by an FM modulator and received and reproduced by an FM receiver installed indoors or in a vehicle; since howling may occur in this case as well, an echo canceller circuit should likewise be provided.
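An echo canceller of the kind mentioned here is commonly realized as an adaptive FIR filter that estimates the loudspeaker-to-microphone echo path and subtracts the estimate from the microphone signal. The normalized-LMS sketch below is a generic textbook formulation, not the circuit the patent envisions; all names and constants are illustrative.

```python
import math

def lms_echo_cancel(mic, far_end, taps=4, mu=0.05):
    """Cancel echo from mic using the far-end (loudspeaker) reference.

    A normalized-LMS adaptive FIR filter estimates the echo path and
    subtracts the estimated echo; the residual (near-end speech plus
    misadjustment) is returned sample by sample.
    """
    w = [0.0] * taps                 # adaptive filter coefficients
    buf = [0.0] * taps               # recent far-end samples, newest first
    residual = []
    for m, x in zip(mic, far_end):
        buf = [x] + buf[:-1]
        echo_est = sum(wi * bi for wi, bi in zip(w, buf))
        e = m - echo_est             # error = what remains after cancellation
        norm = sum(b * b for b in buf) + 1e-8
        # NLMS coefficient update, step size normalized by input power
        w = [wi + mu * e * bi / norm for wi, bi in zip(w, buf)]
        residual.append(e)
    return residual

# demo: the microphone picks up a scaled copy of the far-end signal
far = [math.cos(0.3 * i) for i in range(300)]
mic = [0.5 * x for x in far]         # pure echo, no near-end speech
residual = lms_echo_cancel(mic, far)
```

After adaptation the residual shrinks toward zero, which is exactly the suppression of the feedback path that prevents howling.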
  • the transmitting section of the mobile phone 1 is not used except for making requests, so the power supplied to the transmitting section may be turned off except when a request is being made, thereby extending battery life.
  • FIG. 6 shows a configuration example of a second embodiment in which the terminal device of the present invention is applied to a karaoke device together with a distribution center.
  • this embodiment basically differs from the first embodiment only in the communication function and the display function: the modem 111 and the control unit 112 correspond to the transmission/reception function unit 11 and the telephone function unit 12 of the first embodiment, and a display unit 126 is added. In this it differs from the first embodiment applied to the mobile phone; the other components are functionally the same, so they are given the same reference numerals and their detailed description is omitted.
  • reference numeral 100 denotes a karaoke device to which the terminal device according to the second embodiment of the present invention is applied; the karaoke device 100 is configured to download karaoke data from the distribution center 6, to which it is connected by a communication line such as a telephone line.
  • the karaoke device 100 is provided with a modem 111 so that desired karaoke data can be downloaded from the distribution center 6 via the modem 111.
  • the modem 111 demodulates the received signal, modulates the signal to be transmitted, and sends it out to the communication line.
  • the control unit 112 includes a display control unit 125 and a speech synthesis unit 122, and controls each section of the karaoke apparatus 100. The speech parameters read from the database 24 are supplied to the speech synthesis unit 122, which can synthesize speech according to those parameters.
  • the database 24 stores speech parameters for the syllables from "a" to "n" as well as onomatopoeic sounds.
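A minimal stand-in for this lookup might look as follows; the parameter values are invented for illustration, since the patent does not specify the contents of the database 24 beyond covering the syllables and onomatopoeic sounds.

```python
# hypothetical stand-in for database 24: one parameter vector per syllable
SPEECH_PARAMS = {
    "a": [0.9, 0.1],  # illustrative values only
    "i": [0.8, 0.3],
    "u": [0.7, 0.2],
    "n": [0.2, 0.6],
}

def synthesize(symbols):
    """Concatenate the per-syllable parameter frames the speech synthesis
    unit (122) would render for a sequence of voice symbols.

    An unknown symbol raises immediately rather than producing silence,
    which makes missing database entries easy to spot.
    """
    frames = []
    for s in symbols:
        if s not in SPEECH_PARAMS:
            raise KeyError(f"no speech parameters for syllable {s!r}")
        frames.append(SPEECH_PARAMS[s])
    return frames
```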
  • the storage means 13 is a memory in which the distributed karaoke data is stored, as in the first embodiment.
  • the karaoke data is composed of performance data comprising a sequence of performance events of the requested song, voice symbol data containing a voice symbol for each syllable of the lyrics attached to the performance data, and guide lyrics display data for displaying guide lyrics on the display unit 126.
  • this guide lyrics display data is supplied from the modem 111 to the control unit 112. When the performance data is played, the guide lyrics display data is supplied sequentially from the control unit 112 to the display unit 126, and the guide lyrics are displayed on the display unit 126.
  • background video data suited to the genre of the performance data is read out from large-capacity storage means (not shown) and displayed on the display unit 126 together with the guide lyrics.
  • the karaoke data excluding the guide lyrics display data is MIDI-format data as shown in FIG. 4, and the voice symbol data of the lyrics is inserted into the MIDI data as an exclusive message, as also shown in FIG. 4. For this reason, the data amount of one song of the karaoke data, excluding the guide lyrics display data, can be kept small, and the karaoke data of one tune can be transmitted in a short time.
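The separation of voice symbol data carried as MIDI system-exclusive messages (framed by the status bytes 0xF0 and 0xF7) from ordinary channel messages can be sketched as below. This is a deliberately simplified parser: it assumes no running status and fixed 3-byte channel messages, which real MIDI streams do not guarantee.

```python
def split_midi_stream(data):
    """Separate exclusive messages (voice symbol data) from channel
    messages in a simplified MIDI byte stream.

    Returns (sysex, channel): sysex is a list of exclusive-message
    payloads (status bytes stripped), channel is a list of 3-byte
    channel messages such as note-on (0x9n) and note-off (0x8n).
    """
    sysex, channel = [], []
    i = 0
    while i < len(data):
        if data[i] == 0xF0:                 # start of exclusive message
            j = data.index(0xF7, i)         # find the terminating EOX byte
            sysex.append(bytes(data[i + 1:j]))
            i = j + 1
        else:
            # assume a fixed-size 3-byte channel message (simplification)
            channel.append(bytes(data[i:i + 3]))
            i += 3
    return sysex, channel
```

In the data separation unit 14, logic of this kind would route the exclusive payloads to guide voice synthesis and the channel messages to tone generation.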
  • the voice symbol data separated by the data separation unit 14 is supplied to the control unit 112 together with the performance data. In the control unit 112, a guide voice synthesized based on the voice symbol data is output from the voice synthesis section 122.
  • this guide voice is intended to guide the user even when the user does not look at the guide lyrics displayed on the display unit 126 while singing karaoke. It is synthesized in step with the progress of the karaoke musical tones and output from the output unit 20, from which the singing sound input from the microphone 21 is also output.
  • FIG. 7 shows a detailed configuration of the speech synthesis unit 122 and the database 24 in the control unit 112 of the karaoke apparatus 100 according to the second embodiment of the present invention.
  • unlike the voice compression/synthesis unit 22 of the telephone function section 12 of the mobile phone 1 of the first embodiment, the speech synthesis unit 122 shown in FIG. 7 has no encoder. The other parts of the configuration are the same as those of the voice compression/synthesis unit 22 and will not be described.
  • the performance data of the vocal line is analyzed by a processing unit in the control unit 112; this analysis is performed by the processing unit executing an analysis program. Since the analysis process is the same as that of the mobile phone 1 of the first embodiment shown in FIG. 3, its description is omitted.
  • next, the manner in which the karaoke device 100 downloads the karaoke data will be described. First, the karaoke device 100 accesses the distribution center 6 via the modem 111, whereby the karaoke device 100 and the distribution center 6 are connected. Then, by operating input means (not shown) according to the guidance displayed on the display section 126, the user requests and downloads the karaoke data of the desired song title. This karaoke data includes the voice symbol data of the guide voice and the guide lyrics display data.
  • the object of the present invention can also be achieved by supplying a storage medium storing the program code of software that realizes the functions of the above-described embodiments to an electronic device such as a karaoke device, a mobile phone, or a personal computer, and having a computer (or CPU) of the electronic device install and execute the program. In this case, the program code itself installed in the electronic device from the storage medium realizes the novel functions of the present invention, and the storage medium storing that program code constitutes the present invention.
  • as the storage medium for recording the program code, for example, a floppy disk, a hard disk, an optical disk, a magneto-optical disk, a CD-ROM, a CD-R, magnetic tape, a nonvolatile memory card, a ROM, or the like can be used.
  • the program code may be supplied from a server computer via a communication network.
  • needless to say, not only are the functions of the above-described embodiments realized by the computer executing the read-out program code, but there are also cases where an OS or the like running on the computer performs part or all of the actual processing based on the instructions of the program code, and that processing realizes the functions of the above-described embodiments.
  • needless to say, the present invention also covers the case where the program code read from the storage medium is written to a memory provided in a function expansion board or function expansion unit connected to the karaoke device, personal computer, or the like, after which a CPU provided in the function expansion board or function expansion unit performs part or all of the actual processing, and that processing realizes the functions of the above-described embodiments.
  • Industrial Applicability
  • the terminal device of the present invention can be applied to mobile terminals having a communication function, such as mobile phones and car phones provided with a karaoke function, and can also be applied to karaoke devices.


Abstract

The invention concerns a device that avoids having to produce reference-voice pitch and intonation information separately. Content data, comprising performance data made up of sequences of performance events and voice symbol data made up of voice symbols for each syllable of the song lyrics attached to the performance data, is distributed to a terminal. Musical tones are reproduced from the performance data, and a reference (guide) voice is synthesized according to the voice symbol data. A property of the synthesized reference voice is modified according to the performance data by reading the performance data in advance, so as to control the voice synthesis unit.
PCT/JP2001/004911 2000-06-12 2001-06-11 Terminal, procede de reproduction vocale de reference et support de stockage WO2001097209A1 (fr)

Priority Applications (3)

Application Number Priority Date Filing Date Title
KR10-2002-7016964A KR100530916B1 (ko) 2000-06-12 2001-06-11 단말 장치, 가이드 음성 재생 방법 및 기억 매체
AU2001264240A AU2001264240A1 (en) 2000-06-12 2001-06-11 Terminal device, guide voice reproducing method and storage medium
HK03106757.7A HK1054460A1 (zh) 2000-06-12 2003-09-20 終端裝置、引導聲音再現方法和存儲介質

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
JP2000-175358 2000-06-12
JP2000175358A JP2001356784A (ja) 2000-06-12 2000-06-12 端末装置

Publications (1)

Publication Number Publication Date
WO2001097209A1 true WO2001097209A1 (fr) 2001-12-20

Family

ID=18677250

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/JP2001/004911 WO2001097209A1 (fr) 2000-06-12 2001-06-11 Terminal, procede de reproduction vocale de reference et support de stockage

Country Status (7)

Country Link
JP (1) JP2001356784A (fr)
KR (1) KR100530916B1 (fr)
CN (1) CN100461262C (fr)
AU (1) AU2001264240A1 (fr)
HK (1) HK1054460A1 (fr)
TW (1) TW529018B (fr)
WO (1) WO2001097209A1 (fr)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110622240A (zh) * 2017-05-24 2019-12-27 日本放送协会 语音向导生成装置、语音向导生成方法及广播系统

Families Citing this family (8)

Publication number Priority date Publication date Assignee Title
JP4236533B2 (ja) * 2003-07-18 2009-03-11 クリムゾンテクノロジー株式会社 楽音発生装置及びそのプログラム
JP4305084B2 (ja) * 2003-07-18 2009-07-29 ブラザー工業株式会社 音楽再生装置
JP4999497B2 (ja) * 2007-02-28 2012-08-15 株式会社第一興商 パート歌唱補助機能を備える車載用カラオケシステム
JP2010054530A (ja) * 2008-08-26 2010-03-11 Sony Corp 情報処理装置、発光制御方法およびコンピュータプログラム
JP2014170373A (ja) * 2013-03-04 2014-09-18 Toshiba Tec Corp 情報処理端末およびそのプログラム
JP6728754B2 (ja) * 2015-03-20 2020-07-22 ヤマハ株式会社 発音装置、発音方法および発音プログラム
KR102290901B1 (ko) * 2021-01-04 2021-08-19 주식회사 디어유 노래 반주 시스템
KR20240059075A (ko) 2022-10-27 2024-05-07 김기범 기계학습 기반 온라인 보컬 분석 서비스 시스템

Citations (6)

Publication number Priority date Publication date Assignee Title
JPH08137483A (ja) * 1994-11-10 1996-05-31 Ekushingu:Kk カラオケ装置
JPH0926798A (ja) * 1995-07-12 1997-01-28 Kanda Tsushin Kogyo Co Ltd Phs通信カラオケシステム
JPH09179574A (ja) * 1995-12-27 1997-07-11 Yamaha Corp カラオケ装置
JPH09190196A (ja) * 1995-10-26 1997-07-22 Sony Corp 音声信号の再生方法及び装置、並びに音声復号化方法及び装置、並びに音声合成方法及び装置、並びに携帯無線端末装置
JPH1031496A (ja) * 1996-07-15 1998-02-03 Casio Comput Co Ltd 楽音発生装置
JPH10161683A (ja) * 1996-12-04 1998-06-19 Harness Sogo Gijutsu Kenkyusho:Kk 車載用カラオケ装置

Family Cites Families (2)

Publication number Priority date Publication date Assignee Title
JPS6431496A (en) * 1987-07-28 1989-02-01 Matsushita Electric Works Ltd Manufacture of ceramics wiring board
JPH11325943A (ja) * 1998-05-11 1999-11-26 Sony Corp ナビゲーションシステム


Cited By (2)

Publication number Priority date Publication date Assignee Title
CN110622240A (zh) * 2017-05-24 2019-12-27 日本放送协会 语音向导生成装置、语音向导生成方法及广播系统
CN110622240B (zh) * 2017-05-24 2023-04-14 日本放送协会 语音向导生成装置、语音向导生成方法及广播系统

Also Published As

Publication number Publication date
KR20030010696A (ko) 2003-02-05
CN100461262C (zh) 2009-02-11
CN1436345A (zh) 2003-08-13
AU2001264240A1 (en) 2001-12-24
HK1054460A1 (zh) 2003-11-28
JP2001356784A (ja) 2001-12-26
TW529018B (en) 2003-04-21
KR100530916B1 (ko) 2005-11-23


Legal Events

Date Code Title Description
AK Designated states

Kind code of ref document: A1

Designated state(s): AE AG AL AU BA BB BG BR BZ CA CN CO CR CU CZ DM DZ EE GD GE HR HU ID IL IN IS KR LC LK LR LT LV MA MG MK MN MX NO NZ PL RO SG SI SK TR TT UA US UZ VN YU ZA

AL Designated countries for regional patents

Kind code of ref document: A1

Designated state(s): GH GM KE LS MW MZ SD SL SZ TZ UG ZW AM AZ BY KG KZ MD RU TJ TM AT BE CH CY DE DK ES FI FR GB GR IE IT LU MC NL PT SE TR BF BJ CF CG CI CM GA GN GW ML MR NE SN TD TG

DFPE Request for preliminary examination filed prior to expiration of 19th month from priority date (pct application filed before 20040101)
121 Ep: the epo has been informed by wipo that ep was designated in this application
WWE Wipo information: entry into national phase

Ref document number: 018110347

Country of ref document: CN

Ref document number: 1020027016964

Country of ref document: KR

WWP Wipo information: published in national office

Ref document number: 1020027016964

Country of ref document: KR

122 Ep: pct application non-entry in european phase
WWG Wipo information: grant in national office

Ref document number: 1020027016964

Country of ref document: KR