WO2016157377A1 - Communication system, playback system, terminal device, server, content communication method, and program - Google Patents


Info

Publication number: WO2016157377A1
Authority: WIPO (PCT)
Prior art keywords: music, data, terminal device, server, playback device
Application number: PCT/JP2015/059967
Other languages: French (fr), Japanese (ja)
Inventors: Katsuhito Ishioka (克仁 石岡), Keitaro Sugawara (啓太郎 菅原)
Original Assignee: Pioneer Corporation (パイオニア株式会社)
Application filed by Pioneer Corporation
Priority to PCT/JP2015/059967
Publication of WO2016157377A1

Classifications

    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10KSOUND-PRODUCING DEVICES; METHODS OR DEVICES FOR PROTECTING AGAINST, OR FOR DAMPING, NOISE OR OTHER ACOUSTIC WAVES IN GENERAL; ACOUSTICS NOT OTHERWISE PROVIDED FOR
    • G10K15/00Acoustics not otherwise provided for
    • G10K15/02Synthesis of acoustic waves
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10KSOUND-PRODUCING DEVICES; METHODS OR DEVICES FOR PROTECTING AGAINST, OR FOR DAMPING, NOISE OR OTHER ACOUSTIC WAVES IN GENERAL; ACOUSTICS NOT OTHERWISE PROVIDED FOR
    • G10K15/00Acoustics not otherwise provided for
    • G10K15/04Sound-producing devices

Definitions

  • the present invention relates to a method of outputting lyrics information as music is played.
  • a karaoke apparatus that synthesizes and outputs lyrics data prior to a karaoke performance is known (for example, Patent Documents 1 and 2).
  • in a karaoke performance, the music to be played back contains no sung vocals, so the lyric sound output by the prior art does not become difficult to hear.
  • when ordinary music containing vocals is played back, however, the output lyric sound overlaps the lyrics included in the original music and may be difficult to hear.
  • the lyric sound output by the prior art method may also overlap the route-guidance voice message of an in-vehicle navigation device and become difficult to hear.
  • An object of the present invention is to provide easy-to-hear lyric sound so that a user can sing along while music including lyrics is being played.
  • the invention according to claim 1 is a communication system including a server and a terminal device, wherein the server comprises: a storage unit that stores use permission information identifying usable playback devices; a determination unit that determines whether the playback device is usable based on identification information of the playback device received from the terminal device and the use permission information; and transmission means that, when the determination unit determines that the playback device is usable, transmits to the terminal device content data corresponding to information specifying content received from the terminal device; and wherein the terminal device comprises: identification information acquisition means that acquires the identification information of the playback device from the playback device connected to the terminal device and transmits it to the server; first communication means that transmits information specifying content to be played back to the server and receives content data corresponding to the content from the server when the playback device is determined to be usable; and second communication means that transmits the content data to the playback device determined to be usable.
  • the invention according to claim 2 is a playback system including a server and a terminal device, wherein the server comprises: a storage unit that stores use permission information indicating usable music playback devices; a determination unit that determines whether the music playback device is usable based on identification information of the music playback device received from the terminal device and the use permission information; acquisition means that acquires lyric data of music; and transmission means that, when the determination unit determines that the music playback device is usable, transmits to the terminal device the lyric data corresponding to the information specifying the playback music received from the terminal device; and wherein the terminal device comprises: identification information acquisition means that acquires identification information of the music playback device from the music playback device connected to the terminal device and transmits it to the server; input means for selecting a playback music piece to be played back; music data acquisition means that acquires the music data of the selected playback music; and first communication means that transmits information specifying the selected playback music to the server and receives the lyric data from the server when the music playback device is determined to be usable.
  • the invention according to claim 3 is a playback system including a server and a terminal device, wherein the server comprises: a storage unit that stores use permission information specifying usable music playback devices; determination means that determines whether the music playback device is usable based on identification information of the music playback device received from the terminal device and the use permission information; acquisition means that acquires music data of a music piece and lyric data of the music piece; lyric sound data generating means that generates lyric sound data based on the lyric data; lyric-sound-added music data generating means that adds the lyric sound data to the music data so as to precede the lyric portion in the music, thereby generating music data with lyric sound; and transmission means that, when the determination means determines that the music playback device is usable, transmits to the terminal device the music data with lyric sound corresponding to the information specifying the playback music received from the terminal device; and wherein the terminal device comprises: identification information acquisition means that acquires identification information of the music playback device from the music playback device connected to the terminal device and transmits it to the server; input means for selecting a playback music piece to be played back; first communication means that transmits information specifying the selected playback music to the server and receives from the server the music data with lyric sound corresponding to the playback music when the music playback device is determined to be usable; and second communication means that transmits the music data with lyric sound to the music playback device determined to be usable.
  • the invention according to claim 7 is a terminal device capable of communicating with a server, comprising: identification information acquisition means that acquires identification information of a playback device from the playback device connected to the terminal device and transmits it to the server; first communication means that transmits information specifying content to be played back to the server and receives content data corresponding to the content from the server when the playback device is determined to be usable; and second communication means that transmits the content data to the playback device determined to be usable.
  • the invention according to claim 8 is a content communication method executed by a terminal device capable of communicating with a server, including: an identification information acquisition step of acquiring identification information of a playback device from the playback device connected to the terminal device and transmitting it to the server; a first communication step of transmitting information specifying content to be played back to the server and receiving content data corresponding to the content from the server when the playback device is determined to be usable; and a second communication step of transmitting the content data to the playback device determined to be usable.
  • the invention according to claim 9 is a program executed by a terminal device capable of communicating with a server, the program causing the terminal device to function as: identification information acquisition means that acquires identification information of a playback device from the playback device connected to the terminal device and transmits it to the server; first communication means that transmits information specifying content to be played back to the server and receives content data corresponding to the content from the server when the playback device is determined to be usable; and second communication means that transmits the content data to the playback device determined to be usable.
  • the invention according to claim 10 is a server capable of communicating with a terminal device, comprising: a storage unit that stores use permission information specifying usable playback devices; a determination unit that determines whether the playback device connected to the terminal device is usable based on identification information of the playback device received from the terminal device and the use permission information; and a transmission unit that transmits content data corresponding to information specifying content received from the terminal device, to the terminal device, when the determination unit determines that the playback device is usable.
  • in the above communication system, the server comprises a storage unit that stores use permission information specifying usable playback devices, a determination unit that determines whether the playback device is usable based on the identification information of the playback device received from the terminal device and the use permission information, and transmission means that, when the determination unit determines that the playback device is usable, transmits to the terminal device content data corresponding to information specifying content received from the terminal device.
  • the terminal device comprises identification information acquisition means that acquires the identification information of the playback device from the playback device connected to it and transmits it to the server, first communication means that transmits information specifying the content to be played back to the server and receives content data corresponding to the content from the server when the playback device is determined to be usable, and second communication means that transmits the content data to the playback device determined to be usable.
  • the server stores use permission information that identifies usable playback devices.
  • the terminal device acquires identification information of the playback device from the playback device connected to it, transmits the identification information to the server, and also transmits information specifying the content to be played back to the server.
  • the server determines whether the playback device is usable based on the playback device identification information and the use permission information received from the terminal device and, when the playback device is determined to be usable, transmits content data corresponding to the information specifying the content received from the terminal device to the terminal device.
  • the terminal device transmits the content data received from the server to the playback device determined to be usable. As a result, content data is transmitted only to a playback device determined to be usable.
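  • as a rough illustration of this exchange (not part of the patent text), the following Python sketch models the determination and transmission steps with in-memory stores standing in for real network I/O; all names (PERMITTED_IDS, CONTENT_DB, the functions) are hypothetical.

```python
# Hedged sketch of the claim-1 flow: device-ID check, then content delivery.
# In-memory dictionaries stand in for the server's storage unit and content
# database; function calls stand in for the terminal-server communication.

PERMITTED_IDS = {"PIONEER-0001", "PIONEER-0002"}   # use permission information
CONTENT_DB = {"song-42": b"...content data..."}    # content keyed by content ID

def server_is_usable(device_id: str) -> bool:
    """Determination unit: check a playback device ID against permission info."""
    return device_id in PERMITTED_IDS

def server_request_content(device_id: str, content_id: str):
    """Transmission unit: return content data only for a usable device."""
    if not server_is_usable(device_id):
        return None
    return CONTENT_DB.get(content_id)

def terminal_play(playback_device_id: str, content_id: str) -> None:
    """Terminal device: forward received content only to the usable device."""
    data = server_request_content(playback_device_id, content_id)
    if data is None:
        print("playback device not usable; no content transmitted")
    else:
        print(f"sending {len(data)} bytes to device {playback_device_id}")

terminal_play("PIONEER-0001", "song-42")   # usable -> content delivered
terminal_play("UNKNOWN-9999", "song-42")   # not usable -> refused
```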
  • a preferred embodiment of the present invention is a playback system including a server and a terminal device, wherein the server comprises: a storage unit that stores use permission information indicating usable music playback devices; a determination unit that determines whether the music playback device is usable based on identification information of the music playback device received from the terminal device and the use permission information; acquisition means that acquires lyric data of music; and transmission means that, when the determination unit determines that the music playback device is usable, transmits to the terminal device the lyric data corresponding to the information specifying the playback music received from the terminal device; and wherein the terminal device comprises: identification information acquisition means that acquires identification information of the music playback device from the music playback device connected to the terminal device and transmits it to the server; input means for selecting a playback music piece to be played back; music data acquisition means that acquires the music data of the selected playback music; and first communication means that transmits information specifying the selected playback music to the server and receives the lyric data from the server when the music playback device is determined to be usable.
  • the server stores use permission information indicating the music playback devices that can be used.
  • the terminal device acquires identification information of the music playback device from the music playback device connected to the terminal device and transmits the identification information to the server.
  • the server determines whether or not the music playback device can be used based on the music playback device identification information and use permission information received from the terminal device.
  • the playback music that is the music to be played back is selected by the user.
  • the terminal device transmits information specifying the selected reproduction music piece to the server.
  • the server acquires lyrics data corresponding to information specifying the playback music received from the terminal device, and transmits the lyrics data to the terminal device.
  • the terminal device receives lyric data corresponding to the reproduced music from the server when it is determined that the music reproducing device can be used, and generates lyric audio data based on the lyric data.
  • the terminal device adds the lyric sound data to the song data so that it precedes the lyric portion in the song, generates the song data with lyric sound, and transmits it to the music playback device determined to be usable for playback.
  • the lyrics data is provided from the server only to the terminal device operating together with the available music playback device, and can be played back together with the music data.
  • Another preferred embodiment of the present invention is a playback system including a server and a terminal device, wherein the server comprises: a storage unit that stores use permission information specifying usable music playback devices; determination means that determines whether the music playback device is usable based on identification information received from the terminal device and the use permission information; acquisition means that acquires the music data of a music piece and the lyric data of the music piece; lyric sound data generating means that generates lyric sound data based on the lyric data; lyric-sound-added music data generating means that adds the lyric sound data to the music data so as to precede the lyric portion in the music, thereby generating music data with lyric sound; and transmission means that, when the determination means determines that the music playback device is usable, transmits to the terminal device the music data with lyric sound corresponding to the information specifying the playback music received from the terminal device; and wherein the terminal device comprises identification information acquisition means that acquires identification information of the music playback device from the music playback device connected to the terminal device and transmits it to the server.
  • the server stores use permission information for specifying an available music playback device.
  • the terminal device acquires identification information of the music playback device from the music playback device connected to the terminal device, and transmits the identification information to the server.
  • the server determines whether or not the music playback device can be used based on the music playback device identification information and use permission information received from the terminal device.
  • the server acquires the song data of the song and the lyric data of the song, generates lyric sound data based on the lyric data, and adds the lyric sound data to the song data so as to precede the lyric portion in the song, thereby generating song data with lyric sound.
  • the terminal device receives selection of a reproduction music that is a music to be reproduced from the user, and transmits information specifying the selected reproduction music to the server.
  • the server transmits music data with lyrics audio corresponding to information specifying the playback music received from the terminal device to the terminal device.
  • the terminal device receives music data with lyrics audio corresponding to the playback music from the server, and transmits the music data to the music playback device determined to be usable for playback. In this way, the lyrics data is provided from the server only to the terminal device operating together with the available music playback device, and can be played back together with the music data.
  • preferably, the second communication means receives the identification information of the music playback device again before transmitting the song data with lyric sound to the music playback device, and transmits the song data with lyric sound to that music playback device only when the received identification information matches that of the music playback device determined to be usable by the server.
  • in this way, the terminal device re-determines whether the music playback device is usable before transmitting the song data with lyric sound to it.
  • preferably, the storage unit stores identification information of usable music playback devices as the use permission information, and the determination unit determines that the music playback device is usable when identification information identical to that received from the terminal device is stored in the storage unit.
  • alternatively, the storage unit stores a predetermined use permission code as the use permission information, and the determination unit determines that the music playback device is usable when the identification information received from the terminal device includes the use permission code.
  • a terminal device capable of communicating with a server comprises: identification information acquisition means that acquires identification information of a playback device from the playback device connected to the terminal device and transmits it to the server; first communication means that transmits information specifying content to be played back to the server and receives content data corresponding to the content from the server when the playback device is determined to be usable; and second communication means that transmits the content data to the playback device determined to be usable.
  • the above terminal device acquires the identification information of the playback device from the playback device connected to the terminal device, and transmits it to the server. In addition, information specifying content to be reproduced is transmitted to the server. Then, when it is determined that the playback device is usable, the terminal device receives content data corresponding to the content from the server, and transmits the content data to the playback device determined to be usable. As a result, the content data is transmitted only to the playback device determined to be usable.
  • a content communication method executed by a terminal device capable of communicating with a server includes: an identification information acquisition step of acquiring identification information of a playback device from the playback device connected to the terminal device and transmitting it to the server; a first communication step of transmitting information specifying content to be played back to the server and receiving content data corresponding to the content from the server when the playback device is determined to be usable; and a second communication step of transmitting the content data to the playback device determined to be usable. By this method, content data is transmitted only to a playback device determined to be usable.
  • a program executed by a terminal device capable of communicating with a server causes the terminal device to function as: identification information acquisition means that acquires identification information of a playback device from the playback device connected to the terminal device and transmits it to the server; first communication means that transmits information specifying content to be played back to the server and receives content data corresponding to the content from the server when the playback device is determined to be usable; and second communication means that transmits the content data to the playback device determined to be usable.
  • a server capable of communicating with a terminal device comprises: a storage unit that stores use permission information specifying usable playback devices; a determination unit that determines whether the playback device connected to the terminal device is usable based on identification information received from the terminal device and the use permission information; and a transmission unit that transmits content data corresponding to information specifying content to the terminal device when the determination unit determines that the playback device is usable. In this way, content data is transmitted only to a playback device determined to be usable.
  • Assist Vocal
  • [1.1] Concept of Assist Vocal
  • when a user who is driving a vehicle plays and listens to music in the car, he may want to sing along with the song he is listening to. However, since lyric information cannot be viewed while driving, the user cannot sing unless he has memorized the lyrics of the song.
  • in this embodiment, therefore, the lyrics contained in the song are output as an audio signal to inform the user.
  • specifically, the lyrics included in the song are output as audio before those lyrics are reproduced in the song, to inform the user. Thereby, the user can sing the music being reproduced even while driving. Users other than the driver can also sing along without looking at a lyric sheet.
  • hereinafter, the function of outputting the content of the lyrics as audio and conveying it to the user prior to the timing at which those lyrics are reproduced in the music is referred to as "assist vocal".
  • FIG. 1 shows the concept of assist vocals.
  • FIG. 1 schematically shows one piece of music.
  • the horizontal axis in FIG. 1 indicates time.
  • One piece of music includes a lyrics portion divided into a plurality of blocks.
  • the part of the lyrics included in the music to be played is called “vocal”.
  • the part other than vocals is called “interlude”. Therefore, usually one piece of music is composed of a plurality of interludes and a plurality of vocals.
  • in the example of FIG. 1, the music is composed of three vocals 1 to 3 and a plurality of interludes. It is assumed that the content (lyrics) of vocal 1 is "Aiueo", the content of vocal 2 is "Kakikukeko", and the content of vocal 3 is "Sashisuseso".
  • the lyrics “Aiueo” corresponding to the vocal 1 are output as audio prior to the timing at which the vocal 1 in the music is played back.
  • the lyric sound output by the assist vocal is called “speech” and is distinguished from “vocal” included in the music.
  • speech 1 corresponding to vocal 1 is output prior to vocal 1.
  • speech 2 is output prior to vocal 2
  • speech 3 is output prior to vocal 3.
  • FIG. 2 is a flowchart of the assist vocal process. This process is executed by a terminal device mounted on the vehicle, typically a mobile terminal such as a smartphone, and the details thereof will be described later. In the following description, it is assumed that the terminal device executes processing.
  • the terminal device determines whether or not the assist vocal is on (step S1).
  • the assist vocal may be turned on / off manually by the user or automatically.
  • in the case of manual setting, when the user performs an on/off operation, the terminal device detects it.
  • in the case of automatic setting, the terminal device analyzes the user's voice, for example using a microphone, and automatically sets the assist vocal to on when the user is singing a song or performing an action in time with the song.
  • the assist vocal automatic setting method will be described later.
  • if the assist vocal is not set to on (step S1: No), the process ends.
  • if the assist vocal is set to on (step S1: Yes), the terminal device specifies the music being played (step S2).
  • the music played in the car may be music downloaded from a server and stored in the terminal device, music stored on a storage medium such as a CD or in the memory of a vehicle-mounted device, or music being broadcast on the radio. When music stored in the terminal device is being played, the terminal device can easily specify the music being played.
  • otherwise, the terminal device collects the music being played from the in-car speaker with a microphone and transmits the resulting audio data to a music search server.
  • the music search server stores a large number of pieces of music data in a database, specifies the music that matches the audio data received from the terminal device, and transmits information indicating that music (for example, the music title and artist name; hereinafter referred to as "music specifying information") to the terminal device. In this way, the terminal device acquires the music specifying information of the music currently being played.
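  • as an illustration of this exchange, the sketch below sends microphone audio to a music search server over HTTP; the endpoint URL and response fields are invented for illustration, since the patent does not specify a concrete interface.

```python
# Hypothetical sketch of the music search exchange (step S2), assuming the
# music search server exposes an HTTP endpoint. The URL and JSON fields are
# placeholders, not a real API.
import requests  # third-party; pip install requests

def identify_song(mic_audio: bytes) -> dict:
    """Send microphone audio to a (hypothetical) music search server and
    return music specifying information (e.g. title and artist)."""
    resp = requests.post(
        "https://music-search.example.com/identify",  # placeholder URL
        data=mic_audio,
        headers={"Content-Type": "application/octet-stream"},
        timeout=10,
    )
    resp.raise_for_status()
    return resp.json()  # e.g. {"title": "...", "artist": "..."}
```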
  • FIG. 3 is a flowchart of the speech information generation process.
  • FIG. 4 shows an outline of the speech information generation process.
  • the terminal device acquires the lyrics data of the music specified in step S2 from an external server or the like (step S31).
  • the "lyric data" is information defining which lyrics are reproduced at which timing in the music; specifically, it associates lyric text data indicating the lyrics included in the music with reproduction time data indicating the reproduction time of those lyrics (the elapsed time from the start of the song).
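  • one plausible in-memory representation of such lyric data (the field names are illustrative, not from the patent) is:

```python
# Each entry pairs lyric text with its reproduction time, given as seconds
# elapsed from the start of the song. Values follow the FIG. 1 example.
lyric_data = [
    {"time": 12.0, "text": "Aiueo"},        # vocal 1
    {"time": 34.5, "text": "Kakikukeko"},   # vocal 2
    {"time": 58.2, "text": "Sashisuseso"},  # vocal 3
]
```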
  • the terminal device acquires music analysis data (step S32).
  • the music analysis data is information indicating musical features such as beat positions and bar positions in the music, and is generated based on the audio data of the reproduced music.
  • for example, the terminal device incorporates a music analysis application, collects the music played from a vehicle speaker with a microphone to acquire audio data, and analyzes the audio data to acquire music analysis data such as beat positions. Note that the music analysis data may be acquired using an external music analysis device or server instead of incorporating the music analysis application in the terminal device.
  • next, the terminal device performs lyric blocking (step S33). Lyric blocking is a process of dividing the lyric text data included in the lyric data acquired in step S31 into blocks, where one block corresponds to one speech. That is, lyric blocking is a process of dividing lyric text data into speech units.
  • in the example of FIG. 4, the terminal device has acquired "Aiueokakikukekosashisuseso" as the lyric text data, and divides it into three blocks, "Aiueo", "Kakikukeko", and "Sashisuseso", to generate block lyric data.
  • FIG. 5 shows examples of lyric blocking.
  • FIG. 5A shows a first method. In this method, the section between interludes included in the music is set as one block.
  • the “interlude” is a part other than “vocal” in the music. Specifically, when the length It of a section other than vocal (non-vocal section) is longer than a predetermined length t1, the terminal apparatus determines that the section is an interlude.
  • FIG. 5C shows a second method.
  • the terminal device determines each block based on a break included in the lyrics data. That is, if the lyric text data included in the lyric data includes delimiter information in advance, the terminal device can block the lyric text data according to the delimiter.
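  • the first method (FIG. 5A) can be sketched as follows, assuming each lyric entry carries start and end times so that a gap longer than t1 between consecutive entries marks an interlude; the data layout and threshold are illustrative, not from the patent.

```python
# Sketch of interlude-based lyric blocking: a non-vocal gap longer than t1
# closes the current block, so each block corresponds to one speech.
def block_lyrics(entries: list[dict], t1: float) -> list[str]:
    """Group lyric entries into blocks separated by interludes (gaps > t1)."""
    blocks, current = [], []
    for prev, cur in zip([None] + entries[:-1], entries):
        gap = cur["start"] - prev["end"] if prev else 0.0
        if prev and gap > t1:          # non-vocal section long enough -> interlude
            blocks.append("".join(e["text"] for e in current))
            current = []
        current.append(cur)
    if current:
        blocks.append("".join(e["text"] for e in current))
    return blocks

entries = [
    {"start": 12.0, "end": 15.0, "text": "Aiueo"},
    {"start": 30.0, "end": 33.0, "text": "Kakikukeko"},  # 15 s gap -> interlude
    {"start": 50.0, "end": 53.0, "text": "Sashisuseso"},
]
print(block_lyrics(entries, t1=5.0))  # ['Aiueo', 'Kakikukeko', 'Sashisuseso']
```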
  • next, the terminal device performs lyric speech conversion (step S34).
  • the block lyric data obtained by lyric blocking is text data indicating lyrics, and lyric speech conversion is a process of converting this block lyric data into audio data.
  • the terminal device incorporates text-to-speech (TTS) software and converts each piece of block lyric data obtained in step S33 into speech data.
  • in the example of FIG. 4, speeches 1 to 3, which are audio data, are generated from the respective block lyric data.
  • note that TTS conversion by an external server or the like may be used instead.
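  • as one concrete possibility (the patent names no specific TTS engine), the block lyric data could be converted to speech data with an off-the-shelf offline TTS library such as pyttsx3:

```python
# Illustrative TTS step (step S34) using the third-party pyttsx3 library as
# a stand-in for the unnamed TTS software in the patent.
import pyttsx3  # pip install pyttsx3

def blocks_to_speech(blocks: list[str]) -> list[str]:
    """Convert each block of lyric text into a speech audio file."""
    engine = pyttsx3.init()
    paths = []
    for i, text in enumerate(blocks, start=1):
        path = f"speech_{i}.wav"
        engine.save_to_file(text, path)  # queue synthesis of this block
        paths.append(path)
    engine.runAndWait()                  # perform all queued synthesis
    return paths

print(blocks_to_speech(["Aiueo", "Kakikukeko", "Sashisuseso"]))
```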
  • the terminal device changes the speech length (step S35).
  • the speech length change is a process of shortening the time length of each speech obtained by the lyric speech conversion so that it can be reproduced in a shorter time.
  • since each speech is reproduced in the interlude preceding the corresponding vocal, the speech length is changed so that the speech fits within the interlude.
  • specifically, the playback time of each speech is shortened (the playback speed is increased) within a range in which the speech remains intelligible to humans.
  • letting the time length of each speech obtained in step S34 be the "original speech length", the changed speech length is given by: changed speech length = original speech length × 1/2 ... (1), where 1/2 is the speech length conversion coefficient.
  • the playback time may be further shortened according to the duration of the interlude corresponding to each speech. In this case, even for speech with the same number of characters or words with the same lyrics, the playback time varies depending on the position in the song (the length of the preceding interlude).
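  • a minimal sketch of this length calculation follows, assuming the conversion coefficient 1/2 from equation (1) and an optional clamp to the preceding interlude; actual time-stretching of the waveform would require a DSP library and is omitted here.

```python
# Compute the target speech length per equation (1), optionally shortened
# further so the speech fits the preceding interlude, and report the implied
# playback speed factor.
def changed_speech_length(original: float, interlude: float | None = None,
                          coeff: float = 0.5) -> tuple[float, float]:
    """Return (new_length_seconds, playback_speed_factor)."""
    new_len = original * coeff                    # equation (1)
    if interlude is not None and new_len > interlude:
        new_len = interlude                       # shorten further to fit
    return new_len, original / new_len

print(changed_speech_length(6.0))                 # (3.0, 2.0): twice as fast
print(changed_speech_length(6.0, interlude=2.0))  # clamped to the interlude
```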
  • the terminal device calculates the speech insertion timing (step S36).
  • the terminal device inserts speech corresponding to a certain vocal prior to the playback timing of the vocal.
  • the speech 1 corresponding to the vocal 1 is inserted before the playback timing of the vocal 1.
  • the speech 2 corresponding to the vocal 2 is inserted before the reproduction timing of the vocal 2
  • the speech 3 corresponding to the vocal 3 is inserted before the reproduction timing of the vocal 3.
  • FIG. 6 shows an example of the timing at which the speech 2 corresponding to the vocal 2 is inserted.
  • the speech ends a certain time before the start timing of the corresponding vocal.
  • the speech 2 is inserted so as to end a certain time T2 before the reproduction start timing of the vocal 2. That is, the speech 2 ends a predetermined time T2 before the start of the reproduction of the vocal 2.
  • the reproduction start timing of the speech 2 is determined according to the length of the speech 2.
  • the speech end timing is matched with the beat position of the music.
  • the position of the beat of the music is acquired from the music analysis data described above.
  • both the speech playback start timing and playback end timing are matched with the beat position of the music. Specifically, in the example of FIG. 6, the playback start timing and playback end timing of the speech 2 are both made coincident with the third beat of the four beats.
  • when the speech end timing, or both the start and end timings, coincide with beat positions of the music, the speech is linked to the music, so that the user can easily sing along.
  • the terminal device determines the speech insertion timing. Specifically, for each speech, the playback start timing and playback end timing are defined by the elapsed time from the beginning of the music.
  • the playback start timing and playback end timing of each speech is stored as part of the speech information. That is, the speech information includes an audio signal corresponding to each speech (hereinafter also referred to as “speech signal”) and the reproduction start timing / reproduction end timing of each speech.
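  • the timing calculation of step S36 might look like the following sketch, where T2 and the beat grid are illustrative values and "snap to the latest earlier beat" is one possible reading of the beat alignment described above:

```python
# Sketch of the insertion timing calculation: each speech must end a fixed
# time T2 before its vocal starts, with the end (and start) snapped back to
# the nearest earlier beat taken from the music analysis data.
def insertion_timing(vocal_start: float, speech_len: float,
                     beats: list[float], t2: float = 1.0) -> tuple[float, float]:
    """Return (speech_start, speech_end) in seconds from the song start."""
    target_end = vocal_start - t2
    # snap the end to the latest beat at or before the target end
    end = max((b for b in beats if b <= target_end), default=target_end)
    start = end - speech_len
    # optionally snap the start to a beat as well (FIG. 6 aligns both)
    start = max((b for b in beats if b <= start), default=start)
    return start, end

beats = [i * 0.5 for i in range(200)]   # illustrative beats every 0.5 s
print(insertion_timing(vocal_start=34.5, speech_len=3.0, beats=beats))
```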
  • the processing returns to the main routine shown in FIG. 2, and the terminal device acquires the current playback position of the music being played back (step S4). Specifically, the terminal device acquires the current reproduction position by counting the elapsed time from the reproduction start time of the music being reproduced.
  • the terminal device performs speech enhancement processing (step S5).
  • the speech emphasis process is a process for distinguishing vocals included in music from speech and making them easy to hear, details of which will be described later.
  • the terminal device reproduces the speech based on the reproduction start timing / reproduction end timing of each speech included in the speech information and the current reproduction position (step S6). Specifically, the speech reproduction is started at the speech reproduction start timing, and the speech reproduction is terminated at the speech reproduction end timing. As a result, the corresponding speech is reproduced prior to the vocal in the music.
  • the terminal device determines whether or not the speech reproduction should be terminated (step S7).
  • Examples of the case where speech reproduction should be terminated include a case where speech information is lost, a case where music reproduction itself is terminated, a case where assist vocals are turned off by a user operation, and the like. If the speech reproduction should not be terminated (step S7: No), the process returns to step S4 to continue the speech reproduction. On the other hand, if the speech reproduction should be terminated (step S7: Yes), the assist vocal process is terminated.
  • the terminal device collects the voice uttered by the user with a microphone, and the assist vocal is automatically turned on when it is determined that the user is singing along with the music or performing an action equivalent to singing.
  • for example, as a result of analyzing the voice data collected by the microphone, if it is determined that the user is humming, singing part of the song, or the like, the assist vocal is turned on. On the other hand, when the voice data is not singing but conversation with a passenger, the assist vocal is not turned on. Even when the voice data includes a portion of humming, the assist vocal is not turned on if the voice data is mostly conversation.
  • whether or not the user's voice included in the voice data is singing can be determined based on the rhythm or pitch contained in the voice data. For example, if the rhythm is regular or the change in pitch is large, it can be judged as singing; if the rhythm is irregular or the change in pitch is small, it can be judged as not singing (conversation). Further, by using the music analysis application described above, it may be determined to be singing when a beat or measure can be extracted from the voice data, and not singing when it cannot. Further, by using the music search server or music search function described above, it may be determined to be singing when a song can be identified from the voice data, and not singing when it cannot.
  • the terminal device may also calculate the correlation between the collected voice data and the music being played, determine that the user is singing when the correlation exceeds a certain value, and turn on the assist vocal.
  • when the terminal device has already acquired the lyric data of the song being played, it may judge that the user is singing when the correlation between the voice data collected by the microphone and the lyric data is a certain value or more.
  • further, using the lyric data, when the user's voice is detected even at an interlude position of the music where no lyrics should exist, it may be determined to be conversation.
  • rhythm information collected by the microphone may also be used. For example, if it is determined that the user is tapping the steering wheel with a hand or finger, or tapping the floor with a foot, in time with the rhythm of the music, the user is regarded as performing an act similar to singing and the assist vocal may be turned on. In this case, the correlation between the rhythm collected by the microphone and the rhythm of the music being played may be calculated, and the assist vocal turned on when the correlation is a certain value or more. Alternatively, the assist vocal may be turned on when the rhythm collected by the microphone repeats a certain pattern, without calculating the correlation with the rhythm of the music being played.
  • alternatively, the assist vocal may be turned on when the user, photographed with a camera that captures the inside of the vehicle, is found to be moving his/her head along with the music.
  • the assist vocal is turned on when it is determined that the user is singing.
  • however, if it is determined that the user already knows the lyrics, it is not necessary to turn on the assist vocal.
  • for example, when the correlation between the collected voice data and the music being played is a certain value or more and the correlation with the lyric data is also a certain value or more, it is determined that the user knows the lyrics, and the assist vocal is not turned on even though the user is singing.
  • in this case, the speech information may be generated and prepared for output in advance; thereafter, if the correlation between the collected voice data and the music being played falls below a certain value, or the correlation with the lyric data falls below a certain value, it is determined that the user does not know the lyrics, and the assist vocal is output.
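  • a highly simplified sketch of this decision logic follows, with invented thresholds and a naive correlation measure standing in for the analyses described above:

```python
# Turn assist vocal on only when the user appears to be singing along but
# does NOT appear to know the lyrics. Thresholds are illustrative.
import numpy as np

def normalized_corr(a: np.ndarray, b: np.ndarray) -> float:
    """Normalized correlation of two equal-length audio envelopes."""
    a = (a - a.mean()) / (a.std() + 1e-9)
    b = (b - b.mean()) / (b.std() + 1e-9)
    return float(np.mean(a * b))

def should_enable_assist_vocal(mic: np.ndarray, music: np.ndarray,
                               lyric_corr: float) -> bool:
    singing = normalized_corr(mic, music) > 0.5   # user follows the music
    knows_lyrics = lyric_corr > 0.7               # voice matches lyric data
    return singing and not knows_lyrics
```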
  • the assist vocal auto-on setting method has been described, but the assist vocal auto-off setting can also be performed.
  • while the assist vocal is on, the assist vocal may be automatically turned off when it is determined that the user is not singing along with the song and is not performing an act similar to singing (humming, singing part of the song, and so on).
  • similarly, the assist vocal may be automatically turned off when it is determined that the user is not keeping the rhythm or is not moving his/her head to the music.
  • in the above examples, the assist vocal is automatically turned on or off based on whether the user is singing or acting in accordance with the song.
  • instead, automatic on or off setting may be performed based on the part of the song being played.
  • for example, the assist vocal may be automatically turned on when the chorus part of the song is played and automatically turned off when a part other than the chorus is played.
  • conversely, the assist vocal may be automatically turned on when a part other than the chorus is played and automatically turned off when the chorus part is played.
  • the speech emphasis process makes it easy for the user to hear the speech while distinguishing it from the vocal; several methods are shown below.
  • Processing when speech and vocal overlap
  • the speech is basically reproduced during the interlude immediately before the corresponding vocal, and preferably does not overlap with the vocal in time.
  • the above-described speech length changing process (step S35) is performed.
  • the speech may not be completely reproduced during the interlude even if the speech length is shortened. That is, when the length of the speech is longer than the length of the interlude, the speech and vocal are partially overlapped and reproduced.
  • in such a case, any of the following processes may be performed instead of simply reproducing the speech and vocal overlapped.
  • FIG. 7A shows a case where the rear portion of the speech and the head portion of the vocal overlap and an overlapping portion X occurs.
  • the volume of the vocal is adjusted in the overlapping portion X.
  • the vocal volume is reduced to a level where speech can be heard, or zero.
  • thereby, in the overlapping portion X, the reproduction of the speech is prioritized and the speech is easy to hear.
  • FIG. 7 (B) shows a case where the speech head portion and the rear portion of the previous vocal overlap, resulting in an overlap portion X.
  • the vocal volume is adjusted in the overlapping portion X. Specifically, the vocal volume is reduced to a level where speech can be heard, or zero. Further, in the overlapping portion X, the volume level of the vocal may not be suddenly lowered, but the volume level may be gradually lowered by fading out the vocal. Thereby, in the overlapping part X, the reproduction of the speech is prioritized and the speech is easy to hear.
  • the above level adjustment may be performed by lowering the volume level of the vocal component when the vocal component and performance components such as musical instruments can be separated in the music signal.
  • alternatively, the volume level of the entire music signal may be reduced, or the volume level may be lowered only for components in the frequency band corresponding to the vocal (human voice).
  • FIG. 7C shows a case where the rear part of the speech and the head part of the vocal overlap and an overlapping part X occurs.
  • in the method of FIG. 7C, the speech volume is adjusted in the overlapping portion X. Specifically, the speech volume is reduced or set to zero. Instead of suddenly reducing the speech volume, the speech may be faded out so that the volume decreases gradually.
  • in this method the speech cannot be heard in the overlapping portion X; however, when listening to a song the user knows to some extent, the user often does not remember the entire lyrics but can sing from memory once the beginning of the lyrics is heard. Therefore, as shown in FIG. 7C, it may be acceptable for the rear portion of the speech to be difficult to hear as long as its head portion can be heard. This technique is effective in such a case.
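  • the volume adjustment in the overlapping portion X could be sketched as below, operating on float sample buffers; the fade shape and floor level are illustrative choices, not values from the patent:

```python
# Fade the music (vocal) down across the overlap region X so the speech
# stays audible, using a gradual fade rather than a sudden cut (FIG. 7B).
import numpy as np

def duck_music(music: np.ndarray, overlap: slice, floor: float = 0.2) -> np.ndarray:
    """Fade the music down to `floor` of its level across the overlap region."""
    out = music.copy()
    n = overlap.stop - overlap.start
    fade = np.linspace(1.0, floor, n)   # gradual fade-out over the overlap
    out[overlap] *= fade
    return out
```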
  • FIG. 8A shows a configuration in which the phase of speech output from left and right speakers is inverted.
  • the music signal of the left (L) channel is supplied to the adder 32, and the music signal of the right (R) channel is supplied to the adder 33.
  • the speech signal is supplied to the adder 33 as it is, and its phase is inverted by the phase inverter 31 and supplied to the adder 32.
  • the output of the adder 32 is supplied to the left speaker 30L, and the output of the adder 33 is supplied to the right speaker 30R.
  • thereby, the sound image of the music including the vocal is localized between the left and right speakers, whereas the sound image of the speech is localized around the user's ears, making it easy for the user to distinguish the speech from the vocal in the music.
  • in FIG. 8A, the phase of the speech signal supplied to the left speaker 30L is inverted by the phase inverter 31, but instead only the phase of the speech signal supplied to the right speaker 30R may be inverted.
  • further, the phase of the speech signal supplied to one speaker does not necessarily need to be fully inverted (shifted by 180°); it is only necessary to give a certain phase difference between the speech signal supplied to one speaker and that supplied to the other speaker.
  • the phase inverter 31 is an example of the signal processing means of the present invention, and the adders 32 and 33 are examples of the adding means and the output means of the present invention.
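  • the FIG. 8A signal path reduces to a few array operations in a sketch like the following, where multiplying by -1 stands in for the phase inverter 31:

```python
# Mix the speech into the stereo music: in phase on the right channel
# (adder 33), phase-inverted on the left channel (phase inverter 31 feeding
# adder 32). Inputs are equal-length float sample buffers.
import numpy as np

def mix_with_inverted_speech(music_l: np.ndarray, music_r: np.ndarray,
                             speech: np.ndarray) -> tuple[np.ndarray, np.ndarray]:
    left = music_l + (-speech)   # inverted speech to the left speaker 30L
    right = music_r + speech     # speech as-is to the right speaker 30R
    return left, right
```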
  • FIG. 8B shows a configuration in which a sound image of speech can be set at an arbitrary position.
  • the music signal of the left (L) channel is supplied to the adder 32, and the music signal of the right (R) channel is supplied to the adder 33.
  • the speech signal is supplied to the adders 32 and 33 via the sound image localization control calculation unit 34 and the crosstalk cancellation unit 35.
  • the sound image localization control calculation unit 34 convolves the transfer function between the target speaker position and the listening position (user's position) with the speech signal, and the crosstalk canceling unit 35 sets the speaker outputting the music and the listening position. A process for canceling the transfer function between them is performed. Accordingly, the sound image of the music can be localized between the left and right speakers 30L and 30R, and the sound image of the speech can be localized at the target speaker position, so that the user can easily distinguish between the speech and the vocal.
  • the sound image localization control calculation unit 34 and the crosstalk cancellation unit 35 are examples of signal processing means of the present invention, and the adders 32 and 33 are examples of addition means and output means of the present invention.
  • the music signals of the left and right channels are supplied to the vehicle speakers 30L and 30R, respectively.
  • the speech signal is supplied to the right headrest speaker 35R as it is, and the phase is inverted by the phase inverter 31 and is supplied to the left headrest speaker 35L.
  • since a phase difference is given to the speech signals supplied to the two headrest speakers 35L and 35R, the sound image of the speech is localized at a position different from that of the music, making it easy for the user to distinguish the speech from the vocal in the music.
  • in this case as well, it is sufficient to give a constant phase difference between the speech signal supplied to one headrest speaker and that supplied to the other headrest speaker.
  • the speech may be reproduced using the headrest speaker in the passenger seat instead of the headrest speaker in the driver seat. Further, when headrest speakers are mounted on a plurality of seats of the vehicle, it may be possible to select and set the necessity of speech reproduction for each seat. In this way, it is possible to set so that the speech is reproduced only from the headrest speaker in the seat of the passenger who wants to sing the music while listening to the speech.
  • alternatively, by using the sound image localization control calculation unit 34 and the crosstalk cancellation unit 35 in the same manner as in the processing described with FIG. 8B, the sound image of the speech may be localized at an arbitrary position. This makes it easy for the user to distinguish between speech and vocals.
  • FIG. 10 shows the overall configuration of the music playback system according to the first embodiment.
  • a plurality of vehicles 1, a content provider 2, and a gate server 3 can communicate with each other via a network 4.
  • the plurality of vehicles 1 can communicate with the content provider 2 and the gate server 3 via the network 4 by wireless communication.
  • Content provider 2 is a server such as a music distributor and provides music data, music metadata, lyrics data, and the like.
  • the gate server 3 is a server that functions to realize the assist vocal according to the present embodiment; it acquires music data, metadata, lyric data, and the like of necessary music from the content provider 2 and stores them in a database (not shown).
  • the vehicle 1 includes a terminal device 10, a music playback device 20, and a speaker 30.
  • the terminal device 10 is typically a mobile terminal such as a smartphone, and includes a communication unit 11, a control unit 12, a storage unit 13, a microphone 14, and an operation unit 15.
  • the communication unit 11 communicates with the gate server 3 through the network 4.
  • the control unit 12 includes a CPU and the like, and controls the entire terminal device 10.
  • the storage unit 13 is a memory such as a ROM or a RAM, and stores a program for the control unit 12 to execute various processes, and also functions as a work memory.
  • when the control unit 12 executes a program stored in the storage unit 13, processing including the assist vocal processing is executed.
  • the storage unit 13 may also store music data.
  • the microphone 14 collects sounds such as music being played in the car, singing by the user, conversation, etc., and generates sound data.
  • the operation unit 15 is typically a touch panel or the like, and receives an operation and selection input by a user.
  • the music playback device 20 is a car audio, for example, and includes an amplifier.
  • the speaker 30 is a speaker mounted on the vehicle.
  • the music playback device 20 plays back music from the speaker 30 based on the music data supplied from the terminal device 10.
  • in another configuration example, the vehicle 1 includes a terminal device 10x.
  • the terminal device 10x is a device having both the functions of the terminal device 10, such as the portable terminal shown in FIG. 11A, and those of the music playback device 20, such as a car audio unit.
  • the terminal device 10x includes a communication unit 11, a control unit 12, a storage unit 13, a microphone 14, and an operation unit 15, as well as a music playback unit 16 corresponding to the music playback device 20.
  • the terminal device 10x is connected to the speaker 30 and reproduces music from the speaker 30 based on music data.
  • FIG. 12 is a flowchart of the assist vocal process according to the first embodiment.
  • the assist vocal process is executed mainly by the terminal device 10 or 10x (hereinafter simply referred to as “terminal device 10”).
  • first, the gate server 3 is connected to the content provider 2 via the network 4, acquires music data and lyric data for a plurality of pieces of music, and stores them in an internal database (step S101).
  • the terminal device 10 receives designation of the music to be played by the operation of the operation unit 15 by the user (step S102), and transmits music designation information for designating the music to the gate server 3 (step S103).
  • the gate server 3 acquires the song data and lyric data of the song corresponding to the received song designation information from the database, and transmits them to the terminal device 10 (step S104).
  • the terminal device 10 performs the processing of steps S105 to S109 using the received music data and lyrics data.
  • the processing in steps S105 to S109 is the same as that in steps S3 to S7 in FIG.
  • the terminal device 10 mounted on the vehicle 1 mainly executes the assist vocal process.
  • in the above example, the gate server 3 acquires the music data from the content provider 2 in step S101; however, when the music data is stored in the terminal device 10, the gate server 3 may acquire it from the terminal device 10. Further, when the music data is already stored in the database in the gate server 3, it may be acquired from there.
  • [Second Embodiment] In the second embodiment, a part of the assist vocal process is executed on the gate server 3 side.
  • the overall configuration of the music playback system according to the second embodiment is the same as that of the first embodiment shown in FIG.
  • FIG. 13 is a flowchart of the assist vocal process according to the second embodiment.
  • the gate server 3 generates speech information, further generates music data with speech, and transmits it to the terminal device 10.
  • the terminal device 10 receives and reproduces the music data with speech. This will be described in detail below.
  • first, the gate server 3 is connected to the content provider 2 via the network 4, acquires music data and lyric data for a plurality of pieces of music, and stores them in an internal database (step S201). The gate server 3 then generates speech information for each piece of music (step S202).
  • the gate server 3 adds the speech to the music data and generates the music data with speech (step S203). Specifically, the gate server 3 combines the speech signal corresponding to each speech with the music data at the timing calculated by the process of step S36 in FIG. 3 based on the generated speech information, and generates music data with speech. And store it in the database.
  • the music data with speech is data that, when played back as it is, reproduces the speech in addition to the music.
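  • a sketch of this mixing step (step S203) follows, assuming float sample buffers at a fixed sample rate; level matching and the emphasis processing described earlier are omitted for brevity:

```python
# Mix each speech signal into the song at its calculated insertion start
# time (from step S36), producing "music data with speech" that plays
# correctly as-is.
import numpy as np

def add_speech_to_music(music: np.ndarray,
                        speeches: list[tuple[float, np.ndarray]],
                        sr: int = 44100) -> np.ndarray:
    out = music.copy()
    for start_sec, speech in speeches:
        i = int(start_sec * sr)              # insertion timing in samples
        j = min(i + len(speech), len(out))
        out[i:j] += speech[: j - i]          # mix speech over the interlude
    return out
```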
  • the terminal device 10 receives designation of the music to be played by the operation of the operation unit 15 by the user (step S204), and transmits music designation information for designating the music to the gate server 3 (step S205).
  • the gate server 3 transmits the music data with speech corresponding to the received music designation information to the terminal device 10 (step S206).
  • the terminal device 10 reproduces the received music data with speech (step S207). Thereby, the speech is reproduced at an appropriate timing during the reproduction of the music.
  • the terminal device 10 determines whether or not the music reproduction should be terminated (step S208). When the music has been played to the end, or when playback should be terminated, such as when the user has stopped playing (step S208: Yes), the terminal device 10 finishes playing. On the other hand, if the reproduction of the music should not be terminated (step S208: No), the process returns to step S207, and the reproduction of the music data with speech is continued.
  • the music data with speech is generated on the gate server 3 side and provided to the terminal device 10.
  • the terminal device 10 can listen to music including speech by reproducing the received music data with speech.
  • in the above example, the gate server 3 acquires the music data from the content provider 2 in step S201; however, if the music data is stored in the terminal device 10, the gate server 3 may acquire it from the terminal device 10. Further, when the music data is already stored in the database in the gate server 3, it may be acquired from there.
  • in the above embodiments, the music being reproduced by the terminal device 10 is reproduced with speech added.
  • however, speech can also be added to music reproduced from a source other than the terminal device 10, such as a car radio or a CD (hereinafter referred to as an "external source").
  • the terminal device 10 basically generates the speech information by the above-described method, and only needs to reproduce the speech at a timing corresponding to the reproduction position of the music reproduced from the external source.
  • FIG. 14 shows a flowchart of assist vocal processing in this case.
  • the terminal device 10 collects music reproduced from an external source by the microphone 14 to acquire reproduced music data (step S151), and transmits this to the gate server 3 (step S152).
  • the gate server 3 receives the reproduced music data from the terminal device 10, and specifies the corresponding music and its reproduction position (step S153).
  • specifically, the gate server 3 includes a music search unit having the function of the music search server described above, specifies the music based on the reproduced music data, and identifies the reproduction position corresponding to the received portion of the reproduced music data. Then, the gate server 3 transmits the lyric data and the reproduction position information, together with the music title and artist name of the specified music, to the terminal device 10 (step S154).
  • the terminal device 10 generates speech information using the received lyric data (step S155). Note that the speech information is generated by the same method as described with reference to FIG. In addition, the terminal device 10 can acquire music analysis data by analyzing the reproduction music data acquired with the microphone 14 (process of step S32 of FIG. 3).
  • the terminal device 10 calculates the current playback position of the music based on the playback position information acquired from the gate server 3 (step S156). This method will be described later.
  • the terminal device 10 performs speech enhancement processing (step S157), and reproduces speech at an appropriate timing according to the music being reproduced by the external source (step S158). As a result, the speech is reproduced in accordance with the music being reproduced from the external source.
  • the terminal device 10 determines whether or not to end the speech reproduction (step S159); if not (step S159: No), the process returns to step S156 and continues, and if so (step S159: Yes), the process ends.
  • the reproduced music data transmitted from the terminal device 10 to the gate server 3 is actually data of a plurality of audio frames. That is, the terminal device 10 collects the music reproduced by the external source with the microphone 14 and sequentially transmits it to the gate server 3 as a plurality of audio frames.
  • in the example of FIG. 15, the terminal device 10 sequentially transmits audio frames n, (n+1), (n+2), ... to the gate server 3 as reproduced music data. At this time, the terminal device 10 stores the time at which it first transmitted the reproduced music data, i.e., in the example of FIG. 15, the time at which audio frame n was transmitted (hereinafter referred to as "reference time t0").
  • the music search unit of the gate server 3 refers to information on a large number of music pieces stored in the database, and specifies music pieces based on the received plurality of audio frames.
• in the example of FIG. 15, the music search unit of the gate server 3 can identify the music based on the audio frames n to (n+4).
• in this case, the gate server 3 transmits to the terminal device 10, as the playback position information, the playback time tn from the beginning of the music to the audio frame n, in addition to the music title, artist name, and the like. That is, the playback position information transmitted from the gate server 3 to the terminal device 10 in step S154 of FIG. 14 is the playback time tn.
• in step S156, the terminal device 10 calculates the elapsed time Δt from the previously stored reference time t0 to the present, and adds it to the playback time tn. That is, the playback time tn transmitted from the gate server 3 is the time from the beginning of the music to the audio frame n, and the elapsed time Δt is the time from the audio frame n to the present. Therefore, the current playback position (playback time) Tc is calculated by the following equation.
• Tc = tn + Δt   (2)
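For illustration, here is a minimal sketch of this bookkeeping on the terminal side. The patent specifies only the calculation of equation (2), not an implementation, so all names below are hypothetical:

```python
import time

class PlaybackPositionTracker:
    """Tracks the current playback position Tc = tn + Δt (equation (2))
    of music playing from an external source."""

    def __init__(self):
        self.t0 = None  # reference time t0: when the first audio frame was sent
        self.tn = None  # playback time tn of audio frame n, from the gate server

    def mark_first_frame_sent(self):
        # Stored when the reproduced music data is first transmitted.
        self.t0 = time.monotonic()

    def set_server_playback_time(self, tn_seconds):
        # tn: time from the beginning of the music to audio frame n,
        # received as playback position information (step S154).
        self.tn = tn_seconds

    def current_position(self):
        # Δt is the elapsed time from the reference time t0 to the present.
        delta_t = time.monotonic() - self.t0
        return self.tn + delta_t  # Tc
```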
• in step S159, the reproduction may be ended when one piece of music ends. However, when another piece of music is reproduced after the first one ends, the processing may be continued. That is, the speech reproduction may be continued as long as the transmission of the reproduced music data from the terminal device 10 to the gate server 3 continues. Thereby, even if the music reproduced from the external source changes, the speech reproduction can continue to follow the new song.
• the assist vocal described above is performed using a music playback device 20 such as a car audio unit mounted on the vehicle 1.
• if the music playback device 20 used for assist vocals is unrestricted, problems may arise in the sound quality of the reproduced music and in copyright management. Therefore, these problems are addressed by restricting the music playback devices 20 that can be used when performing assist vocals. Specifically, the assist vocal can be executed only when a product made by a specific producer is used as the music playback device 20.
• FIG. 16(A) schematically shows a method of restricting use for products that have not yet been sold, that is, products to be newly put on the market.
• a producer that produces products capable of executing assist vocals (hereinafter also referred to as “use-permitted products”) assigns a device ID to each product when it is manufactured at the production factory.
  • This device ID can be a serial number of a product, for example, and is stored in the internal memory 20x of the music playback device 20 before shipment from the factory.
  • the production factory notifies the gate server 3 of the device ID assigned to each shipped product, and the gate server 3 stores the device ID in the internal storage unit 3x. Thereby, the device ID of the use permitted product is stored in the storage unit 3x of the gate server 3 as the use permission information.
• a user who has purchased the music playback device 20 installs it in a vehicle. Thereby, as shown in FIG. 16(A), the music playback device 20 can communicate with the terminal device 10. The device ID of the product is stored in the memory 20x of the music playback device 20 as described above.
• FIG. 17 shows a flowchart of the availability check process. This availability check process is executed between the gate server 3 and the terminal device 10 in the environment shown in FIG. 16(A).
  • the terminal device 10 communicates with the music playback device 20 to obtain a device ID (step S301) and transmits it to the gate server 3 (step S302).
  • the gate server 3 determines whether the music playback device 20 is a use-permitted product (step S303).
  • the device ID of the permitted product is stored in the storage unit 3x of the gate server 3. Therefore, the gate server 3 determines whether or not the received device ID is stored in the storage unit 3x.
• when the received device ID is stored in the storage unit 3x, the gate server 3 determines that the music playback device 20 is a use-permitted product; when it is not stored, the gate server 3 determines that the music playback device 20 is not a use-permitted product. Then, the gate server 3 transmits the determination result to the terminal device 10 (step S304).
  • the terminal device 10 receives the determination result and notifies the user by displaying it on the display unit (step S305). Thus, the availability check process ends.
  • FIG. 16B schematically shows a method of restricting the use of a music playback device 20 that has already been sold.
• in this case, since the device ID has not been notified to the gate server 3 from the production factory or the like, the device IDs of use-permitted products are not stored in the gate server 3.
• however, a device ID such as a serial number is usually given even to products that have already been sold, and the device ID often includes a code unique to the producer. Therefore, authentication is performed using this unique code as the use permission information.
  • the storage unit 3x of the gate server 3 stores the unique code “PEC” as a use permission code.
• the gate server 3 determines whether or not the use permission code “PEC” is included in the received device ID. If the use permission code “PEC” is included, the music playback device 20 is determined to be a use-permitted product; if it is not included, the music playback device 20 is determined not to be a use-permitted product. In this way, it is possible to restrict the use of music playback devices 20 that have already been sold.
• that is, in step S303, the gate server 3 determines whether or not the product is a use-permitted product depending on whether or not the use permission code “PEC” is included in the device ID received from the music playback device 20.
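A minimal sketch of the determination in step S303, covering both schemes described above (the registered-ID check of FIG. 16(A) and the permission-code check of FIG. 16(B)); the concrete data values are illustrative assumptions only:

```python
# Device IDs reported by the production factory (FIG. 16(A) scheme).
use_permitted_ids = {"PEC-000123", "PEC-000124"}

# Producer-specific use permission code (FIG. 16(B) scheme).
USE_PERMISSION_CODE = "PEC"

def is_use_permitted(device_id: str) -> bool:
    # Newly sold products: the device ID itself was registered in the
    # storage unit 3x at shipment time.
    if device_id in use_permitted_ids:
        return True
    # Already-sold products: judge by whether the device ID contains
    # the use permission code.
    return USE_PERMISSION_CODE in device_id
```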
• the availability check process can be performed at the first communication between the terminal device 10 and the gate server 3 after the music playback device 20 is first mounted on the vehicle 1. It can also be performed at the first execution of an assist vocal using the music playback device 20. That is, when an assist vocal is first requested from the terminal device 10, the gate server 3 requests the terminal device 10 to transmit the device ID and performs the availability check process.
  • the availability check process may be performed every time the user executes the assist vocal.
  • the gate server 3 requests the device ID of the music playback device 20 from the terminal device 10 and determines whether or not it can be used.
  • the gate server 3 continues the assist vocal process thereafter only when it is determined that the music playback device 20 is a use permitted product.
• when the music playback device 20 is determined to be usable, the terminal device 10 executes the assist vocal by the method shown in FIG. 12 or FIG. 13. Specifically, the terminal device 10 transmits the song data with lyrics voice generated by the gate server 3 or by the terminal device 10 itself to the music playback device 20, and the music playback device 20 plays back the received song data with lyrics voice.
• note that the terminal device 10 may acquire the identification information of the music playback device 20 again before transmitting the song data with lyrics voice to it, and determine again whether the music playback device 20 to which the song data with lyrics voice is about to be transmitted has been determined to be a use-permitted product by the gate server 3. When, as a result of this re-determination, the music playback device 20 is determined to be a use-permitted product, the terminal device 10 transmits the song data with lyrics voice to the music playback device 20.
• when, as a result of the re-determination, the music playback device 20 is determined not to be a use-permitted product, the terminal device 10 does not transmit the song data with lyrics voice to it.
• when the gate server 3 determines that the music playback device 20 is not a use-permitted product, the gate server 3 notifies the terminal device 10 to that effect.
• the assist vocal allows the user to sing while driving, but the joy of singing tends to be diminished when the driver is alone. Therefore, singing voice data of a plurality of users is collected and stored in the gate server 3, and when a certain user performs an assist vocal, the singing voice data of other users is simultaneously downloaded from the gate server 3 and reproduced in the vehicle. Thereby, even when there is only one user (driver), a pseudo chorus can be realized.
• the singing voice data of a user is generated by subtracting the sound recorded when the user is not singing from the sound recorded when the user sings along with the music in the same vehicle.
  • FIG. 18A schematically shows an environment for recording a sound when the user is singing.
  • the music reproducing device 20 reproduces music from the speaker 30 into the passenger compartment based on the source sound source, and the user U sings along with the reproduced music.
  • the sound at that time is collected by the microphone M arranged in the vehicle interior.
  • the recording data generated by the microphone M includes the singing voice of the user in addition to the sound of the music (hereinafter referred to as “recording data with singing voice”).
  • This recorded data includes the acoustic characteristics CH in the passenger compartment.
  • the microphone 14 of the terminal device may be used as the microphone M.
  • FIG. 18 (B) schematically shows an environment for recording a sound when the user is not singing.
  • the music reproducing device 20 reproduces music from the speaker 30 into the passenger compartment based on the source sound source, and the reproduced sound is collected by a microphone M arranged in the passenger compartment.
  • the recording data generated by the microphone M includes the sound of the music, but does not include the user's singing voice (hereinafter referred to as “recording data without a singing voice”). This recorded data also includes the acoustic characteristic CH in the passenger compartment.
• using the recording data thus obtained, the terminal device 10 generates singing voice data by subtracting the recording data without singing voice from the recording data with singing voice, as shown in FIG. 19(A).
• that is, by taking the difference between the recording data when the user sings and the recording data when the user does not sing, data containing substantially only the user's singing voice can be produced.
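A minimal sketch of this first method, assuming both recordings are time-aligned, single-channel sample arrays (the patent does not specify a data format):

```python
import numpy as np

def singing_voice_first_method(with_singing, without_singing):
    """First method (FIG. 19(A)): subtract the recording made without
    singing from the recording made with singing, sample by sample.
    Both inputs are 1-D float arrays assumed to be time-aligned."""
    n = min(len(with_singing), len(without_singing))
    return with_singing[:n] - without_singing[:n]
```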
• (Second method) Similarly to the first method, recording data with singing voice is generated as shown in FIG. 18(A).
• in the second method, however, the recording data without singing voice is not actually recorded; instead, data without singing voice is generated from the source sound source and the acoustic characteristics of the passenger compartment.
  • the acoustic characteristic in the vehicle interior specifically refers to an impulse response measured in advance in the vehicle interior.
• since the recording data without singing voice is obtained by recording the music reproduced from the source sound source under the acoustic characteristics of the vehicle interior, data equivalent to the recording data without singing voice can be generated by convolving the acoustic characteristics of the vehicle interior with the source sound source. Then, as shown in FIG. 19(B), the singing voice data of the user can be generated by subtracting the data without singing voice generated in this way from the recording data with singing voice.
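A minimal sketch of the second method under the same assumptions, using a pre-measured cabin impulse response:

```python
import numpy as np

def singing_voice_second_method(with_singing, source, impulse_response):
    """Second method (FIG. 19(B)): generate data without singing voice by
    convolving the cabin impulse response with the source sound source,
    then subtract it from the recording with singing voice."""
    no_singing = np.convolve(source, impulse_response)
    n = min(len(with_singing), len(no_singing))
    return with_singing[:n] - no_singing[:n]
```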
• (Third method) The third method is basically the same as the second method: data without singing voice is generated by convolving the acoustic characteristics of the vehicle interior with the source sound source, and this is subtracted from the recording data with singing voice to generate the singing voice data.
• however, while the second method is based on the premise that the acoustic characteristics in the passenger compartment do not change, in reality the acoustic characteristics in the passenger compartment change with time and circumstances. Therefore, in the third method, the change in the acoustic characteristics in the passenger compartment is corrected by adaptive signal processing.
  • FIG. 19C is a block diagram of a configuration for generating singing voice data by the third method.
  • the recorded data with singing voice is corrected by the filter 61 and input to the adder 62.
  • data without singing voice generated by convolving the acoustic characteristics of the vehicle interior with the source sound source is input to the adder 62.
  • the adder 62 subtracts the data without singing voice from the filtered recording data with the singing voice and outputs it as singing voice data.
  • the singing voice data is also supplied to the adaptive signal processing unit 63.
• the adaptive signal processing unit 63 calculates the characteristic (coefficients W) to be set in the filter 61 so as to remove the error included in the singing voice data, that is, the variation due to the change in the acoustic characteristics in the vehicle interior, and supplies it to the filter 61.
• specifically, the adaptive signal processing unit 63 calculates the coefficients W of the filter 61 so that the singing voice data becomes zero in periods, or in frequency components, in which no singing voice is included.
  • the change in the acoustic characteristics in the passenger compartment is canceled by the filter 61.
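The patent does not name a specific adaptive algorithm; as one plausible realization, here is a sketch of the FIG. 19(C) structure with an LMS-style update for the coefficients W of the filter 61 (a sketch under that assumption, not the patent's definitive implementation):

```python
import numpy as np

def singing_voice_third_method(with_singing, no_singing, taps=64, mu=0.01):
    """Third method (FIG. 19(C)): the filter 61 (coefficients w) corrects
    the recording with singing voice; the adder 62 subtracts the convolved
    no-singing data; the adaptive unit 63 updates w so the residual goes
    to zero wherever no singing voice is present."""
    w = np.zeros(taps)
    w[0] = 1.0  # start as an identity correction (no cabin change assumed)
    n = min(len(with_singing), len(no_singing))
    out = np.zeros(n)
    for i in range(taps, n):
        x = with_singing[i - taps:i][::-1]  # input window to the filter 61
        e = np.dot(w, x) - no_singing[i]    # adder 62 output: singing voice data
        out[i] = e
        # LMS update (adaptive signal processing unit 63). In practice this
        # update would run only in periods, or frequency bands, that contain
        # no singing voice, so the singing voice itself is not cancelled.
        w -= mu * e * x
    return out
```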
  • the singing voice data may be generated on the gate server 3 side or generated on the terminal device 10 side. In the following description, for the convenience of explanation, it is assumed that singing voice data is generated by the first method.
  • FIG. 20 is a flowchart when the gate server 3 generates singing voice data.
• the terminal device 10 generates recording data with singing voice as shown in FIG. 18(A) (step S401), and then generates recording data without singing voice as shown in FIG. 18(B) (step S402).
  • the terminal device 10 transmits the recording data with singing voice and the recording data without singing voice to the gate server 3 (step S403).
  • the terminal device 10 adds music information such as a music code corresponding to the recorded data and transmits the data.
  • the gate server 3 receives the recording data with singing voice and the recording data without singing voice, generates singing voice data by the calculation shown in FIG. 19A, and stores it in the internal database in association with the music based on the music information ( Step S404).
  • the singing voice data of a plurality of users is stored in the gate server 3 for each music piece.
  • FIG. 21 is a flowchart when the terminal device 10 generates singing voice data.
• the terminal device 10 generates recording data with singing voice as shown in FIG. 18(A) (step S411), and then generates recording data without singing voice as shown in FIG. 18(B) (step S412).
• next, the terminal device 10 generates singing voice data by the calculation shown in FIG. 19(A) (step S413), and transmits the generated singing voice data to the gate server 3 (step S414).
  • the terminal device 10 adds music information such as a music code corresponding to the singing voice data and transmits it.
  • the gate server 3 receives the singing voice data and the music information, and stores the singing voice data in the internal database in association with the music based on the music information (step S415).
  • the singing voice data of a plurality of users is stored in the gate server 3 for each music piece.
• FIG. 22 is a flowchart of the choral process in the case where the speech information generation process is performed on the terminal device 10 side.
  • the terminal device 10 mainly generates data necessary for the choral process.
• the gate server 3 is connected to the content provider 2 via the network 4, acquires music data and lyrics data for a plurality of pieces of music, and stores them in an internal database (step S501).
• the terminal device 10 receives the designation of the music to be played through the user's operation of the operation unit 15 (step S502), and further receives a designation to use the choral function (step S503). Next, the terminal device 10 transmits music designation information (including the designation of the choral function) specifying the music to the gate server 3 (step S504).
• the gate server 3 acquires the music data, lyrics data, and singing voice data of the music corresponding to the received music designation information from the database, and transmits them to the terminal device 10 (step S505).
• the terminal device 10 generates speech information using the received music data and lyrics data (step S506). Then, the terminal device 10 reproduces the music data together with the speech based on the speech information and the received singing voice data (step S507). As a result, the speech and the singing voices of other users are reproduced at appropriate timings during the reproduction of the music.
• in step S508, the terminal device 10 determines whether or not the reproduction of the music has ended. If the reproduction has not ended, the process returns to step S507; if it has ended, the process ends.
• the gate server 3 acquires the music data from the content provider in step S501. However, if the music data is stored in the terminal device 10, the gate server 3 may acquire the music data from the terminal device 10. Further, if the music data is stored in the database in the gate server 3, the music data may be acquired from there.
• FIG. 23 is a flowchart of the choral process in the case where the speech information generation process is performed on the gate server 3 side.
  • the gate server 3 mainly generates data necessary for the choral process.
• the gate server 3 is connected to the content provider 2 via the network 4, acquires music data and lyrics data for a plurality of pieces of music, and stores them in an internal database (step S511). Then, the gate server 3 generates speech information based on the lyrics data (step S512).
• next, the gate server 3 adds the speech to the music data to generate music data with speech (step S513). Specifically, based on the generated speech information, the gate server 3 synthesizes the speech signal corresponding to each speech into the music data at the appropriate timing, generates the music data with speech, and stores it in the database.
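As a rough illustration of step S513, here is a sketch of mixing synthesized speech signals into the music data at sample positions chosen so that each speech precedes its vocal; the data layout and names are assumptions, not the patent's:

```python
import numpy as np

def add_speech_to_music(music, speeches):
    """Mix each synthesized speech waveform into the music data at its
    start sample. 'speeches' is a list of (start_sample, waveform) pairs
    derived from the speech information."""
    out = music.astype(np.float64).copy()
    for start, speech in speeches:
        end = min(start + len(speech), len(out))
        out[start:end] += speech[:end - start]
    return out
```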
• the terminal device 10 receives the designation of the music to be played through the user's operation of the operation unit 15 (step S514), and further receives a designation to use the choral function (step S515). Next, the terminal device 10 transmits music designation information (including the designation of the choral function) specifying the music to the gate server 3 (step S516).
• the gate server 3 reads from the database the music data with speech and the singing voice data of the music corresponding to the received music designation information, synthesizes them to generate music data with singing voice and speech (step S517), and transmits it to the terminal device 10 (step S518).
• the terminal device 10 reproduces the received music data with singing voice and speech (step S519). As a result, the speech is reproduced, and the singing voices of other users are reproduced, at appropriate timings during the reproduction of the music.
• in step S520, the terminal device 10 determines whether or not the reproduction of the music has ended. If the reproduction has not ended, the process returns to step S519; if it has ended, the process ends.
• the gate server 3 acquires the music data from the content provider in step S511. However, if the music data is stored in the terminal device 10, the gate server 3 may acquire the music data from the terminal device 10. Further, if the music data is stored in the database in the gate server 3, the music data may be acquired from there.
• (Modification 1) The singing voice reproduced when the choral function is performed is not limited to that of a single user.
• for example, a user who wants to execute the choral function may also be allowed to specify the number of chorus voices.
• in this case, the gate server 3 may simply execute the choral function using singing voice data for the designated number of users.
  • the gate server 3 stores the singing voice data in the database for each piece of music.
• at this time, the gate server 3 may store the singing voice data in association with attribute information of the user who generated it, such as gender and age.
• specifically, when transmitting the recording data or the singing voice data to the gate server 3, the user may attach his or her attribute information. This information may be input by the user, or information already stored in the terminal device 10 may be automatically read by the terminal device 10 and transmitted to the gate server 3.
• thereby, a user who intends to execute the choral function using the singing voice data stored in the gate server 3 can execute the choral function while specifying the gender, age, and the like of the singing voices to be reproduced at the same time.
• the singing voice data is basically generated for one entire piece of music, but may instead be generated for only a part of a piece of music, for example, only for the first verse of a song.
• in this case, the singing voice data is stored in the database of the gate server 3 for each piece of music, together with information indicating the part concerned (for example, the first verse of the song) or its playback time within the song.
• a user who executes the choral function can then use a plurality of items of singing voice data, each covering only a part of the music, by designating the information indicating those parts. For example, it is possible to enjoy a chorus with different users by using the singing voice data of different users for the first and second verses.
  • the present invention can be used for an apparatus for playing music.

Abstract

According to the present invention, a server stores usage permission information indicating a usable music playback device. When a terminal device acquires identification information for a music playback device from said music playback device and transmits said information to the server, the server determines, on the basis of the identification information for the music playback device received from the terminal device and of the usage permission information, whether or not the music playback device is usable. When the music playback device is determined to be usable, the server acquires song lyric data for a musical piece and transmits said data to the terminal device. In this way, song lyric data is provided from the server only to a terminal device that operates together with a usable music playback device, and said data can be played back together with musical piece data.

Description

COMMUNICATION SYSTEM, REPRODUCTION SYSTEM, TERMINAL DEVICE, SERVER, CONTENT COMMUNICATION METHOD, AND PROGRAM
The present invention relates to a method of outputting lyrics information as music is played.
A karaoke apparatus that synthesizes lyrics data into speech and outputs it prior to the corresponding part of a karaoke performance is known (for example, Patent Documents 1 and 2).
Patent Document 1: Japanese Patent Laid-Open No. 4-67467; Patent Document 2: Japanese Patent Laid-Open No. 10-63274
In the case of a karaoke apparatus, the music being played contains no lyrics, so the lyric speech output by the prior art does not become difficult to hear. However, when ordinary music rather than karaoke is being played and listened to, outputting the lyrics as speech by the prior art technique causes the output lyric speech to overlap the lyric portions contained in the original music, which may make it difficult to hear. Also, for example, when listening to music while driving a vehicle, the lyric speech output by the prior art technique may overlap voice messages such as route guidance from an in-vehicle navigation device and become difficult to hear.
The above are examples of the problems to be solved by the present invention. An object of the present invention is to provide lyric speech that is easy for the user to hear, so that the user can sing along while music containing lyrics is being played.
The invention according to claim 1 is a communication system comprising a server and a terminal device. The server comprises: a storage unit that stores use permission information specifying usable playback devices; determination means for determining whether or not a playback device is usable based on identification information of the playback device received from the terminal device and the use permission information; and transmission means for transmitting, when the determination means determines that the playback device is usable, content data corresponding to information specifying content received from the terminal device to the terminal device. The terminal device comprises: identification information acquisition means for acquiring the identification information of the playback device from the playback device connected to the terminal device and transmitting it to the server; first communication means for transmitting information specifying content to be played back to the server and receiving the content data corresponding to the content from the server when the playback device is determined to be usable; and second communication means for transmitting the content data to the playback device determined to be usable.
The invention according to claim 2 is a playback system comprising a server and a terminal device. The server comprises: a storage unit that stores use permission information indicating usable music playback devices; determination means for determining whether or not a music playback device is usable based on identification information of the music playback device received from the terminal device and the use permission information; acquisition means for acquiring lyrics data of music; and transmission means for transmitting, when the determination means determines that the music playback device is usable, the lyrics data corresponding to information specifying the playback music received from the terminal device to the terminal device. The terminal device comprises: identification information acquisition means for acquiring the identification information of the music playback device from the music playback device connected to the terminal device and transmitting it to the server; input means for selecting the playback music, that is, the music to be played back; music data acquisition means for acquiring the music data of the selected playback music; first communication means for transmitting information specifying the selected playback music to the server and receiving the lyrics data corresponding to the playback music from the server when the music playback device is determined to be usable; lyrics voice data generation means for generating lyrics voice data based on the lyrics data; lyrics-voice-added music data generation means for generating music data with lyrics voice by adding the lyrics voice data to the music data so as to precede the corresponding lyrics portion in the music; and second communication means for transmitting the music data with lyrics voice to the music playback device determined to be usable.
The invention according to claim 3 is a playback system comprising a server and a terminal device. The server comprises: a storage unit that stores use permission information specifying usable music playback devices; determination means for determining whether or not a music playback device is usable based on identification information of the music playback device received from the terminal device and the use permission information; acquisition means for acquiring music data of music and lyrics data of the music; lyrics voice data generation means for generating lyrics voice data based on the lyrics data; lyrics-voice-added music data generation means for generating music data with lyrics voice by adding the lyrics voice data to the music data so as to precede the corresponding lyrics portion in the music; and transmission means for transmitting, when the determination means determines that the music playback device is usable, the music data with lyrics voice corresponding to information specifying the playback music received from the terminal device to the terminal device. The terminal device comprises: identification information acquisition means for acquiring the identification information of the music playback device from the music playback device connected to the terminal device and transmitting it to the server; input means for selecting the playback music, that is, the music to be played back; first communication means for transmitting information specifying the selected playback music to the server and receiving the music data with lyrics voice corresponding to the playback music from the server when the music playback device is determined to be usable; and second communication means for transmitting the music data with lyrics voice to the music playback device determined to be usable.
The invention according to claim 7 is a terminal device capable of communicating with a server, comprising: identification information acquisition means for acquiring identification information of a playback device from the playback device connected to the terminal device and transmitting it to the server; first communication means for transmitting information specifying content to be played back to the server and receiving content data corresponding to the content from the server when the playback device is determined to be usable; and second communication means for transmitting the content data to the playback device determined to be usable.
The invention according to claim 8 is a content communication method executed by a terminal device capable of communicating with a server, comprising: an identification information acquisition step of acquiring identification information of a playback device from the playback device connected to the terminal device and transmitting it to the server; a first communication step of transmitting information specifying content to be played back to the server and receiving content data corresponding to the content from the server when the playback device is determined to be usable; and a second communication step of transmitting the content data to the playback device determined to be usable.
The invention according to claim 9 is a program executed by a terminal device capable of communicating with a server, the program causing the terminal device to function as: identification information acquisition means for acquiring identification information of a playback device from the playback device connected to the terminal device and transmitting it to the server; first communication means for transmitting information specifying content to be played back to the server and receiving content data corresponding to the content from the server when the playback device is determined to be usable; and second communication means for transmitting the content data to the playback device determined to be usable.
The invention according to claim 10 is a server capable of communicating with a terminal device, comprising: a storage unit that stores use permission information specifying usable playback devices; receiving means for receiving, from the terminal device, identification information of the playback device connected to the terminal device and information specifying content; determination means for determining whether or not the playback device is usable based on the identification information and the use permission information; and transmission means for transmitting content data corresponding to the information specifying the content to the terminal device when the determination means determines that the playback device is usable.
FIG. 1 is a diagram showing the concept of assist vocal.
FIG. 2 is a flowchart of assist vocal processing.
FIG. 3 is a flowchart of speech information generation processing.
FIG. 4 shows an overview of the speech information generation processing.
FIG. 5 shows an example of dividing lyrics into blocks.
FIG. 6 shows an example of a speech insertion method.
FIG. 7 shows an example of speech enhancement processing.
FIG. 8 shows a configuration according to another example of speech enhancement processing.
FIG. 9 shows a configuration according to yet another example of speech enhancement processing.
FIG. 10 is a block diagram showing the overall configuration of a music playback system.
FIG. 11 is a block diagram showing an example of the internal configuration of a terminal device.
FIG. 12 is a flowchart of assist vocal processing by the music playback system of the first embodiment.
FIG. 13 is a flowchart of assist vocal processing by the music playback system of the second embodiment.
FIG. 14 is a flowchart of assist vocal processing that reproduces only speech.
FIG. 15 is a diagram explaining a method of identifying music being played by an external source.
FIG. 16 shows methods of restricting the use of assist vocals.
FIG. 17 is a flowchart of the availability check process.
FIG. 18 shows environments for generating recording data for the choral function.
FIG. 19 shows methods of generating singing voice data.
FIG. 20 is a flowchart of singing voice data generation processing.
FIG. 21 is a flowchart of singing voice data generation processing.
FIG. 22 is a flowchart of the choral process.
FIG. 23 is a flowchart of the choral process.
In another preferred embodiment of the present invention, a communication system comprises a server and a terminal device. The server comprises: a storage unit that stores use permission information specifying usable playback devices; determination means for determining whether or not a playback device is usable based on identification information of the playback device received from the terminal device and the use permission information; and transmission means for transmitting, when the determination means determines that the playback device is usable, content data corresponding to information specifying content received from the terminal device to the terminal device. The terminal device comprises: identification information acquisition means for acquiring the identification information of the playback device from the playback device connected to the terminal device and transmitting it to the server; first communication means for transmitting information specifying content to be played back to the server and receiving the content data corresponding to the content from the server when the playback device is determined to be usable; and second communication means for transmitting the content data to the playback device determined to be usable.
In the communication system described above, the server stores use permission information specifying usable playback devices. The terminal device acquires the identification information of the playback device from the playback device connected to it, transmits the identification information to the server, and also transmits information specifying the content to be played back. The server determines whether or not the playback device is usable based on the identification information and the use permission information received from the terminal device, and, when the playback device is determined to be usable, transmits content data corresponding to the information specifying the content to the terminal device. The terminal device transmits the content data received from the server to the playback device determined to be usable. As a result, content data is transmitted only to playback devices determined to be usable.
A preferred embodiment of the present invention is a playback system comprising a server and a terminal device. The server comprises: a storage unit that stores use permission information indicating usable music playback devices; determination means for determining whether or not a music playback device is usable based on identification information of the music playback device received from the terminal device and the use permission information; acquisition means for acquiring lyrics data of music; and transmission means for transmitting, when the determination means determines that the music playback device is usable, the lyrics data corresponding to information specifying the playback music received from the terminal device to the terminal device. The terminal device comprises: identification information acquisition means for acquiring the identification information of the music playback device from the music playback device connected to the terminal device and transmitting it to the server; input means for selecting the playback music, that is, the music to be played back; music data acquisition means for acquiring the music data of the selected playback music; first communication means for transmitting information specifying the selected playback music to the server and receiving the lyrics data corresponding to the playback music from the server when the music playback device is determined to be usable; lyrics voice data generation means for generating lyrics voice data based on the lyrics data; lyrics-voice-added music data generation means for generating music data with lyrics voice by adding the lyrics voice data to the music data so as to precede the corresponding lyrics portion in the music; and second communication means for transmitting the music data with lyrics voice to the music playback device determined to be usable.
In the above playback system, the server stores use permission information indicating usable music playback devices. The terminal device acquires the identification information of the music playback device from the music playback device connected to it and transmits the identification information to the server. The server determines whether or not the music playback device is usable based on the identification information of the music playback device received from the terminal device and the use permission information.
In the terminal device, the playback music, that is, the music to be played back, is selected by the user, and the terminal device transmits information specifying the selected playback music to the server. When the music playback device is determined to be usable, the server acquires the lyrics data corresponding to the information specifying the playback music received from the terminal device and transmits it to the terminal device. The terminal device receives the lyrics data corresponding to the playback music from the server when the music playback device is determined to be usable, and generates lyrics voice data based on the lyrics data. The terminal device then adds the lyrics voice data to the music data so as to precede the corresponding lyrics portion in the music, thereby generating music data with lyrics voice, and transmits it to the music playback device determined to be usable for playback. In this way, the lyrics data is provided from the server only to a terminal device that operates together with a usable music playback device, and can be played back together with the music data.
Another preferred embodiment of the present invention is a playback system comprising a server and a terminal device. The server comprises: a storage unit that stores use permission information specifying usable music playback devices; determination means for determining whether or not a music playback device is usable based on identification information of the music playback device received from the terminal device and the use permission information; acquisition means for acquiring music data of music and lyrics data of the music; lyrics voice data generation means for generating lyrics voice data based on the lyrics data; lyrics-voice-added music data generation means for generating music data with lyrics voice by adding the lyrics voice data to the music data so as to precede the corresponding lyrics portion in the music; and transmission means for transmitting, when the determination means determines that the music playback device is usable, the music data with lyrics voice corresponding to information specifying the playback music received from the terminal device to the terminal device. The terminal device comprises: identification information acquisition means for acquiring the identification information of the music playback device from the music playback device connected to the terminal device and transmitting it to the server; input means for selecting the playback music, that is, the music to be played back; first communication means for transmitting information specifying the selected playback music to the server and receiving the music data with lyrics voice corresponding to the playback music from the server when the music playback device is determined to be usable; and second communication means for transmitting the music data with lyrics voice to the music playback device determined to be usable.
In the above playback system, the server stores use permission information specifying usable music playback devices. The terminal device acquires the identification information of the music playback device from the music playback device connected to it and transmits the identification information to the server. The server determines whether or not the music playback device is usable based on the identification information of the music playback device received from the terminal device and the use permission information.
The server also acquires music data of a piece of music and lyrics data of that music, generates lyrics voice data based on the lyrics data, and adds the lyrics voice data to the music data so as to precede the corresponding lyrics portion in the music, thereby generating music data with lyrics voice.
The terminal device receives from the user a selection of the playback music, that is, the music to be played back, and transmits information specifying the selected playback music to the server. When the music playback device is determined to be usable, the server transmits the music data with lyrics voice corresponding to the information specifying the playback music received from the terminal device to the terminal device. The terminal device receives the music data with lyrics voice corresponding to the playback music from the server when the music playback device is determined to be usable, and transmits it to the music playback device determined to be usable for playback. In this way, the lyrics data is provided from the server only to a terminal device that operates together with a usable music playback device, and can be played back together with the music data.
In one aspect of the above playback system, before transmitting the music data with lyrics voice to the music playback device, the second communication means receives the identification information of the music playback device again; when, based on the received identification information, the music playback device is re-determined to be a music playback device that the server determined to be usable, the second communication means transmits the music data with lyrics voice to that music playback device, and when the music playback device is re-determined not to be such a device, the second communication means does not transmit the music data with lyrics voice to it. In this aspect, the terminal device re-determines whether or not the music playback device is usable before transmitting the music data with lyrics voice to it.
In another aspect of the above music playback system, the storage unit stores, as the use permission information, identification information of usable music playback devices, and the determination means determines that the music playback device is usable when identification information identical to the identification information received from the terminal device is stored in the storage unit.
In another aspect of the above music playback system, the storage unit stores a predetermined use permission code as the use permission information, and the determination means determines that the music playback device is usable when the use permission code is included in the identification information received from the terminal device.
In another preferred embodiment of the present invention, a terminal device capable of communicating with a server comprises: identification information acquisition means for acquiring identification information of a playback device from the playback device connected to the terminal device and transmitting it to the server; first communication means for transmitting information specifying content to be played back to the server and receiving content data corresponding to the content from the server when the playback device is determined to be usable; and second communication means for transmitting the content data to the playback device determined to be usable.
The above terminal device acquires the identification information of the playback device from the playback device connected to it and transmits the identification information to the server, and also transmits information specifying the content to be played back to the server. When the playback device is determined to be usable, the terminal device receives the content data corresponding to the content from the server and transmits it to the playback device determined to be usable. As a result, content data is transmitted only to playback devices determined to be usable.
In another preferred embodiment of the present invention, a content communication method executed by a terminal device capable of communicating with a server comprises: an identification information acquisition step of acquiring identification information of a playback device from the playback device connected to the terminal device and transmitting it to the server; a first communication step of transmitting information specifying content to be played back to the server and receiving content data corresponding to the content from the server when the playback device is determined to be usable; and a second communication step of transmitting the content data to the playback device determined to be usable. By this method, content data is transmitted only to playback devices determined to be usable.
In another preferred embodiment of the present invention, a program executed by a terminal device capable of communicating with a server causes the terminal device to function as: identification information acquisition means for acquiring identification information of a playback device from the playback device connected to the terminal device and transmitting it to the server; first communication means for transmitting information specifying content to be played back to the server and receiving content data corresponding to the content from the server when the playback device is determined to be usable; and second communication means for transmitting the content data to the playback device determined to be usable. By executing this program, content data is transmitted only to playback devices determined to be usable.
In another preferred embodiment of the present invention, a server capable of communicating with a terminal device comprises: a storage unit that stores use permission information specifying usable playback devices; receiving means for receiving, from the terminal device, identification information of the playback device connected to the terminal device and information specifying content; determination means for determining whether or not the playback device is usable based on the identification information and the use permission information; and transmission means for transmitting content data corresponding to the information specifying the content to the terminal device when the determination means determines that the playback device is usable. With this server, content data is transmitted only to playback devices determined to be usable.
Hereinafter, preferred embodiments of the present invention will be described with reference to the drawings.
[1] Assist Vocal
[1.1] Concept of Assist Vocal
When a user driving a vehicle plays and listens to music in the car, the user may want to sing along with the song being played. However, since lyrics cannot be read while driving, the user cannot sing unless he or she has memorized the lyrics of the song.
In this embodiment, while a song containing lyrics is being played, the lyrics contained in the song are output as an audio signal to inform the user. Specifically, while a song stored in, for example, the memory of a terminal device is being played, the lyrics contained in the song are output as speech before the point at which those lyrics are sung in the song. This allows the user to sing along with the song being played even while driving. Users other than the driver can also sing along without consulting a lyrics book.
The function of audibly conveying the content of lyrics to the user ahead of the timing at which those lyrics are sung in the song is called “assist vocal”. In this embodiment, it is assumed that the song being played is not a karaoke track but an ordinary song that includes sung lyrics.
FIG. 1 illustrates the concept of assist vocal, schematically showing a single song. The horizontal axis in FIG. 1 indicates time. A song contains lyrics divided into a plurality of blocks. A portion of the played song that contains lyrics is called a “vocal”, and a portion of the song other than the vocals is called an “interlude”. A typical song is therefore composed of a plurality of interludes and a plurality of vocals.
In the example of FIG. 1, the song is composed of three vocals 1 to 3 and a plurality of interludes. The content (lyrics) of vocal 1 is “Aiueo”, that of vocal 2 is “Kakikukeko”, and that of vocal 3 is “Sashisuseso”.
While such a song is being played, in this embodiment the lyrics “Aiueo” corresponding to vocal 1 are output as audio ahead of the timing at which vocal 1 is played. In this specification, the lyric audio output by the assist vocal function is called “speech” to distinguish it from the “vocal” contained in the song.
In the example of FIG. 1, speech 1 corresponding to vocal 1 is output before vocal 1. Similarly, speech 2 is output before vocal 2, and speech 3 is output before vocal 3.
A speech outputs only the lyrics of the corresponding vocal as an audio signal and basically contains no musical elements such as pitch or rhythm. Also, as described later, a speech is basically inserted into the interlude preceding the corresponding vocal, so its length is adjusted as necessary and is usually shorter than the time over which the same lyrics are sung as a vocal during playback of the song. In a typical example, a speech sounds like the lyrics of the corresponding vocal spoken rapidly.
[1.2] Assist Vocal Processing
Next, the assist vocal processing for outputting speeches will be described. FIG. 2 is a flowchart of the assist vocal processing. This processing is executed by a terminal device mounted in the vehicle, typically a mobile terminal such as a smartphone; the details are described later. In the following description, it is assumed that a terminal device executes the processing.
First, the terminal device determines whether assist vocal is turned on (step S1). Assist vocal may be turned on and off either manually by the user or automatically. In the manual case, when the user wants speeches to be played by assist vocal, the user operates a predetermined button or the like to turn assist vocal on, and the terminal device detects this. In the automatic case, the terminal device analyzes the user's voice using, for example, a microphone, and automatically turns assist vocal on when the user is singing along with a song or performing an act equivalent to singing. The method of turning assist vocal on automatically is described further below.
If assist vocal is not turned on (step S1: No), the processing ends. On the other hand, if assist vocal is turned on (step S1: Yes), the terminal device identifies the song being played (step S2). Here, the song being played in the car may be a song stored inside the terminal device, for example downloaded from a server; a song stored on a storage medium such as a CD or the memory of an on-board device; or a song played from the radio or the like. When a song stored inside the terminal device is being played, the terminal device can easily identify it. On the other hand, when a song stored on a medium such as a CD is being played, or when a song is being played from the radio, the terminal device picks up the song being played from the in-car speakers with a microphone and transmits the audio data to an external music search server. The music search server stores data on a large number of songs in a database, identifies the song matching the audio data received from the terminal device, and transmits information indicating that song (for example, the song title and artist name; hereinafter called “song identification information”) to the terminal device. In this way, the terminal device acquires the song identification information of the song currently being played.
Once the song being played has been identified, the terminal device executes speech information generation processing (step S3). FIG. 3 is a flowchart of the speech information generation processing, and FIG. 4 shows its outline.
In FIG. 3, the terminal device acquires the lyric data of the song identified in step S2 from an external server or the like (step S31). Here, “lyric data” is information defining which lyrics are sung at which timing in the song; specifically, it associates lyric text data indicating the lyrics contained in the song with playback time data indicating the playback time at which those lyrics are sung (the elapsed time from the start of the song).
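As an illustration of this structure, lyric data can be modeled as a list of pairs of a playback time and a lyric text. The following minimal Python sketch uses hypothetical class and field names and invented timings; it is not part of the embodiment itself.

    from dataclasses import dataclass

    @dataclass
    class LyricLine:
        """One entry of the lyric data: a lyric text together with the
        playback time (seconds from the start of the song) at which it is sung."""
        start_time: float  # playback time data
        text: str          # lyric text data

    # Hypothetical lyric data for the song of FIG. 1
    lyric_data = [
        LyricLine(start_time=12.0, text="Aiueo"),
        LyricLine(start_time=34.5, text="Kakikukeko"),
        LyricLine(start_time=58.2, text="Sashisuseso"),
    ]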
Next, the terminal device acquires music analysis data (step S32). Music analysis data is information indicating musical features of the song, such as beat positions and bar positions, and is generated from the audio data of the played song. Specifically, the terminal device has a built-in music analysis application: it picks up the song played from the vehicle speakers with a microphone to obtain audio data and analyzes that audio data to obtain music analysis data such as beat positions. Instead of building the music analysis application into the terminal device, the music analysis data may be acquired using an external music analysis apparatus, a server, or the like.
Next, the terminal device performs lyric blocking (step S33). Lyric blocking is processing that divides the lyric text data contained in the lyric data acquired in step S31 into blocks, where one block corresponds to one speech. In other words, lyric blocking divides the lyric text data into speech units.
In the example of FIG. 4, the terminal device has acquired “Aiueokakikukekosashisuseso” as the lyric text data and divides it into the three blocks “Aiueo”, “Kakikukeko”, and “Sashisuseso” to generate block lyric data.
FIG. 5 shows examples of lyric blocking. FIG. 5(A) shows a first method, in which the span between one interlude and the next in the song is treated as one block. An “interlude” is a portion of the song other than the vocals. Specifically, when the length It of a non-vocal section exceeds a predetermined length t1, the terminal device judges that section to be an interlude.
Exceptionally, however, a plurality of blocks may be combined into one block depending on the lengths of the interludes. As in the example shown in FIG. 5(B), when the length It2 of interlude 2 immediately preceding vocal 3 is very short relative to the length Vt3 of vocal 3 (It2 < α1·Vt3, where α1 is an arbitrary coefficient), it is difficult to output the speech for vocal 3 during interlude 2. In such a case, if the length It1 of the preceding interlude 1 exceeds a predetermined length, the terminal device treats vocal 2 and vocal 3 as one block, so that the speech corresponding to vocals 2 and 3 is output during interlude 1.
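A minimal sketch of this first blocking method is given below. It assumes the vocal sections are already known as (start, end) times, uses illustrative values for t1 and α1, and simplifies the exception above by always merging into the preceding block (omitting the separate check on the length of the earlier interlude).

    def block_lyrics(vocals, t1=4.0, alpha1=0.5):
        """Group vocal sections into speech blocks (method of FIG. 5(A)/(B)).

        vocals: list of (start, end) times of vocal sections, in order.
        A gap longer than t1 seconds counts as an interlude; when the gap
        before a vocal is shorter than alpha1 times the vocal's length,
        the vocal is merged into the preceding block so that its speech
        can be output during the earlier interlude.
        """
        blocks = []
        for i, (start, end) in enumerate(vocals):
            gap = start - vocals[i - 1][1] if i > 0 else start
            if blocks and (gap < t1 or gap < alpha1 * (end - start)):
                blocks[-1].append(i)   # merge, cf. vocals 2 and 3 in FIG. 5(B)
            else:
                blocks.append([i])     # start a new block
        return blocks  # each block is a list of vocal indices

    # Vocal 3 follows vocal 2 after only a 1-second interlude, so they merge:
    print(block_lyrics([(10.0, 18.0), (30.0, 38.0), (39.0, 47.0)]))  # [[0], [1, 2]]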
FIG. 5(C) shows a second method, in which the terminal device determines each block based on delimiters contained in the lyric data. That is, when the lyric text data contained in the lyric data already includes delimiter information, the terminal device can divide the lyric text data into blocks according to those delimiters.
Next, the terminal device performs lyric-to-speech conversion (step S34). The block lyric data obtained by lyric blocking is merely text data indicating lyrics; lyric-to-speech conversion converts the block lyric data into audio data. Specifically, the terminal device incorporates text-to-speech (TTS) software and converts each piece of block lyric data obtained in step S33 into audio data. As shown in FIG. 4, speeches 1 to 3, which are audio data, are thereby generated from the respective pieces of block lyric data. Instead of building TTS software into the terminal device, TTS conversion by an external server or the like may be used.
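Any TTS engine can serve for step S34; a minimal sketch using the off-line pyttsx3 package (one possible choice, not one specified by the embodiment) might look as follows.

    import pyttsx3  # one possible off-line TTS engine

    def lyrics_to_speech(block_lyrics, out_pattern="speech_{}.wav"):
        """Convert each block of lyric text into a speech audio file (step S34)."""
        engine = pyttsx3.init()
        files = []
        for i, text in enumerate(block_lyrics):
            path = out_pattern.format(i + 1)
            engine.save_to_file(text, path)  # queue one conversion per block
            files.append(path)
        engine.runAndWait()  # perform all queued conversions
        return files

    print(lyrics_to_speech(["Aiueo", "Kakikukeko", "Sashisuseso"]))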
Next, the terminal device performs speech length change (step S35). Speech length change is processing that shortens the temporal length of each speech obtained by lyric-to-speech conversion so that it can be played in a shorter time. As already described, each speech is played during the interlude preceding the corresponding vocal, and since the length of an interlude is limited, the speech must be shortened for playback. The speech length change is performed for this reason.
Basically, the playback time of each speech is shortened (the playback speed is increased) within a range that remains intelligible to a human listener. For example, if the temporal length of each speech obtained in step S34 (called the “original speech length”) is St and the speech length conversion coefficient is α2, the changed length Stv after the speech length change is given by

    Stv = St · α2  (α2 < 1.0)    (1)

For example, if α2 = 0.7, the speech length change causes each speech to be played back at roughly 1.4 times its original speed.
In addition to the uniform change described above, the playback time may be shortened further according to the length of the interlude corresponding to each speech. In that case, even speeches with the same number of characters, or with identical lyrics, will have different playback times depending on their position in the song (the length of the preceding interlude).
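A minimal sketch combining equation (1) with the per-interlude shortening might look as follows; α2 and the safety margin are illustrative values only.

    def adjusted_speech_length(original_length, interlude_length,
                               alpha2=0.7, margin=1.0):
        """Target playback length of a speech (step S35).

        First apply the uniform shortening of equation (1), Stv = St * alpha2,
        then shorten further if the result still does not fit into the
        preceding interlude minus a safety margin (all values in seconds).
        """
        stv = original_length * alpha2          # equation (1)
        available = interlude_length - margin   # time usable within the interlude
        return min(stv, available) if available > 0 else stv

    print(adjusted_speech_length(5.0, 3.0))    # 2.0: limited by the interlude
    print(adjusted_speech_length(5.0, 10.0))   # 3.5: only equation (1) applies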
Next, the terminal device calculates the speech insertion timing (step S36). The terminal device inserts the speech corresponding to a vocal ahead of the playback timing of that vocal. In the example shown in FIG. 4, speech 1 corresponding to vocal 1 is inserted before the playback timing of vocal 1. Similarly, speech 2 corresponding to vocal 2 is inserted before the playback timing of vocal 2, and speech 3 corresponding to vocal 3 before the playback timing of vocal 3.
FIG. 6 shows specific examples of how a speech is inserted, using the timing of inserting speech 2 corresponding to vocal 2 as an example.
In method 1, the speech ends a fixed time before the start timing of the corresponding vocal. Specifically, as shown in FIG. 6, speech 2 is inserted so as to end a fixed time T2 before the playback start timing of vocal 2. The playback start timing of speech 2 is then determined by the length of speech 2. In method 1, a fixed interval is secured between the end of the speech and the start of the corresponding vocal, so the user can sing the vocal part with time to spare.
In method 2, the end timing of the speech is aligned with a beat position of the song. Specifically, in the example of FIG. 6, speech 2 is inserted so as to end N beats before the playback start timing of vocal 2 (N is an arbitrary integer; N = 1 in this example). The playback start timing of speech 2 is again determined by the length of speech 2. The beat positions of the song are obtained from the music analysis data described above.
In method 3, both the playback start timing and the playback end timing of the speech are aligned with beat positions of the song. Specifically, in the example of FIG. 6, the playback start timing and playback end timing of speech 2 both coincide with the third beat of a four-beat measure.
As in methods 2 and 3, aligning the end timing of the speech, or both its start and end timings, with beat positions of the song links the speech to the music and makes it easier for the user to sing along.
In this way, the terminal device determines the insertion timing of each speech. Specifically, for each speech, the playback start timing and playback end timing are defined as elapsed times from the beginning of the song and are stored as part of the speech information. That is, the speech information includes the audio signal corresponding to each speech (hereinafter also called the “speech signal”) and the playback start timing and playback end timing of each speech.
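Methods 1 and 2 can be sketched as below; the function names, the value of T2, and the one-beat-per-second grid of the example are hypothetical.

    def speech_timing_method1(vocal_start, speech_length, t2=2.0):
        """Method 1: the speech ends a fixed time T2 before the vocal starts.
        Returns (start, end) of the speech in song time (seconds)."""
        end = vocal_start - t2
        return end - speech_length, end

    def speech_timing_method2(vocal_start, speech_length, beat_times, n=1):
        """Method 2: the speech ends N beats before the vocal starts;
        beat_times come from the music analysis data."""
        beats_before = [t for t in beat_times if t < vocal_start]
        end = beats_before[-n]             # the N-th beat before the vocal
        return end - speech_length, end

    beats = [float(t) for t in range(40)]  # hypothetical beat grid
    print(speech_timing_method1(30.0, 3.5))         # (24.5, 28.0)
    print(speech_timing_method2(30.0, 3.5, beats))  # (25.5, 29.0)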
Next, the processing returns to the main routine shown in FIG. 2, and the terminal device acquires the current playback position of the song being played (step S4). Specifically, the terminal device acquires the current playback position by counting the elapsed time from the playback start time of the song.
Next, the terminal device performs speech emphasis processing (step S5). Speech emphasis processing makes the speech easier to hear and to distinguish from the vocals contained in the song; its details are described later.
Next, the terminal device plays each speech based on its playback start/end timing contained in the speech information and on the current playback position (step S6). Specifically, playback of a speech starts at its playback start timing and ends at its playback end timing. As a result, each speech is played ahead of the corresponding vocal in the song.
Next, the terminal device determines whether speech playback should be terminated (step S7). Speech playback should be terminated, for example, when no speech information remains, when playback of the song itself has ended, or when the user has turned assist vocal off. If speech playback should not be terminated (step S7: No), the processing returns to step S4 and speech playback continues. If speech playback should be terminated (step S7: Yes), the assist vocal processing ends.
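The loop of steps S4 to S7 can be sketched as follows, assuming the surrounding player supplies the playback clock and the play_speech and should_stop callbacks; none of these names appear in the embodiment.

    import time

    def run_assist_vocal(speeches, play_speech, should_stop, song_start):
        """Poll the current playback position (step S4) and start each
        speech at its stored playback start timing (step S6), until the
        termination condition of step S7 holds."""
        pending = sorted(speeches, key=lambda s: s["start"])
        while not should_stop():
            position = time.monotonic() - song_start  # elapsed song time
            while pending and pending[0]["start"] <= position:
                play_speech(pending.pop(0)["audio"])
            time.sleep(0.05)  # poll at 20 Hz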
[1.3] Method of Automatically Turning Assist Vocal On
Next, a method of automatically turning assist vocal on in step S1 of the assist vocal processing shown in FIG. 2 will be described.
As a basic method, the terminal device picks up the user's voice with a microphone and automatically turns assist vocal on when it judges that the user is singing along with the song or performing an act equivalent to singing. For example, when analysis of the voice data picked up by the microphone indicates that the user is humming, singing fragments of the song, or the like, assist vocal is turned on. On the other hand, when the voice data is not singing but conversation with a passenger, assist vocal is not turned on. Even when the voice data includes a portion of humming, assist vocal is not turned on if the data consists mostly of conversation.
Whether the user's voice contained in the voice data is singing can be judged from the presence or absence of rhythm and pitch variation in the voice data. For example, the voice can be judged to be singing when the rhythm is regular or the pitch variation is large, and judged not to be singing (to be conversation) when the rhythm is irregular or the pitch variation is small. Alternatively, using the music analysis application described above, the voice may be judged to be singing when beats or bars can be extracted from the voice data and not to be singing when they cannot. Alternatively, using the music search server or music search function described above, the voice may be judged to be singing when a song can be identified from the voice data and not to be singing when no song can be identified.
The terminal device may also calculate the correlation between the picked-up voice data and the song being played and, when the correlation is at or above a certain value, judge that the user is singing and turn assist vocal on. When the terminal device has already acquired the lyric data of the song being played, it may judge that the user is singing when the correlation between the voice data picked up by the microphone and the lyric data is at or above a certain value. Also, based on the lyric data, when the user's voice is detected during an interlude of the song, where no lyrics should exist, the voice may be judged to be conversation.
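One way to realize such a correlation test is sketched below, assuming both signals are mono float arrays at the same sample rate; the threshold is illustrative, as the embodiment does not fix a value.

    import numpy as np

    def is_singing(mic_audio, music_audio, threshold=0.3):
        """Judge whether the user is singing along by the normalized
        correlation between the microphone signal and the played song."""
        n = min(len(mic_audio), len(music_audio))
        a = mic_audio[:n] - np.mean(mic_audio[:n])
        b = music_audio[:n] - np.mean(music_audio[:n])
        denom = float(np.linalg.norm(a) * np.linalg.norm(b))
        if denom == 0.0:
            return False
        return float(np.dot(a, b)) / denom >= threshold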
Rhythm information picked up by the microphone may also be used. For example, when it is judged that the user is tapping the steering wheel with a hand or fingers in time with the song, or keeping rhythm by tapping the floor with a foot, the user may be judged to be performing an act equivalent to singing, and assist vocal may be turned on. In this case, the correlation between the rhythm picked up by the microphone and the rhythm of the song being played may be calculated, and assist vocal turned on when the correlation is at or above a certain value. Even without calculating the correlation with the rhythm of the song being played, assist vocal may be turned on when the rhythm picked up by the microphone is a repetition of a regular pattern.
Furthermore, the user's state may be captured by an in-car camera, and assist vocal may be turned on when, for example, the user is nodding along with the song. An in-car camera may also detect whether there are passengers in the front passenger seat or rear seats, and the criterion for judging whether the user is singing or conversing may be varied according to the presence or absence of passengers.
In the above examples, assist vocal is turned on when the user is judged to be singing. However, even when the user is singing, assist vocal need not be turned on if it is judged that the user knows the lyrics and speeches do not need to be played. Specifically, for example, when the correlation between the picked-up voice data and the song being played is at or above a certain value and the correlation with the lyric data is also at or above a certain value, the user is judged to know the lyrics, and assist vocal is not turned on even though the user is singing.
In this case, however, the user may lose track of the lyrics partway through, so the speech information may be generated and kept ready for output. If, afterwards, the correlation between the picked-up voice data and the song being played falls below a certain value, or the correlation with the lyric data falls below a certain value, the user is judged not to know the lyrics, and the assist vocal speeches are output.
While the above examples describe automatically turning assist vocal on, assist vocal can also be turned off automatically. While assist vocal is on, it may be turned off automatically when it is judged that the user is not singing along with the song or performing an act equivalent to singing (humming, singing fragments of the song, and so on). Similarly, assist vocal may be turned off automatically when conversation is detected, when it is judged that the user is not keeping rhythm, or when it is judged that the user is not nodding along with the song.
The above examples turn assist vocal on or off automatically based on whether the user is singing or performing an act equivalent to singing, but assist vocal may also be turned on or off automatically based on the structure of the song being played.
For example, for a user who wants to sing only the chorus of a song, assist vocal may be turned on automatically when the chorus is played and turned off automatically when parts other than the chorus are played. Conversely, for a user who knows the chorus and wants to practice the other parts, assist vocal may be turned on automatically when parts other than the chorus are played and turned off automatically when the chorus is played.
[1.4] Speech Emphasis Processing
Next, the speech emphasis processing executed in step S5 of the assist vocal processing shown in FIG. 2 will be described. Speech emphasis processing helps the user distinguish speeches from vocals and hear them clearly; several methods are described below.
[1.4.1] Processing when a speech and a vocal overlap
A speech is basically played during the interlude immediately preceding the corresponding vocal and preferably does not overlap the vocal in time. The speech length change processing described above (step S35) serves this purpose, but depending on the lengths of the speech and the interlude, the speech may not fit within the interlude even after shortening. That is, when the speech is longer than the interlude, the speech and the vocal are played partially overlapped. Instead of playing the speech and the vocal overlapped in this way, either of the following kinds of processing may be performed.
(1) Adjusting the vocal level.
When a speech and a vocal overlap, one method is to lower the volume level of the vocal. FIG. 7(A) shows a case where the latter part of the speech overlaps the beginning of the vocal, producing an overlap section X. In this case, the vocal volume is adjusted within the overlap section X: specifically, it is reduced to a level at which the speech can still be heard, or to zero. In the overlap section X, speech playback thus takes priority and the speech becomes easier to hear.
FIG. 7(B) shows the opposite case, where the beginning of the speech overlaps the latter part of the preceding vocal, producing an overlap section X. In this case too, the vocal volume is adjusted within the overlap section X: it is reduced to a level at which the speech can be heard, or to zero. Rather than lowering the vocal volume level abruptly within the overlap section X, the vocal may be faded out so that the volume level decreases gradually. In the overlap section X, speech playback again takes priority and the speech becomes easier to hear.
Concretely, when the vocal component and the instrumental components are separate in the music signal, the above level adjustment can be performed by lowering the volume level of the vocal component. On the other hand, when the vocal part is mixed with the instrumental parts and the vocal volume cannot be adjusted on its own, the volume level of the entire music signal may be lowered, or the volume level may be lowered only for the components of the music signal in the frequency band generally corresponding to vocals (the human voice).
(2) Adjusting the speech level.
When a speech and a vocal overlap, the opposite method of lowering the volume level of the speech is also possible. FIG. 7(C) shows a case where the latter part of the speech overlaps the beginning of the vocal, producing an overlap section X. In this case, the speech volume is adjusted within the overlap section X: it is reduced or set to zero. Rather than lowering the speech volume abruptly, the speech may be faded out so that the volume decreases gradually. The speech then becomes inaudible within the overlap section X. However, when listening to a song the user knows to some extent, the user often cannot remember all of the lyrics yet can recall and sing the rest once given the beginning of a line. Thus, as in FIG. 7(C), it is often acceptable for the latter part of the speech to become hard to hear as long as its beginning is audible, and this technique is effective in such cases.
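The level adjustment over an overlap section can be sketched as follows; the same idea applies whether the ducked signal is the vocal (music) or the speech. The residual gain and fade time are illustrative.

    import numpy as np

    def duck_over_overlap(signal, overlap_start, overlap_end, sr,
                          floor=0.2, fade=0.1):
        """Fade the signal down to a residual level over the overlap
        section X of FIG. 7 instead of cutting it abruptly.
        signal: mono float array at sample rate sr; times in seconds."""
        out = signal.copy()
        i0, i1 = int(overlap_start * sr), int(overlap_end * sr)
        nf = min(int(fade * sr), i1 - i0)
        out[i0:i0 + nf] *= np.linspace(1.0, floor, nf)  # gradual fade-out
        out[i0 + nf:i1] *= floor                        # hold the reduced level
        return out

    sr = 44100
    music = np.random.randn(30 * sr).astype(np.float32)
    ducked = duck_over_overlap(music, 24.5, 25.5, sr)  # duck a 1 s overlap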
[1.4.2] Letting the user hear the speech and the vocal from different directions
Humans have the ability to distinguish sounds arriving simultaneously from different directions (the so-called cocktail party effect). Techniques that exploit this can help the user distinguish the speech from the vocal. Note that these techniques are applied regardless of whether the speech and the vocal overlap in time.
(1) Adjusting the phase between the left and right speakers
FIG. 8(A) shows a configuration that inverts the phase of the speech output from the left and right speakers. The left (L) channel music signal is supplied to an adder 32, and the right (R) channel music signal is supplied to an adder 33. The speech signal, meanwhile, is supplied directly to the adder 33 and, with its phase inverted by a phase inverter 31, to the adder 32. The output of the adder 32 is supplied to the left speaker 30L, and the output of the adder 33 to the right speaker 30R.
With this configuration, the sound image of the music, including the vocals, is localized between the left and right speakers, whereas the sound image of the speech is localized around the user's ears, making it easier for the user to distinguish the speech from the vocals in the music. In the example of FIG. 8(A), only the phase of the speech signal supplied to the left speaker 30L is inverted by the phase inverter 31, but conversely only the phase of the speech signal supplied to the right speaker 30R may be inverted. Moreover, as long as there is a fixed phase difference between the speech signals supplied to the left and right speakers, the sound image position of the speech can be made to differ from that of the music, so the speech signal supplied to one speaker need not be fully inverted (shifted by 180°). That is, it suffices to give a fixed phase difference between the speech signal supplied to one speaker and the speech signal supplied to the other.
In the above configuration, the phase inverter 31 is an example of the signal processing means of the present invention, and the adders 32 and 33 are examples of the adding means and the output means of the present invention.
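The mixing of FIG. 8(A) reduces to a sign flip on one channel; a minimal sketch, assuming equal-length mono float arrays, is:

    import numpy as np

    def mix_with_inverted_speech(music_l, music_r, speech):
        """Per FIG. 8(A): the speech is added in phase to the right channel
        (adder 33) and phase-inverted to the left channel (phase inverter 31
        and adder 32), separating the speech image from the music image."""
        left = music_l - speech   # inverted phase
        right = music_r + speech  # original phase
        return left, right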
(2) Controlling the localization of the sound image
FIG. 8(B) shows a configuration in which the sound image of the speech can be placed at an arbitrary position. The left (L) channel music signal is supplied to the adder 32, and the right (R) channel music signal to the adder 33. The speech signal, meanwhile, is supplied to the adders 32 and 33 via a sound image localization control calculation unit 34 and a crosstalk cancellation unit 35. The sound image localization control calculation unit 34 convolves the speech signal with the transfer function between the target speaker position and the listening position (the user's position), and the crosstalk cancellation unit 35 cancels the transfer function between the speakers outputting the music and the listening position. The sound image of the music can thereby be localized between the left and right speakers 30L and 30R while the sound image of the speech is localized at the target speaker position, making it easier for the user to distinguish the speech from the vocals.
In the above configuration, the sound image localization control calculation unit 34 and the crosstalk cancellation unit 35 are examples of the signal processing means of the present invention, and the adders 32 and 33 are examples of the adding means and the output means of the present invention.
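The core of the localization approach is convolution with position-dependent impulse responses; a minimal sketch follows, with the crosstalk cancellation of unit 35 deliberately omitted and the impulse responses assumed to be given.

    import numpy as np

    def localize_speech(speech, ir_left, ir_right):
        """Convolve the speech with the impulse responses from the target
        (virtual) speaker position toward the listening position, producing
        left/right signals that localize the speech there (cf. unit 34).
        A real system would additionally apply crosstalk cancellation."""
        return np.convolve(speech, ir_left), np.convolve(speech, ir_right)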
(3) Using headrest speakers
When headrest speakers are mounted on the vehicle seats in addition to the vehicle speakers, the music including the vocals can be output from the vehicle speakers and the speech from the headrest speakers. FIG. 9 shows a configuration example for this case.
The left and right channel music signals are supplied to the vehicle speakers 30L and 30R, respectively. The speech signal is supplied directly to the right headrest speaker 35R and, with its phase inverted by the phase inverter 31, to the left headrest speaker 35L. In this case too, since a phase difference is given between the speech signals supplied to the two headrest speakers 35L and 35R, the sound image of the speech is localized at a position different from that of the music, making it easier for the user to distinguish the speech from the vocals in the music. As in the example of FIG. 8(A), it suffices to give a fixed phase difference between the speech signal supplied to one headrest speaker and that supplied to the other.
When headrest speakers are used, the speech may be played from the headrest speaker of the front passenger seat instead of that of the driver's seat. When headrest speakers are mounted on a plurality of vehicle seats, whether to play the speech may be selectable for each seat. In this way, the speech can be set to play only from the headrest speaker of the seat of a passenger who wants to hear the speech and sing the song.
Instead of giving a phase difference, the sound image of the speech may be localized at an arbitrary position by using the sound image localization control calculation unit 34 and the crosstalk cancellation unit 35, as in the processing described with reference to FIG. 8(B). This also makes it easier for the user to distinguish the speech from the vocals.
[2] System Configuration
Next, configuration examples of a music playback system that realizes the assist vocal described above will be described.
[2.1] First Embodiment
In the first embodiment, the assist vocal processing is executed mainly on the terminal device side. FIG. 10 shows the overall configuration of the music playback system according to the first embodiment. In this system, a plurality of vehicles 1, a content provider 2, and a gate server 3 can communicate via a network 4. The vehicles 1 communicate with the content provider 2 and the gate server 3 via the network 4 by wireless communication.
The content provider 2 is a server operated by, for example, a music distributor, and provides song data, song metadata, lyric data, and the like. The gate server 3 is a server that functions to realize the assist vocal of this embodiment; it acquires the song data, metadata, lyric data, and the like of the necessary songs from the content provider 2 and stores them in a database (not shown).
FIG. 11(A) shows an example of the internal configuration of the vehicle 1. The vehicle 1 includes a terminal device 10, a music playback device 20, and speakers 30.
The terminal device 10 is typically a mobile terminal such as a smartphone and includes a communication unit 11, a control unit 12, a storage unit 13, a microphone 14, and an operation unit 15. The communication unit 11 communicates with the gate server 3 via the network 4. The control unit 12 comprises a CPU and the like and controls the terminal device 10 as a whole.
The storage unit 13 is memory such as ROM and RAM; it stores programs by which the control unit 12 executes various kinds of processing and also functions as working memory. Processing including the assist vocal processing is executed by the control unit 12 running the programs stored in the storage unit 13. The storage unit 13 may also store the song data of songs saved by the user.
The microphone 14 picks up sounds such as the song being played in the car and the user's singing and conversation, and generates voice data. The operation unit 15 is typically a touch panel or the like and accepts operation and selection inputs from the user.
The music playback device 20 is, for example, a car audio system and includes an amplifier and the like. The speakers 30 are mounted in the vehicle. The music playback device 20 plays songs through the speakers 30 based on song data supplied from the terminal device 10.
FIG. 11(B) shows another example of the internal configuration of the vehicle 1. In this example, the vehicle 1 includes a terminal device 10x, which combines the functions of a terminal device 10 such as the mobile terminal shown in FIG. 11(A) and of a music playback device 20 such as a car audio system. Like the terminal device 10, the terminal device 10x includes the communication unit 11, control unit 12, storage unit 13, microphone 14, and operation unit 15, and additionally includes a music playback unit 16 corresponding to the music playback device 20. The terminal device 10x is connected to the speakers 30 and plays songs through them based on song data.
Next, the assist vocal processing by the music playback system of the first embodiment will be described. FIG. 12 is a flowchart of the assist vocal processing according to the first embodiment. In this processing, the assist vocal processing is executed mainly by the terminal device 10 or 10x (hereinafter representatively referred to simply as the “terminal device 10”).
First, the gate server 3 connects to the content provider 2 via the network 4, acquires song data and lyric data for a plurality of songs, and stores them in its internal database (step S101).
The terminal device 10 receives a designation of the song to be played through the user's operation of the operation unit 15 (step S102) and transmits song designation information designating that song to the gate server 3 (step S103). The gate server 3 retrieves the song data and lyric data of the song corresponding to the received song designation information from its database and transmits them to the terminal device 10 (step S104).
Next, the terminal device 10 performs the processing of steps S105 to S109 using the received song data and lyric data. The processing of steps S105 to S109 is the same as that of steps S3 to S7 in FIG. 2, so its description is omitted.
Thus, in the music playback system of the first embodiment, the assist vocal processing is executed mainly by the terminal device 10 mounted in the vehicle 1.
In the above example, the gate server 3 acquires the song data from the content provider in step S101, but when the song data is stored in the terminal device 10, the gate server 3 may acquire it from the terminal device 10. When the song data is stored in the database within the gate server 3, it may be acquired from there.
[2.2] Second Embodiment
In the second embodiment, part of the assist vocal processing is executed on the gate server 3 side. The overall configuration of the music playback system according to the second embodiment is the same as that of the first embodiment shown in FIG. 10, so its description is omitted.
Next, the assist vocal processing by the music playback system of the second embodiment will be described. FIG. 13 is a flowchart of the assist vocal processing according to the second embodiment. In this processing, the gate server 3 generates the speech information, further generates speech-added song data, and transmits it to the terminal device 10. The terminal device 10 receives and plays the speech-added song data. This is described in detail below.
First, the gate server 3 connects to the content provider 2 via the network 4, acquires song data and lyric data for a plurality of songs, and stores them in its internal database (step S201). The gate server 3 then generates speech information for each song based on the acquired song data and lyric data (step S202). This speech information generation processing is the same as step S3 in FIG. 2, so its description is omitted.
Having generated the speech information, the gate server 3 adds the speeches to the song data to generate speech-added song data (step S203). Specifically, based on the generated speech information, the gate server 3 mixes the speech signal corresponding to each speech into the song data at the timing calculated by the processing of step S36 in FIG. 3, and stores the resulting speech-added song data in the database. In other words, speech-added song data is data that, when played as is, plays the speeches in addition to the song.
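Generating the speech-added song data amounts to mixing each speech signal into the song waveform at its insertion timing. A minimal sketch, assuming mono float arrays and timings in seconds:

    import numpy as np

    def make_speech_added_song(song, speeches, sr):
        """Sketch of step S203: mix each speech signal into the song data
        at the timing computed in step S36. speeches is a list of
        (start_time, speech_signal) pairs; sr is the sample rate."""
        out = song.copy()
        for start, sig in speeches:
            i0 = int(start * sr)
            i1 = min(i0 + len(sig), len(out))
            out[i0:i1] += sig[:i1 - i0]      # superimpose the speech
        return np.clip(out, -1.0, 1.0)       # guard against clipping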
The terminal device 10 receives a designation of the song to be played through the user's operation of the operation unit 15 (step S204) and transmits song designation information designating that song to the gate server 3 (step S205). The gate server 3 transmits the speech-added song data of the song corresponding to the received song designation information to the terminal device 10 (step S206).
Next, the terminal device 10 plays the received speech-added song data (step S207). The speeches are thereby played at the appropriate timings during playback of the song. The terminal device 10 then determines whether playback of the song should be terminated (step S208). When playback should be terminated, for example because the song has been played to the end or the user has stopped playback (step S208: Yes), the terminal device 10 ends playback. When playback should not be terminated (step S208: No), the processing returns to step S207 and playback of the speech-added song data continues.
Thus, in the music playback system of the second embodiment, the speech-added song data is generated on the gate server 3 side and provided to the terminal device 10, which can play the received speech-added song data to hear the song with the speeches included.
In the above example, the gate server 3 acquires the song data from the content provider in step S201, but when the song data is stored in the terminal device 10, the gate server 3 may acquire it from the terminal device 10. When the song data is stored in the database within the gate server 3, it may be acquired from there.
[3] Assist Vocal that Plays Only the Speeches
In the assist vocal processing described above, the speeches are added to a song being played by the terminal device 10. However, it would be convenient if speeches could be added to a song being played from a source other than the terminal device 10, for example an in-car radio or CD player (hereinafter called an “external source”). In this case, the terminal device 10 basically generates the speech information by the method described above and plays only the speeches, at timings matched to the playback position of the song being played from the external source.
FIG. 14 shows a flowchart of the assist vocal processing in this case. First, the terminal device 10 picks up the song being played from the external source with the microphone 14 to acquire played-song data (step S151) and transmits it to the gate server 3 (step S152).
The gate server 3 receives the played-song data from the terminal device 10 and identifies the corresponding song and its playback position (step S153). Specifically, the gate server 3 includes a music search unit having the functions of the music search server described above; based on the played-song data, it identifies the song and the playback position corresponding to the received portion of that data. The gate server 3 then transmits the lyric data and playback position information to the terminal device 10, together with the title and artist name of the identified song (step S154).
 The terminal device 10 generates speech information using the received lyrics data (step S155). The speech information is generated by the same method as described with reference to FIG. 3. Note that the terminal device 10 can obtain the music analysis data by analyzing the reproduced music data captured with the microphone 14 (the processing of step S32 in FIG. 3).
 Next, the terminal device 10 calculates the current playback position within the music, based on the playback position information acquired from the gate server 3 (step S156); this method is described later. The terminal device 10 then performs speech enhancement processing (step S157) and reproduces the speech at the appropriate timing in accordance with the music being reproduced by the external source (step S158). The speech is thus reproduced in synchronization with the music playing from the external source.
 The terminal device 10 then determines whether reproduction of the speech should be terminated (step S159); if not, the process returns to step S156 and continues. If speech reproduction should be terminated, for example because playback of the music from the external source has ended, the music being played has changed to a different piece, or there is no more speech to reproduce (step S159: Yes), the process ends.
 Next, with reference to FIG. 15, the method of identifying the current playback position of the music in step S156 is described. The reproduced music data transmitted from the terminal device 10 to the gate server 3 actually consists of multiple audio frames; that is, the terminal device 10 captures the music being reproduced by the external source with the microphone 14 and transmits it to the gate server 3 sequentially as a series of audio frames.
 In the example of FIG. 15, the terminal device 10 sequentially transmits audio frames n, (n+1), (n+2), ... of the music being reproduced by the external source to the gate server 3 as the reproduced music data. At this time, the terminal device 10 stores the time at which it first transmitted reproduced music data, which in the example of FIG. 15 is the time at which audio frame n was transmitted (hereinafter referred to as the "reference time t0").
 The music search unit of the gate server 3 refers to the information on the many pieces of music stored in its database and identifies the music based on the received audio frames. In the example of FIG. 15, it is assumed that the music search unit of the gate server 3 was able to identify the music from audio frames n to (n+4). In this case, as the identification result, the gate server 3 transmits to the terminal device 10, in addition to the music title and artist name, the playback time (tn) of audio frame n, the frame first received from the terminal device 10, measured from the beginning of the music, as the playback position information. That is, the playback position information transmitted from the gate server 3 to the terminal device 10 in step S154 of FIG. 14 is the elapsed time, from the beginning of the music, of the audio frame n that the terminal device 10 first transmitted to the gate server 3. In step S156, the terminal device 10 therefore calculates the elapsed time Δt from the previously stored reference time t0 to the present and adds it to the playback time tn. That is, the playback time tn transmitted from the gate server 3 is the time from the beginning of the music to audio frame n, and the elapsed time Δt is the time from audio frame n to the present. The current playback position (playback time) Tc is therefore calculated by the following equation:

    Tc = tn + Δt    (2)

 As described above, by providing the gate server 3 with a music search function and identifying the music and its playback position from the reproduced music data, the speech can be reproduced in synchronization with the music being reproduced from the external source. Alternatively, instead of providing the gate server 3 with a music search function, an external music search server may be used.
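 As a concrete illustration of equation (2), the following Python sketch (illustrative only; the embodiment prescribes no particular implementation, and the names here are hypothetical) keeps the reference time t0 when the first audio frame is sent and combines it with the playback time tn returned by the gate server:

    import time

    class PlaybackPositionTracker:
        # Tracks the current playback position per equation (2): Tc = tn + dt.

        def __init__(self):
            self.t0 = None  # reference time: when audio frame n was first sent
            self.tn = None  # playback time of frame n from the start of the music

        def on_first_frame_sent(self):
            # Remember the wall-clock time at which audio frame n was transmitted.
            self.t0 = time.monotonic()

        def on_search_result(self, tn_seconds):
            # The gate server reports how far into the music frame n was.
            self.tn = tn_seconds

        def current_position(self):
            # Tc = tn + dt, where dt is the time elapsed since frame n was sent.
            delta_t = time.monotonic() - self.t0
            return self.tn + delta_t

 A terminal-side implementation would call on_first_frame_sent() when transmitting audio frame n, record tn via on_search_result() when the identification result arrives, and query current_position() whenever a speech must be scheduled.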
 In step S159, reproduction may be terminated when one piece of music ends; however, if another piece of music is played after the first one ends, the processing may be continued. That is, reproduction of the speech may be continued for as long as the terminal device 10 keeps transmitting the reproduced music data to the gate server 3. This makes it possible to follow a change of the music reproduced from the external source and continue reproducing the speech for the new piece.
 [4] Usage Restriction
 Next, restriction of the use of the assist vocal described above is explained. When performing assist vocal, the terminal device 10 operates together with a music playback device 20 such as a car audio system mounted on the vehicle 1. Various products exist that could serve as the music playback device 20, and if the music playback devices 20 usable for assist vocal were unrestricted, problems could arise with the sound quality of the reproduced music and with copyright management. These problems are therefore addressed by restricting the music playback devices 20 that can be used for assist vocal. Specifically, assist vocal can be executed only when a product produced by a specific manufacturer is used as the music playback device 20.
 FIG. 16(A) schematically shows a method of restricting use for unsold products, that is, products newly put on sale in the market. A manufacturer producing products capable of executing assist vocal (hereinafter also referred to as "use-permitted products") assigns a device ID to each individual product at the production factory. This device ID may be, for example, the serial number of the product, and is stored in the internal memory 20x of the music playback device 20 before shipment from the factory. The production factory also notifies the gate server 3 of the device ID assigned to each shipped product, and the gate server 3 stores the device IDs in its internal storage unit 3x. The storage unit 3x of the gate server 3 thus holds the device IDs of the use-permitted products as the use permission information.
 A user who has purchased a music playback device 20 installs it in a vehicle. As shown in FIG. 16(A), the music playback device 20 can then communicate with the terminal device 10. As described above, the device ID of the product is stored in the memory 20x of the music playback device 20.
 When the user wants to execute assist vocal, an availability check process using the device ID is performed. FIG. 17 shows a flowchart of the availability check process, which is executed between the gate server 3 and the terminal device 10 in the environment shown in FIG. 16(A).
 First, the terminal device 10 communicates with the music playback device 20 to acquire its device ID (step S301) and transmits it to the gate server 3 (step S302). Based on the received device ID, the gate server 3 determines whether the music playback device 20 is a use-permitted product (step S303). As described above, the storage unit 3x of the gate server 3 stores the device IDs of the use-permitted products, so the gate server 3 determines whether the received device ID is stored in the storage unit 3x. If the received device ID is stored in the storage unit 3x, the gate server 3 determines that the music playback device 20 is a use-permitted product; if it is not stored, the gate server 3 determines that the music playback device 20 is not a use-permitted product. The gate server 3 then transmits the determination result to the terminal device 10 (step S304).
 The terminal device 10 receives the determination result and notifies the user of it, for example by displaying it on the display unit (step S305). The availability check process then ends.
 Next, a method of restricting use for products that have already been sold is described. FIG. 16(B) schematically shows a method of restricting use for a music playback device 20 that has already been sold. In this case, no device IDs have been reported to the gate server 3 from the production factory, so the device IDs of the use-permitted products are not stored in the gate server 3. However, a device ID such as a serial number is usually assigned even to products that have already been sold, and the device ID often contains a code unique to the manufacturer. Authentication is therefore performed using this unique code as the use permission information.
 For example, if the unique code of manufacturer "P", a producer of use-permitted products, is "PEC", the memory 20x of a music playback device 20 produced by P stores a device ID containing "PEC". The storage unit 3x of the gate server 3 therefore stores the unique code "PEC" as a use permission code. When a device ID is transmitted from the terminal device 10, the gate server 3 determines whether the received device ID contains the use permission code "PEC". If it does, the gate server 3 determines that the music playback device 20 is a use-permitted product; if it does not, the gate server 3 determines that the music playback device 20 is not a use-permitted product. In this way, use can be restricted even for music playback devices 20 that have already been sold.
 The availability check process for sold products is performed according to the flowchart shown in FIG. 17, in the same way as for unsold products. For sold products, however, in step S303 the gate server 3 determines whether the device is a use-permitted product based on whether the device ID received from the music playback device 20 contains the use permission code "PEC".
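 A minimal sketch of the determination in step S303 follows, covering both the registered-ID check for new products and the permission-code check for sold products; the constant names and example IDs are hypothetical, and a real gate server would hold this data in the storage unit 3x rather than in code:

    # Device IDs reported by the production factory (unsold products).
    REGISTERED_DEVICE_IDS = {"PEC-000123", "PEC-000124"}
    # Manufacturer-specific code used for products already sold.
    USE_PERMISSION_CODE = "PEC"

    def is_use_permitted(device_id: str, already_sold: bool) -> bool:
        # Step S303: decide whether the music playback device is a
        # use-permitted product.
        if already_sold:
            # Sold products: the ID need only contain the permission code.
            return USE_PERMISSION_CODE in device_id
        # Unsold products: the exact ID must have been registered.
        return device_id in REGISTERED_DEVICE_IDS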
 Next, the timing at which the availability check process is executed is described. The availability check process can be performed at the first communication between the terminal device 10 and the gate server 3 after the music playback device 20 has been mounted on the vehicle 1. It can also be performed at the first execution of assist vocal using that music playback device 20; that is, when the terminal device 10 first requests assist vocal, the gate server 3 requests the terminal device 10 to transmit the device ID and performs the availability check process.
 Alternatively, the availability check process may be performed every time the user executes assist vocal. In this case, each time an assist vocal request is made from the terminal device 10, the gate server 3 requests the device ID of the music playback device 20 from the terminal device 10 and determines whether the device can be used. Only when it determines that the music playback device 20 is a use-permitted product does the gate server 3 continue with the subsequent assist vocal processing. The terminal device 10 executes assist vocal by the method shown in FIG. 13 or FIG. 14; specifically, it transmits the music data with lyric audio, generated by the gate server 3 or by the terminal device 10 itself, to the music playback device 20, which reproduces it. Here, before transmitting the music data with lyric audio to the music playback device 20, the terminal device 10 may receive the identification information of the music playback device 20 again and re-determine whether the device to which it is about to transmit is one that the gate server 3 determined to be a use-permitted product. If this re-determination finds that the destination music playback device 20 is a use-permitted product, the terminal device 10 transmits the music data with lyric audio to it; if the re-determination finds that it is not a use-permitted product, the terminal device 10 does not transmit the data. On the other hand, when the gate server 3 determines that the music playback device 20 is not a use-permitted product, it notifies the terminal device 10 to that effect.
 [5] Chorus Function
 Next, a chorus function using assist vocal is described. Assist vocal allows the user to sing along while driving, but when the driver is alone, singing tends to be less enjoyable. Therefore, singing voice data of multiple users is collected and stored in the gate server 3, and when a user performs assist vocal, singing voice data of other users is downloaded from the gate server 3 at the same time and reproduced in the vehicle. This realizes a pseudo chorus even when there is only one user (the driver).
 [5.1] Generation of Singing Voice Data
 To realize the chorus function, multiple users must generate singing voice data and upload it to the gate server 3. The method of generating the singing voice data is described below.
 (1) First Method
 In the first method, the user's singing voice data is generated by subtracting the sound recorded, in the same vehicle cabin, while the user was not singing from the sound recorded while the user sang along with the music.
 FIG. 18(A) schematically shows the environment for recording the sound while the user is singing. In the vehicle cabin, the music playback device 20 reproduces a piece of music from the speaker 30 based on a source sound signal, and the user U sings along with the reproduced music. The sound at that time is captured by a microphone M placed in the cabin. The recorded data generated by the microphone M contains the user's singing voice in addition to the sound of the music (hereinafter referred to as the "recorded data with singing voice"). This recorded data also reflects the acoustic characteristics CH of the cabin. The microphone 14 of the terminal device may be used as the microphone M.
 FIG. 18(B) schematically shows the environment for recording the sound while the user is not singing. In the vehicle cabin, the music playback device 20 reproduces the music from the speaker 30 based on the source sound signal, and the reproduced sound is captured by the microphone M placed in the cabin. The recorded data generated by the microphone M contains the sound of the music but not the user's singing voice (hereinafter referred to as the "recorded data without singing voice"). This recorded data also reflects the acoustic characteristics CH of the cabin.
 Using the recorded data obtained in this way, the terminal device 10 generates the singing voice data by subtracting the recorded data without singing voice from the recorded data with singing voice, as shown in FIG. 19(A). In this way, the difference between the data recorded while the user sang and the data recorded while the user did not sing can be generated as the user's singing voice data.
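 A minimal Python sketch of this subtraction, assuming the two recordings are time-aligned and sampled at the same rate (assumptions the text leaves implicit), is:

    import numpy as np

    def extract_singing_voice(with_voice: np.ndarray,
                              without_voice: np.ndarray) -> np.ndarray:
        # First method (FIG. 19(A)): singing voice = recording with voice
        # minus recording without voice, sample by sample.
        n = min(len(with_voice), len(without_voice))
        return with_voice[:n] - without_voice[:n]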
 (2) Second Method
 In the second method, the recorded data with singing voice is generated as in the first method, as shown in FIG. 18(A). The recorded data without singing voice, however, is not recorded; instead, singing-voice-free data is generated from the source sound signal and the acoustic characteristics of the cabin. Here, the acoustic characteristics of the cabin specifically refer to an impulse response measured in the cabin in advance.
 As can be understood from FIG. 18(B), the recorded data without singing voice is a recording, under the acoustic characteristics of the cabin, of the music obtained by reproducing the source sound signal. Therefore, by convolving the acoustic characteristics of the cabin with the source sound signal, singing-voice-free data equivalent to the recorded data without singing voice can be generated. Then, as shown in FIG. 19(B), the user's singing voice data can be generated by subtracting the singing-voice-free data generated in this way from the recorded data with singing voice.
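 A sketch of this convolution-based variant, assuming NumPy/SciPy and a pre-measured cabin impulse response (the array names are hypothetical), is:

    import numpy as np
    from scipy.signal import fftconvolve

    def synthesize_no_voice_data(source: np.ndarray,
                                 impulse_response: np.ndarray) -> np.ndarray:
        # Apply the cabin's acoustic characteristics CH to the source
        # signal by convolution with the measured impulse response.
        return fftconvolve(source, impulse_response)[: len(source)]

    def extract_singing_voice_from_source(with_voice: np.ndarray,
                                          source: np.ndarray,
                                          impulse_response: np.ndarray) -> np.ndarray:
        # Second method (FIG. 19(B)): subtract the synthesized
        # singing-voice-free data from the recording made while singing.
        no_voice = synthesize_no_voice_data(source, impulse_response)
        n = min(len(with_voice), len(no_voice))
        return with_voice[:n] - no_voice[:n]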
 (3) Third Method
 The third method is basically the same as the second method: singing-voice-free data is generated by convolving the acoustic characteristics of the cabin with the source sound signal, and this data is subtracted from the recorded data with singing voice to generate the singing voice data. The second method, however, assumes that the acoustic characteristics of the cabin do not change, whereas in practice they vary with time and conditions. In the third method, therefore, changes in the acoustic characteristics of the cabin are corrected by adaptive signal processing.
 FIG. 19(C) is a block diagram of a configuration for generating singing voice data by the third method. The recorded data with singing voice is corrected by a filter 61 and input to an adder 62. Singing-voice-free data, generated by convolving the acoustic characteristics of the cabin with the source sound signal, is also input to the adder 62. The adder 62 subtracts the singing-voice-free data from the filtered recorded data with singing voice and outputs the result as the singing voice data.
 The singing voice data is also supplied to an adaptive signal processing unit 63. The adaptive signal processing unit 63 computes the characteristics (coefficients W) to be set in the filter 61 so as to remove the error contained in the singing voice data, that is, the variation caused by changes in the acoustic characteristics of the cabin, and supplies them to the filter 61. For example, the adaptive signal processing unit 63 computes the coefficients W of the filter 61 so that the singing voice data becomes zero in periods and frequency components that contain no singing voice. The variation in the acoustic characteristics of the cabin is thus cancelled by the filter 61.
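 One common way to realize such adaptation is a normalized LMS filter. The sketch below is one possible reading of FIG. 19(C), not the patent's prescribed algorithm; in particular, restricting the coefficient update to samples known to contain no singing voice (adapt_mask) is an assumption based on the description above:

    import numpy as np

    def adaptive_voice_extraction(recorded: np.ndarray, no_voice: np.ndarray,
                                  adapt_mask: np.ndarray, taps: int = 64,
                                  mu: float = 0.1) -> np.ndarray:
        # recorded: recording with singing voice (input to filter 61)
        # no_voice: singing-voice-free data (subtracted by adder 62)
        # adapt_mask: True where the signal is known to contain no voice
        w = np.zeros(taps)
        w[0] = 1.0                           # start as a pass-through filter
        out = np.zeros(len(recorded))
        for i in range(taps, len(recorded)):
            x = recorded[i - taps:i][::-1]   # input vector of filter 61
            y = np.dot(w, x)                 # output of filter 61
            e = y - no_voice[i]              # adder 62 output (singing voice)
            out[i] = e
            if adapt_mask[i]:
                # NLMS update: drive the voice-free residual toward zero,
                # cancelling drift in the cabin's acoustic characteristics.
                w -= mu * e * x / (np.dot(x, x) + 1e-8)
        return out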
 [5.2] Singing Voice Data Generation Processing
 Next, the processing for generating the singing voice data is described. The singing voice data may be generated either on the gate server 3 side or on the terminal device 10 side. In the following description, for convenience, the singing voice data is assumed to be generated by the first method described above.
 FIG. 20 is a flowchart for the case where the gate server 3 generates the singing voice data. First, the terminal device 10 generates the recorded data with singing voice as shown in FIG. 18(A) (step S401), and then generates the recorded data without singing voice as shown in FIG. 18(B) (step S402). The terminal device 10 then transmits the recorded data with singing voice and the recorded data without singing voice to the gate server 3 (step S403). At this time, the terminal device 10 attaches music information, such as the music code corresponding to the recorded data, to the transmission.
 The gate server 3 receives the recorded data with singing voice and the recorded data without singing voice, generates the singing voice data by the computation shown in FIG. 19(A), and stores it in its internal database in association with the music based on the music information (step S404). In this way, singing voice data of multiple users is stored in the gate server 3 for each piece of music.
 FIG. 21 is a flowchart for the case where the terminal device 10 generates the singing voice data. First, the terminal device 10 generates the recorded data with singing voice as shown in FIG. 18(A) (step S411), and then generates the recorded data without singing voice as shown in FIG. 18(B) (step S412). The terminal device 10 then generates the singing voice data from the recorded data with singing voice and the recorded data without singing voice by the computation shown in FIG. 19(A) (step S413), and transmits it to the gate server 3 (step S414). At this time, the terminal device 10 attaches music information, such as the music code corresponding to the singing voice data, to the transmission.
 The gate server 3 receives the singing voice data and the music information, and stores the singing voice data in its internal database in association with the music based on the music information (step S415). In this way, singing voice data of multiple users is stored in the gate server 3 for each piece of music.
 [5.3] Chorus Processing
 Next, chorus processing using the singing voice data is described.
 (1) When the speech information generation processing is performed by the terminal device
 FIG. 22 is a flowchart of the chorus processing when the speech information generation processing is performed on the terminal device 10 side. In this example, the data necessary for the chorus processing is generated mainly by the terminal device 10.
 First, the gate server 3 connects to the content provider 2 via the network 4, acquires music data and lyrics data for multiple pieces of music, and stores them in its internal database (step S501).
 Through the user's operation of the operation unit 15, the terminal device 10 receives the designation of the music to be played (step S502) and also the designation that the chorus function is to be used (step S503). Next, the terminal device 10 transmits music designation information designating that music (including the designation of the chorus function) to the gate server 3 (step S504). The gate server 3 acquires the music data, lyrics data, and singing voice data of the music corresponding to the received music designation information from its database, and transmits them to the terminal device 10 (step S505).
 Next, the terminal device 10 generates the speech information using the received music data and lyrics data (step S506). The terminal device 10 then reproduces the music and the speech, and also reproduces the singing voice based on the singing voice data (step S507).
 Next, the terminal device 10 determines whether playback of the music has ended (step S508). If playback of the music has not ended, the process returns to step S507; if it has ended, the process ends.
 In the above example, the gate server 3 acquires the music data from the content provider in step S501; however, if the music data is stored in the terminal device 10, the gate server 3 may acquire it from the terminal device 10. Likewise, if the music data is stored in a database within the gate server 3, the music data may be acquired from there.
 (2) When the speech information generation processing is performed by the gate server
 FIG. 23 is a flowchart of the chorus processing when the speech information generation processing is performed on the gate server 3 side. In this example, the data necessary for the chorus processing is generated mainly by the gate server 3.
 First, the gate server 3 connects to the content provider 2 via the network 4, acquires music data and lyrics data for multiple pieces of music, and stores them in its internal database (step S511). The gate server 3 then generates speech information for each piece of music based on the acquired music data and lyrics data (step S512).
 After generating the speech information, the gate server 3 adds the speech to the music data to generate the music data with speech (step S513). Specifically, based on the generated speech information, the gate server 3 synthesizes the speech signal corresponding to each speech into the music data at the appropriate timing, generates the music data with speech, and stores it in the database.
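 A minimal Python sketch of this synthesis step, assuming the speech information has already been resolved into (start time, waveform) pairs at the music's sample rate (a hypothetical representation chosen for illustration), is:

    import numpy as np

    def add_speech_to_music(music: np.ndarray, speeches,
                            sample_rate: int) -> np.ndarray:
        # Step S513: mix each synthesized speech signal into the music
        # waveform at the timing given by the speech information, so that
        # each speech precedes the corresponding sung lyric line.
        mixed = music.astype(np.float64).copy()
        for start_seconds, speech in speeches:
            start = int(start_seconds * sample_rate)
            end = min(start + len(speech), len(mixed))
            mixed[start:end] += speech[: end - start]
        peak = np.max(np.abs(mixed))
        if peak > 1.0:
            mixed /= peak                    # prevent clipping after summation
        return mixed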
 Through the user's operation of the operation unit 15, the terminal device 10 receives the designation of the music to be played (step S514) and also the designation that the chorus function is to be used (step S515). Next, the terminal device 10 transmits music designation information designating that music (including the designation of the chorus function) to the gate server 3 (step S516).
 The gate server 3 reads the music data with speech and the singing voice data of the music corresponding to the received music designation information from its database, synthesizes them to generate music data with singing voice and speech (step S517), and transmits it to the terminal device 10 (step S518).
 The terminal device 10 reproduces the received music data with singing voice and speech (step S519). The speech is thereby reproduced at the appropriate timing during playback of the music, and the singing voices of other users are reproduced as well.
 Next, the terminal device 10 determines whether playback of the music has ended (step S520). If playback of the music has not ended, the process returns to step S519; if it has ended, the process ends.
 In the above example, the gate server 3 acquires the music data from the content provider in step S511; however, if the music data is stored in the terminal device 10, the gate server 3 may acquire it from the terminal device 10. Likewise, if the music data is stored in a database within the gate server 3, the music data may be acquired from there.
 [5.4] Modifications
 (Modification 1)
 The singing voice reproduced when the chorus function is executed is not limited to that of a single person. For example, the user executing the chorus function may also be allowed to specify the number of chorus members. In this case, the gate server 3 may execute the chorus function using singing voice data for the specified number of users.
 (Modification 2)
 In the above example, the gate server 3 stores the singing voice data in its database for each piece of music. In addition, it may store, in association with the data, attribute information of the user who generated the singing voice data, for example gender and age. In this case, when uploading the recorded data or the singing voice data to the gate server 3, the user may attach his or her attribute information to the transmission. This information may be entered by the user, or the terminal device 10 may automatically read information stored in the terminal device 10 and transmit it to the gate server 3.
 In this way, a user who executes the chorus function using the singing voice data stored in the gate server 3 can specify the gender, age, and other attributes of the singing voices to be reproduced at the same time.
 (Modification 3)
 In the above example, the singing voice data is basically generated for an entire piece of music, but it may instead be generated for a part of a piece. For example, it may be generated separately for the first and second verses of a song, or only for the hook or chorus part. In this case, the singing voice data is stored in the database of the gate server 3 for each piece of music, together with information indicating the relevant part (for example, the first verse) or the playback time within the piece.
 A user executing the chorus function can then, by designating the information indicating such parts, perform the chorus function using multiple sets of singing voice data that each cover only a part of the music. For example, by using singing voice data of different users for the first and second verses, the user can enjoy a chorus with different users.
 [5.5] Another Use of the Singing Voice Data Generation Method
 The second method of generating singing voice data described above may also be used to determine whether the user is singing, for the automatic-on setting of assist vocal. In the second method, singing-voice-free data is generated by convolving the acoustic characteristics of the cabin with the source sound signal; this is the data obtained when the user is not singing. The terminal device 10 therefore captures the sound in the cabin with the microphone during playback of the music and subtracts the singing-voice-free data from the captured data. If the data obtained by the subtraction contains a component of the user's singing voice, the terminal device 10 determines that the user is singing; if it does not, the terminal device 10 determines that the user is not singing. Whether a singing voice component is contained can be determined, for example, by whether the signal level in the typical human voice frequency band of the data obtained by the subtraction is at or above a predetermined value.
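 A sketch of this determination in Python, taking 100 Hz to 4000 Hz as the typical human voice band and an RMS threshold as the predetermined value (neither of which the text fixes; both are assumptions for illustration), is:

    import numpy as np
    from scipy.signal import butter, sosfilt

    def user_is_singing(captured: np.ndarray, no_voice: np.ndarray,
                        sample_rate: int, threshold: float = 0.01) -> bool:
        # Subtract the synthesized singing-voice-free data from the
        # microphone capture, then measure the residual level in the
        # assumed voice band.
        n = min(len(captured), len(no_voice))
        residual = captured[:n] - no_voice[:n]
        sos = butter(4, [100, 4000], btype="bandpass",
                     fs=sample_rate, output="sos")
        voice_band = sosfilt(sos, residual)
        rms = np.sqrt(np.mean(voice_band ** 2))
        return rms >= threshold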
 The present invention can be used for an apparatus that reproduces music.
DESCRIPTION OF SYMBOLS
1 Vehicle
2 Content provider
3 Gate server
4 Network
10, 10x Terminal device
12 Control unit
13 Storage unit
14 Microphone
20 Music playback device
30 Speaker

Claims (10)

  1. A communication system comprising a server and a terminal device, wherein
     the server comprises:
     a storage unit that stores use permission information identifying usable playback devices;
     determination means for determining whether a playback device is usable, based on identification information of the playback device received from the terminal device and the use permission information; and
     transmission means for transmitting, to the terminal device, when the determination means determines that the playback device is usable, content data corresponding to information received from the terminal device designating content; and
     the terminal device comprises:
     identification information acquisition means for acquiring the identification information of the playback device from the playback device connected to the terminal device and transmitting it to the server;
     first communication means for transmitting information designating content to be played back to the server and receiving, from the server, the content data corresponding to the content when the playback device is determined to be usable; and
     second communication means for transmitting the content data to the playback device determined to be usable.
  2. A playback system comprising a server and a terminal device, wherein
     the server comprises:
     a storage unit that stores use permission information indicating usable music playback devices;
     determination means for determining whether a music playback device is usable, based on identification information of the music playback device received from the terminal device and the use permission information;
     acquisition means for acquiring lyrics data of music; and
     transmission means for transmitting, to the terminal device, when the determination means determines that the music playback device is usable, the lyrics data corresponding to information received from the terminal device designating the music to be played; and
     the terminal device comprises:
     identification information acquisition means for acquiring the identification information of the music playback device from the music playback device connected to the terminal device and transmitting it to the server;
     input means for selecting the music to be played;
     music data acquisition means for acquiring music data of the selected music;
     first communication means for transmitting information designating the selected music to the server and receiving, from the server, the lyrics data corresponding to the music when the music playback device is determined to be usable;
     lyric audio data generation means for generating lyric audio data based on the lyrics data;
     generation means for generating music data with lyric audio by adding the lyric audio data to the music data such that each piece of lyric audio precedes the corresponding lyric portion of the music; and
     second communication means for transmitting the music data with lyric audio to the music playback device determined to be usable.
  3. A playback system comprising a server and a terminal device, wherein
     the server comprises:
     a storage unit that stores use permission information identifying usable music playback devices;
     determination means for determining whether a music playback device is usable, based on identification information of the music playback device received from the terminal device and the use permission information;
     acquisition means for acquiring music data of music and lyrics data of the music;
     lyric audio data generation means for generating lyric audio data based on the lyrics data;
     generation means for generating music data with lyric audio by adding the lyric audio data to the music data such that each piece of lyric audio precedes the corresponding lyric portion of the music; and
     transmission means for transmitting, to the terminal device, when the determination means determines that the music playback device is usable, the music data with lyric audio corresponding to information received from the terminal device designating the music to be played; and
     the terminal device comprises:
     identification information acquisition means for acquiring the identification information of the music playback device from the music playback device connected to the terminal device and transmitting it to the server;
     input means for selecting the music to be played;
     first communication means for transmitting information designating the selected music to the server and receiving, from the server, the music data with lyric audio corresponding to the music when the music playback device is determined to be usable; and
     second communication means for transmitting the music data with lyric audio to the music playback device determined to be usable.
  4. The playback system according to claim 2 or 3, wherein, before transmitting the music data with lyric audio to the music playback device, the second communication means receives the identification information of the music playback device; when it is re-determined, based on the received identification information, that the music playback device is one that the server determined to be usable, the second communication means transmits the music data with lyric audio to the music playback device from which the identification information was received; and when it is re-determined, based on the received identification information, that the music playback device is not one that the server determined to be usable, the second communication means does not transmit the music data with lyric audio to that music playback device.
  5. The playback system according to any one of claims 2 to 4, wherein the storage unit stores, as the use permission information, identification information of usable music playback devices, and the determination means determines that the music playback device is usable when identification information identical to the identification information received from the terminal device is stored in the storage unit.
  6. The playback system according to any one of claims 2 to 4, wherein the storage unit stores a predetermined use permission code as the use permission information, and the determination means determines that the music playback device is usable when the identification information received from the terminal device contains the use permission code.
  7. A terminal device capable of communicating with a server, comprising:
     identification information acquisition means for acquiring identification information of a playback device from the playback device connected to the terminal device and transmitting it to the server;
     first communication means for transmitting information designating content to be played back to the server and receiving, from the server, content data corresponding to the content when the playback device is determined to be usable; and
     second communication means for transmitting the content data to the playback device determined to be usable.
  8. A content communication method executed by a terminal device capable of communicating with a server, comprising:
     an identification information acquisition step of acquiring identification information of a playback device from the playback device connected to the terminal device and transmitting it to the server;
     a first communication step of transmitting information designating content to be played back to the server and receiving, from the server, content data corresponding to the content when the playback device is determined to be usable; and
     a second communication step of transmitting the content data to the playback device determined to be usable.
  9. A program executed by a terminal device capable of communicating with a server, the program causing the terminal device to function as:
     identification information acquisition means for acquiring identification information of a playback device from the playback device connected to the terminal device and transmitting it to the server;
     first communication means for transmitting information designating content to be played back to the server and receiving, from the server, content data corresponding to the content when the playback device is determined to be usable; and
     second communication means for transmitting the content data to the playback device determined to be usable.
  10. A server capable of communicating with a terminal device, comprising:
     a storage unit that stores use permission information identifying usable playback devices;
     reception means for receiving, from the terminal device, identification information of a playback device connected to the terminal device and information designating content;
     determination means for determining whether the playback device is usable, based on the identification information and the use permission information; and
     transmission means for transmitting, to the terminal device, content data corresponding to the information designating the content when the determination means determines that the playback device is usable.
PCT/JP2015/059967 2015-03-30 2015-03-30 Communication system, playback system, terminal device, server, content communication method, and program WO2016157377A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
PCT/JP2015/059967 WO2016157377A1 (en) 2015-03-30 2015-03-30 Communication system, playback system, terminal device, server, content communication method, and program

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/JP2015/059967 WO2016157377A1 (en) 2015-03-30 2015-03-30 Communication system, playback system, terminal device, server, content communication method, and program

Publications (1)

Publication Number Publication Date
WO2016157377A1 WO2016157377A1 (en)

Family ID: 57006593

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/JP2015/059967 WO2016157377A1 (en) 2015-03-30 2015-03-30 Communication system, playback system, terminal device, server, content communication method, and program

Country Status (1)

Country Link
WO (1) WO2016157377A1 (en)

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2002244677A (en) * 2001-02-21 2002-08-30 Alpine Electronics Inc Audio reproducing device
JP2005037846A (en) * 2003-07-18 2005-02-10 Xing Inc Information setting device and method for music reproducing device
JP2005064777A (en) * 2003-08-11 2005-03-10 Alpine Electronics Inc Audiovisual reproduction system, audiovisual apparatus, and audiovisual reproduction method

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP3910622A1 (en) * 2020-05-15 2021-11-17 Keumyoung Entertainment Co., Ltd Sound source file structure, recording medium recording the same, and method of producing sound source file
US11551717B2 (en) 2020-05-15 2023-01-10 Keumyoung Entertainment Co., Ltd Sound source file structure, recording medium recording the same, and method of producing sound source file


Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 15887530

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

NENP Non-entry into the national phase

Ref country code: JP

122 Ep: pct application non-entry in european phase

Ref document number: 15887530

Country of ref document: EP

Kind code of ref document: A1