WO2016157377A1

WO2016157377A1 - Communication system, playback system, terminal device, server, content communication method, and program

Info

Publication number: WO2016157377A1
Application number: PCT/JP2015/059967
Authority: WO
Inventors: 克仁石岡; 啓太郎菅原
Original assignee: パイオニア株式会社
Priority date: 2015-03-30
Filing date: 2015-03-30
Publication date: 2016-10-06

Abstract

According to the present invention, a server stores usage permission information indicating a usable music playback device. When a terminal device acquires identification information for a music playback device from said music playback device and transmits said information to the server, the server determines, on the basis of the identification information for the music playback device received from the terminal device and of the usage permission information, whether or not the music playback device is usable. When the music playback device is determined to be usable, the server acquires song lyric data for a musical piece and transmits said data to the terminal device. In this way, song lyric data is provided from the server only to a terminal device that operates together with a usable music playback device, and said data can be played back together with musical piece data.

Description

COMMUNICATION SYSTEM, REPRODUCTION SYSTEM, TERMINAL DEVICE, SERVER, CONTENT COMMUNICATION METHOD, AND PROGRAM

The present invention relates to a method of outputting lyrics information as music is played.

A karaoke apparatus that synthesizes and outputs lyrics data prior to a karaoke performance is known (for example, Patent Documents 1 and 2).

Japanese Patent Laid-Open No. 4-67467 Japanese Patent Laid-Open No. 10-63274

In the case of a karaoke device, lyrics are not included in the music to be played back, so that the lyric sound output by the prior art does not become difficult to hear. However, if you are listening to normal music instead of karaoke, if you output the lyrics by voice using the technique of the prior art, the output lyrics will overlap with the lyrics included in the original music. It may be difficult to hear. Also, for example, when listening to music while driving a vehicle, the lyrics voice output by the prior art method may overlap with the voice message of the route guidance by the in-vehicle navigation device, and it may be difficult to hear. .

The above is one example of problems to be solved by the present invention. An object of the present invention is to provide an easy-to-listen lyric sound for a user to sing a song while playing music including lyrics.

The invention according to claim 1 is a communication system including a server and a terminal device, wherein the server receives from the terminal device a storage unit that stores use permission information that identifies a usable playback device. A determination unit configured to determine whether or not the reproduction device is usable based on the identification information of the reproduction device and the use permission information; and when the determination unit determines that the reproduction device is usable, the terminal Transmitting means for transmitting content data corresponding to information specifying content received from the device to the terminal device, wherein the terminal device obtains identification information of the playback device from the playback device connected to the terminal device. The identification information acquisition means for acquiring and transmitting to the server, the information for specifying the content to be reproduced is transmitted to the server, and the reproduction apparatus is determined to be usable. A first communication unit that receives content data corresponding to the content from the server, and a second communication unit that transmits the content data to the playback device determined to be usable. To do.

The invention according to claim 2 is a playback system including a server and a terminal device, wherein the server receives from the terminal device a storage unit that stores use permission information indicating an available music playback device. Based on the identification information of the music playback device and the use permission information, a determination unit that determines whether or not the music playback device can be used, an acquisition unit that acquires lyric data of music, and the determination unit Transmission means for transmitting to the terminal device the lyric data corresponding to the information specifying the reproduced music received from the terminal device when it is determined that the reproducing device is usable; Acquires identification information of the music playback device from the music playback device connected to the terminal device, selects identification information acquisition means for transmitting to the server, and selects a playback music that is to be played back Input means, music data acquisition means for acquiring music data of the selected playback music, and information specifying the selected playback music is transmitted to the server, and it is determined that the music playback device is usable Sometimes the first communication means for receiving the lyrics data corresponding to the reproduced music from the server, the lyrics voice data generating means for generating the lyrics voice data based on the lyrics data, and the lyrics part in the music As described above, the lyric sound-added music data generating means for adding the lyric sound data to the music data to generate the lyric sound-added music data, and the music that has been determined to be usable. And a second communication means for transmitting to the playback device.

The invention according to claim 3 is a playback system including a server and a terminal device, wherein the server receives from the terminal device a storage unit that stores use permission information for specifying an available music playback device. Determination means for determining whether or not the music playback apparatus is usable based on the identification information of the music playback apparatus and the use permission information, and acquisition means for acquiring music data of the music and lyrics data of the music Lyric sound data generating means for generating lyric sound data based on the lyric data, and the lyric sound data is added to the tune data so as to precede the lyric part in the tune, and the tune data with lyric sound is added. Lyric sound-added music data generating means for generating the music data, and when the determination means determines that the music playback apparatus is usable, the playback music received from the terminal device is designated. Transmitting means for transmitting the song data with lyrics voice corresponding to the information to be transmitted to the terminal device, wherein the terminal device acquires identification information of the music playback device from the music playback device connected to the terminal device And transmitting the identification information acquisition means to be transmitted to the server, input means for selecting a reproduction music that is a music to be reproduced, and information specifying the selected reproduction music to the server, and the music reproduction device. The first communication means for receiving, from the server, music data with lyrics voice corresponding to the reproduced music when it is determined that the music data can be used, and the music playback for which the music data with lyrics voice is determined to be usable And a second communication means for transmitting to the apparatus.

The invention according to claim 7 is a terminal device capable of communicating with the server, and obtains identification information of the reproduction device from the reproduction device connected to the terminal device, and transmits the identification information to the server. A first communication means for transmitting information specifying content to be reproduced to the server, and receiving content data corresponding to the content from the server when the reproduction device is determined to be usable; And a second communication means for transmitting data to the playback device determined to be usable.

The invention according to claim 8 is a content communication method executed by a terminal device capable of communicating with a server, acquires identification information of the playback device from a playback device connected to the terminal device, and sends the identification information to the server. The identification information acquisition step to be transmitted, information specifying the content to be played back is sent to the server, and content data corresponding to the content is received from the server when the playback device is determined to be usable. And a second communication step of transmitting the content data to the playback device determined to be usable.

The invention according to claim 9 is a program executed by a terminal device capable of communicating with a server, acquires identification information of the playback device from a playback device connected to the terminal device, and transmits the identification information to the server Identification information acquisition means, first communication means for transmitting information specifying content to be played back to the server and receiving content data corresponding to the content from the server when it is determined that the playback device is usable The terminal device is caused to function as a second communication means for transmitting the content data to the playback device determined to be usable.

The invention according to claim 10 is a server capable of communicating with the server and the terminal device, and is connected to the storage device storing use permission information for specifying a usable playback device, and the terminal device is connected to the terminal device. Receiving means for receiving identification information of the playback apparatus, information specifying the content, determination means for determining whether or not the playback apparatus can be used based on the identification information and the usage permission information, A transmission unit configured to transmit content data corresponding to information specifying the content to the terminal device when the determination unit determines that the reproduction apparatus is usable.

It is a figure which shows the concept of assist vocal. It is a flowchart of an assist vocal process. It is a flowchart of a speech information generation process. An overview of the speech information generation process is shown. An example of lyric blocking is shown. An example of a speech insertion method is shown. An example of speech enhancement processing is shown. The structure which concerns on the other example of a speech emphasis process is shown. The structure which concerns on the other example of a speech emphasis process is shown. It is a block diagram which shows the whole structure of a music reproduction system. It is a block diagram which shows the internal structural example of a terminal device. It is a flowchart of the assist vocal process by the music reproduction system of 1st Example. It is a flowchart of the assist vocal process by the music reproduction system of 2nd Example. It is a flowchart of the assist vocal process which reproduces | regenerates only speech. It is a figure explaining the identification method of the music currently reproduced | regenerated by the external source. Describes how to restrict the use of assist vocals. It is a flowchart of an availability check process. An environment for generating recording data for the choral function is shown. Shows how to create singing voice data It is a flowchart of the production | generation process of singing voice data. It is a flowchart of the production | generation process of singing voice data. It is a flowchart of a choral process. It is a flowchart of choral processing.

In another preferred embodiment of the present invention, in a communication system including a server and a terminal device, the server receives from the terminal device a storage unit that stores use permission information for specifying a usable playback device. A determination unit configured to determine whether or not the reproduction device is usable based on the identification information of the reproduction device and the use permission information; and when the determination unit determines that the reproduction device is usable, the terminal Transmitting means for transmitting content data corresponding to information specifying content received from the device to the terminal device, wherein the terminal device obtains identification information of the playback device from the playback device connected to the terminal device. ID information acquisition means for acquiring and transmitting to the server, and information for specifying the content to be reproduced is transmitted to the server, and the reproduction apparatus is determined to be usable A first communication unit that receives content data corresponding to the content from the server when transmitted, and a second communication unit that transmits the content data to the playback device determined to be usable. .

In the communication system described above, the server stores use permission information that identifies usable playback devices. The terminal device acquires identification information of the reproduction device from the reproduction device connected to the terminal device, transmits the identification information to the server, and transmits information specifying the content to be reproduced to the server. The server determines whether or not the playback device can be used based on the playback device identification information and use permission information received from the terminal device, and when the playback device is determined to be available, the server receives the playback device from the terminal device. Content data corresponding to information specifying the content is transmitted to the terminal device. The terminal device transmits the content data received from the server to the playback device determined to be usable. As a result, the content data is transmitted only to the playback device determined to be usable.

A preferred embodiment of the present invention is a playback system including a server and a terminal device, and the server receives from the terminal device a storage unit that stores use permission information indicating an available music playback device. Based on the identification information of the music playback device and the use permission information, a determination unit that determines whether or not the music playback device can be used, an acquisition unit that acquires lyric data of music, and the determination unit Transmission means for transmitting to the terminal device the lyric data corresponding to the information specifying the reproduced music received from the terminal device when it is determined that the reproducing device is usable; Acquires identification information of the music playback device from the music playback device connected to the terminal device, selects identification information acquisition means for transmitting to the server, and selects the playback music that is to be played back Input means, music data acquisition means for acquiring music data of the selected playback music, and information specifying the selected playback music is transmitted to the server, and the music playback device is determined to be usable Sometimes the first communication means for receiving the lyrics data corresponding to the reproduced music from the server, the lyrics voice data generating means for generating the lyrics voice data based on the lyrics data, and the lyrics part in the music As described above, the lyric sound-added music data generating means for adding the lyric sound data to the music data to generate the lyric sound-added music data, and the music that has been determined to be usable. Second communication means for transmitting to the playback device.

In the above playback system, the server stores use permission information indicating the music playback devices that can be used. The terminal device acquires identification information of the music playback device from the music playback device connected to the terminal device and transmits the identification information to the server. The server determines whether or not the music playback device can be used based on the music playback device identification information and use permission information received from the terminal device.

Also, in the terminal device, the playback music that is the music to be played back is selected by the user. The terminal device transmits information specifying the selected reproduction music piece to the server. When it is determined that the music playback device can be used, the server acquires lyrics data corresponding to information specifying the playback music received from the terminal device, and transmits the lyrics data to the terminal device. The terminal device receives lyric data corresponding to the reproduced music from the server when it is determined that the music reproducing device can be used, and generates lyric audio data based on the lyric data. Then, the terminal device adds the lyrics voice data to the song data to generate the song data with the lyrics voice so as to precede the lyrics portion in the song, and transmits it to the music playback device determined to be usable for playback. Let In this way, the lyrics data is provided from the server only to the terminal device operating together with the available music playback device, and can be played back together with the music data.

Another preferred embodiment of the present invention is a playback system including a server and a terminal device, wherein the server stores use permission information for specifying an available music playback device, and the terminal device. Based on the identification information of the music playback device received from the above and the use permission information, the determination means for determining whether or not the music playback device can be used, and the acquisition of the music data of the music and the lyrics data of the music Means, lyric sound data generating means for generating lyric sound data based on the lyric data, and adding the lyric sound data to the music data so as to precede the lyric part in the music Music data generating means with lyric sound for generating music data, and playback music received from the terminal device when the determination means determines that the music playback apparatus is usable; Transmitting means for transmitting the song data with lyrics voice corresponding to the information for designating to the terminal device, the terminal device from the music playback device connected to the terminal device identification information of the music playback device ID information acquiring means for transmitting the information to the server, input means for selecting the reproduced music that is the music to be reproduced, and information specifying the selected reproduced music are transmitted to the server, and the music First communication means for receiving, from the server, music data with lyrics audio corresponding to the reproduced music when it is determined that the playback device is usable, and the music data with lyrics audio determined to be available Second communication means for transmitting to the music playback device.

In the above playback system, the server stores use permission information for specifying an available music playback device. The terminal device acquires identification information of the music playback device from the music playback device connected to the terminal device, and transmits the identification information to the server. The server determines whether or not the music playback device can be used based on the music playback device identification information and use permission information received from the terminal device.

In addition, the server acquires the song data of the song and the lyrics data of the song, generates the lyrics voice data based on the lyrics data, and adds the lyrics voice data to the song data so as to precede the lyrics portion in the song. To generate song data with lyrics audio.

The terminal device receives selection of a reproduction music that is a music to be reproduced from the user, and transmits information specifying the selected reproduction music to the server. When it is determined that the music playback device can be used, the server transmits music data with lyrics audio corresponding to information specifying the playback music received from the terminal device to the terminal device. When it is determined that the music playback device is usable, the terminal device receives music data with lyrics audio corresponding to the playback music from the server, and transmits the music data to the music playback device determined to be usable for playback. In this way, the lyrics data is provided from the server only to the terminal device operating together with the available music playback device, and can be played back together with the music data.

In one aspect of the above playback system, the second communication means receives the identification information of the music playback device before transmitting the song data with lyrics sound to the music playback device, and the received identification information When the music playback device is determined again as a music playback device that is determined to be usable by the server, the music data with lyrics voice is transmitted to the music playback device that has received the identification information, and the received identification is received. When it is determined again that the music playback device is not a music playback device that is determined to be usable by the server based on the information, the music data with lyrics voice is not transmitted to the music playback device that has received the identification information. In this aspect, the terminal device re-determines whether or not the music playback device can be used before transmitting the song data with lyrics voice to the music playback device.

In another aspect of the music reproduction system, the storage unit stores identification information of an available music reproduction device as the use permission information, and the determination unit receives the identification received from the terminal device. When the same identification information as the information is stored in the storage unit, it is determined that the music playback device can be used.

In another aspect of the music reproduction system, the storage unit stores a predetermined use permission code as the use permission information, and the determination unit uses the identification information received from the terminal device as the use information. When the permission code is included, it is determined that the music playback device can be used.

In another preferred embodiment of the present invention, a terminal device capable of communicating with a server acquires identification information of the playback device from a playback device connected to the terminal device, and transmits identification information to the server. A first communication means for transmitting information specifying content to be reproduced to the server, and receiving content data corresponding to the content from the server when the reproduction device is determined to be usable; Second communication means for transmitting data to the playback device determined to be usable.

The above terminal device acquires the identification information of the playback device from the playback device connected to the terminal device, and transmits it to the server. In addition, information specifying content to be reproduced is transmitted to the server. Then, when it is determined that the playback device is usable, the terminal device receives content data corresponding to the content from the server, and transmits the content data to the playback device determined to be usable. As a result, the content data is transmitted only to the playback device determined to be usable.

In another preferred embodiment of the present invention, a content communication method executed by a terminal device capable of communicating with a server acquires identification information of the playback device from a playback device connected to the terminal device, and sends the identification information to the server. The identification information acquisition step to be transmitted, information specifying the content to be played back is sent to the server, and content data corresponding to the content is received from the server when the playback device is determined to be usable. And a second communication step of transmitting the content data to the playback device determined to be usable. By this method, the content data is transmitted only to the playback device determined to be usable.

In another preferred embodiment of the present invention, a program executed by a terminal device capable of communicating with a server acquires identification information of the playback device from a playback device connected to the terminal device, and transmits the identification information to the server. Identification information acquisition means, first communication means for transmitting information specifying content to be played back to the server and receiving content data corresponding to the content from the server when it is determined that the playback device is usable The terminal device is caused to function as second communication means for transmitting the content data to the playback device determined to be usable. By executing this program, the content data is transmitted only to the playback device determined to be usable.

In another preferred embodiment of the present invention, a server capable of communicating with a server and a terminal device is connected to the terminal device from the terminal device and a storage unit that stores use permission information for specifying a usable playback device. Receiving means for receiving identification information of the playback apparatus, information specifying the content, determination means for determining whether or not the playback apparatus can be used based on the identification information and the usage permission information, Transmission means for transmitting content data corresponding to information designating the content to the terminal device when it is determined by the determination means that the playback device is usable. By this server, the content data is transmitted only to the playback device determined to be usable.

Hereinafter, preferred embodiments of the present invention will be described with reference to the drawings.

[1] Assist Vocal [1.1] Concept of Assist Vocal When a user who is driving a vehicle plays and listens to music in the car, he may want to sing the song he is listening to. However, since the lyrics information cannot be seen while driving, the user cannot sing unless the lyrics of the song are stored.

In this embodiment, when a song containing lyrics is played, the lyrics contained in the song are output as an audio signal to teach the user. Specifically, when playing a song stored in the memory of the terminal device, the lyrics included in the song are output as audio before the lyrics are played in the song. Tell the user. Thereby, the user can sing the music being reproduced even during driving. Also, users other than the driver can sing songs without looking at the lyrics collection.

In this way, the function of outputting the contents of the lyrics and transmitting them to the user prior to the timing when the lyrics are reproduced in the music is called “assist vocal”. In this embodiment, it is assumed that the music to be played is not a karaoke but a normal music including lyrics.

Figure 1 shows the concept of assist vocals. FIG. 1 schematically shows one piece of music. The horizontal axis in FIG. 1 indicates time. One piece of music includes a lyrics portion divided into a plurality of blocks. The part of the lyrics included in the music to be played is called “vocal”. Also, in the music, the part other than vocals is called “interlude”. Therefore, usually one piece of music is composed of a plurality of interludes and a plurality of vocals.

In the example of FIG. 1, the music is composed of three vocals 1 to 3 and a plurality of interludes. It is assumed that the content (lyrics) of the vocal 1 is “Aiueo”, the content of the vocal 2 is “Kakikukeko”, and the content of the vocal 3 is “Sashisuseso”.

In a situation where such a music is being played back, in the present embodiment, the lyrics “Aiueo” corresponding to the vocal 1 are output as audio prior to the timing at which the vocal 1 in the music is played back. In the present specification, the lyric sound output by the assist vocal is called “speech” and is distinguished from “vocal” included in the music.

In the example of FIG. 1, speech 1 corresponding to vocal 1 is output prior to vocal 1. Similarly, speech 2 is output prior to vocal 2, and speech 3 is output prior to vocal 3.

The speech outputs only the vocal lyrics included in the song as an audio signal, and basically does not include elements such as pitch and rhythm. Also, as will be described later, the speech is basically inserted into the interlude before the corresponding vocal, so the length is adjusted as necessary, and it is usually played back as a vocal during the playback of a song. It is also a short time. In a typical example, the speech is spoken speech of the corresponding vocal lyrics.

[1.2] Assist Vocal Processing Next, assist vocal processing for outputting speech will be described. FIG. 2 is a flowchart of the assist vocal process. This process is executed by a terminal device mounted on the vehicle, typically a mobile terminal such as a smartphone, and the details thereof will be described later. In the following description, it is assumed that the terminal device executes processing.

First, the terminal device determines whether or not the assist vocal is on (step S1). Here, the assist vocal may be turned on / off manually by the user or automatically. When performing manually, when a user wants to reproduce speech by assist vocal, the user operates a predetermined button or the like to turn on the assist vocal, and the terminal device detects this. On the other hand, when performing automatically, a terminal device judges a user's voice, for example using a microphone etc., and when a user is singing a song or performing the act according to a song, an assist vocal is automatically carried out. Set to on. The assist vocal automatic setting method will be described later.

If the assist vocal is not set to ON (step S1: No), the process ends. On the other hand, when the assist vocal is set to ON (step S1: Yes), the terminal device specifies the music being reproduced (step S2). In this case, the music played in the car is a music stored in the terminal device by downloading from the server, a music stored in a storage medium such as a CD or a memory of the vehicle-mounted device, a radio Music that is being played from When the music stored in the terminal device is being reproduced, the terminal device can easily specify the music being reproduced. On the other hand, when music stored on a storage medium such as a CD is being played or when music is being played from the radio, the terminal device collects the music being played from the speaker in the car with a microphone. Then, the audio data is transmitted to an external music search server. The music search server stores a large number of pieces of music data as a database, specifies music that matches the audio data received from the terminal device, and indicates the music (for example, music name, artist name, etc. (Referred to as “music identification information”) to the terminal device. In this way, the terminal device acquires the music specifying information of the music currently being played.

Thus, when the music being played is specified, the terminal device executes a speech information generation process (step S3). FIG. 3 is a flowchart of the speech information generation process. FIG. 4 shows an outline of the speech information generation process.

In FIG. 3, the terminal device acquires the lyrics data of the music specified in step S2 from an external server or the like (step S31). Here, the “lyric data” is information that defines what lyrics are reproduced at which timing in the music, specifically, lyrics text data indicating the lyrics included in the music, This is information in which the reproduction time data indicating the reproduction time (the elapsed time from the start time of the song) is associated with the reproduction time data.

Next, the terminal device acquires music analysis data (step S32). The music analysis data is information indicating musical features such as beat positions and bar positions in the music, and is generated based on the audio data of the reproduced music. Specifically, the terminal device has a built-in music analysis application, collects music played from a vehicle speaker with a microphone, acquires audio data, and analyzes the audio data to obtain beat positions. Acquire music analysis data such as. Note that the music analysis data may be acquired using an external music analysis device or server instead of incorporating the music analysis application in the terminal device.

Next, the terminal device performs lyrics block formation (step S33). Lyrics blocking is a process of blocking lyrics text data included in the lyrics data acquired in step S31, and one block corresponds to one speech. That is, lyric blocking is a process of dividing lyric text data into speech units.

In the example of FIG. 4, the terminal device has acquired “Aiueokiki Kekoshisashisoseso” as the lyric text data, and the terminal device has obtained three blocks “Aiueo”, “Kakikukeko”, and “Sashisetsuso”. To generate block lyrics data.

FIG. 5 shows an example of lyrics block. FIG. 5A shows a first method. In this method, the interval between the interludes included in the music is set as one block. The “interlude” is a part other than “vocal” in the music. Specifically, when the length It of a section other than vocal (non-vocal section) is longer than a predetermined length t1, the terminal apparatus determines that the section is an interlude.

However, there are exceptional cases where multiple blocks are combined into one block due to the length of the interlude. As in the example shown in FIG. 5B, the length It2 of the interlude 2 immediately before the length Vt3 of the vocal 3 is very short (It2 <α1 · Vt3; α1 is an arbitrary coefficient). It is difficult to output vocal 3 speech during interlude 2. In such a case, if the length It1 of the previous interlude 1 is longer than a predetermined length, the terminal device sets the vocal 2 and vocal 3 as one block. Thus, speech corresponding to vocal 2 and vocal 3 is made in interlude 1.

Fig. 5 (C) shows the second method. In this method, the terminal device determines each block based on a break included in the lyrics data. That is, if the lyric text data included in the lyric data includes delimiter information in advance, the terminal device can block the lyric text data according to the delimiter.

Next, the terminal device performs lyrics speech (step S34). The block lyric data obtained by lyric blocking is text data indicating lyrics, and lyric speech is a process of converting block lyric data into audio data. Specifically, the terminal device incorporates text-to-speech (TTS: TextToSpeech) software, and converts each block lyrics data obtained in step S33 into speech data. As a result, as shown in FIG. 4, speech data 1 to 3 that are audio data are generated from each block lyrics data. Instead of incorporating TTS software in the terminal device, TTS conversion by an external server or the like may be used.

Next, the terminal device changes the speech length (step S35). The speech length change is a process for shortening the time length of each speech obtained by lyric speech so that it can be reproduced in a short time. As already described, each speech is reproduced in an interlude preceding the corresponding vocal. However, since there is a limitation on the time length of the interlude, it is necessary to reproduce the speech by shortening it. For this reason, the speech length is changed.

Basically, the playback time of each speech is shortened (the playback speed is increased) within a range that can be heard by humans. For example, when the time length of each speech obtained in step S34 (referred to as “original speech length”) is “St” and the speech length conversion coefficient is “α2”, the speech length is changed by changing the speech length. The length “Stv” is
Stv = St · α2 (α2 <1.0) (1)
Given in. For example, if α2 = 0.7, each speech is reproduced at a rate 30% higher than the original by changing the speech length.

In addition to the batch change as described above, the playback time may be further shortened according to the duration of the interlude corresponding to each speech. In this case, even for speech with the same number of characters or words with the same lyrics, the playback time varies depending on the position in the song (the length of the preceding interlude).

Next, the terminal device calculates the speech insertion timing (step S36). The terminal device inserts speech corresponding to a certain vocal prior to the playback timing of the vocal. In the example shown in FIG. 4, the speech 1 corresponding to the vocal 1 is inserted before the reproduction timing of the vocal. Similarly, the speech 2 corresponding to the vocal 2 is inserted before the reproduction timing of the vocal 2, and the speech 3 corresponding to the vocal 3 is inserted before the reproduction timing of the vocal 3.

A specific example of a method for inserting speech is shown in FIG. FIG. 6 shows an example of the timing at which the speech 2 corresponding to the vocal 2 is inserted.

In Method 1, the speech ends a certain time before the start timing of the corresponding vocal. Specifically, as shown in FIG. 6, the speech 2 is inserted so as to end a certain time T2 before the reproduction start timing of the vocal 2. That is, the speech 2 ends a predetermined time T2 before the start of the reproduction of the vocal 2. In this case, the reproduction start timing of the speech 2 is determined according to the length of the speech 2. In Method 1, since a certain time is secured from the end of the speech reproduction until the corresponding vocal is reproduced, the user can sing the vocal portion with a margin.

In Method 2, the speech end timing is matched with the beat position of the music. Specifically, in the example of FIG. 6, the speech 2 is inserted so as to end N beats before the playback start timing of the vocal 2 (N is an arbitrary integer; N = 1 in this example). In this case, the reproduction start timing of the speech 2 is determined according to the length of the speech 2. The position of the beat of the music is acquired from the music analysis data described above.

In Method 3, both the speech playback start timing and playback end timing are matched with the beat position of the music. Specifically, in the example of FIG. 6, the playback start timing and playback end timing of the speech 2 are both made coincident with the third beat of the four beats.

As in

methods

2 and 3, if both the speech end timing or the start / end timing coincide with the beat position of the music, the speech is linked to the music, so that the user can easily sing the music.

As described above, the terminal device determines the speech insertion timing. Specifically, for each speech, the playback start timing and playback end timing are defined by the elapsed time from the beginning of the music. The playback start timing and playback end timing of each speech is stored as part of the speech information. That is, the speech information includes an audio signal corresponding to each speech (hereinafter also referred to as “speech signal”) and the reproduction start timing / reproduction end timing of each speech.

Next, the processing returns to the main routine shown in FIG. 2, and the terminal device acquires the current playback position of the music being played back (step S4). Specifically, the terminal device acquires the current reproduction position by counting the elapsed time from the reproduction start time of the music being reproduced.

Next, the terminal device performs speech enhancement processing (step S5). The speech emphasis process is a process for distinguishing vocals included in music from speech and making them easy to hear, details of which will be described later.

Next, the terminal device reproduces the speech based on the reproduction start timing / reproduction end timing of each speech included in the speech information and the current reproduction position (step S6). Specifically, the speech reproduction is started at the speech reproduction start timing, and the speech reproduction is terminated at the speech reproduction end timing. As a result, the corresponding speech is reproduced prior to the vocal in the music.

Next, the terminal device determines whether or not the speech reproduction should be terminated (step S7). Examples of the case where speech reproduction should be terminated include a case where speech information is lost, a case where music reproduction itself is terminated, a case where assist vocals are turned off by a user operation, and the like. If the speech reproduction should not be terminated (step S7: No), the process returns to step S4 to continue the speech reproduction. On the other hand, if the speech reproduction should be terminated (step S7: Yes), the assist vocal process is terminated.

[1.3] Automatic Assist Vocal Setting Method Next, a method for automatically turning on the assist vocal in step S1 of the assist vocal process shown in FIG. 2 will be described.

As a basic method, the terminal device collects the voice uttered by the user with a microphone, and the user is singing (singing a song) according to the music or performing an action equivalent to singing. Assist vocal is automatically turned on when it is determined. For example, as a result of analyzing voice data collected by a microphone, if it is determined that a nose is sung, a piece is being sung, a humming, or the like, the assist vocal is turned on. On the other hand, when the voice data is not singing but is a conversation with a passenger, the assist vocal is not turned on. The assist vocal is not turned on even when the voice data includes a part singing a nose song, or when the voice data is mostly conversational.

It should be noted that whether or not the user's voice included in the voice data is a song can be determined based on the presence or absence of a rhythm or pitch included in the voice data. For example, if the rhythm is regular or the change in pitch is large, it is judged as singing. If the rhythm is irregular, it is judged as not singing (conversation) if the change in pitch is small. be able to. Further, by using the music analysis application described above, it may be determined that the song is a song when a beat or measure can be extracted from the audio data, and may not be a song when the song cannot be extracted. Further, by using the music search server or the music search function described above, it may be determined that the song is a song when the song can be specified from the audio data, and may not be determined when the song cannot be specified.

Also, the terminal device calculates the correlation between the collected audio data and the music being played, and when there is a correlation greater than a certain value, determines that the user is singing and turns on the assist vocal Also good. In addition, when the terminal device has already acquired the lyrics data of the song being played back, the user is singing when the correlation between the voice data collected by the microphone and the lyrics data is a certain value or more You may judge. Further, based on the lyric data, when the user's voice is output even at the interlude position of the music where the lyric should not exist, it may be determined that it is a conversation.

Also, the rhythm information collected by the microphone may be used. For example, if it is determined that the user is hitting the steering wheel with his / her hand or finger in accordance with the rhythm of the music, or if he / she is stepping on the floor with his / her foot, he / she performs an act similar to singing. The assist vocal may be turned on. In this case, the correlation between the rhythm collected by the microphone and the rhythm of the music being played back may be calculated, and the assist vocal may be turned on when the correlation is a certain value or more. In addition, the assist vocal may be turned on when the rhythm collected by the microphone repeats a certain rhythm without calculating the correlation with the rhythm of the music being played. .

Furthermore, the assist vocal may be turned on when the state of the user is photographed with a camera that photographs the inside of the vehicle and the user is shaking his / her head along with the music. In addition, it is possible to detect whether there is a passenger in the passenger seat or the rear seat with a camera that captures the interior of the vehicle, and even if the determination criterion of whether the user is singing or talking is changed depending on the presence of the passenger Good.

Also, in the above example, an example is described in which the assist vocal is turned on when it is determined that the user is singing. However, even if the user is singing, the user knows the lyrics and plays the assist vocal. If it is determined that it is not necessary to turn on the assist vocal, it is not necessary to turn on the assist vocal. Specifically, for example, when the correlation between the collected sound data and the music being played is a certain value or more and the correlation with the lyrics data is a certain value or more, the user knows the lyrics. Assist vocals are not turned on even when singing.

However, in this case, since the user may not understand the lyrics from the middle, the speech information may be generated and prepared for output. After that, if the correlation between the collected audio data and the music being played is less than a certain value, or if the correlation with the lyrics data is less than a certain value, it is determined that the user does not know the lyrics. , Output assist vocals.

In the above example, the assist vocal auto-on setting method has been described, but the assist vocal auto-off setting can also be performed. While the assist vocal is turned on, the user does not sing along with the song (does not sing a song) or acts similar to the song (singing a nose song, singing a piece of music, humming Assist Vocal may be automatically turned off when it is determined that the user has not performed the operation). Similarly, if a conversation is detected, the assist vocal may be automatically turned off, and if it is determined that the rhythm is not taken or the user is not shaking his / her head to the music, Vocals may be turned off automatically.

Moreover, in the above example, it is described that the assist vocal is automatically turned on or off based on whether or not the user is singing or acting in accordance with the singing. Depending on the configuration, automatic on setting or automatic off setting may be performed.

For example, for a user who wants to sing only the chorus part of the song, when playing the chorus part of the song, the assist vocal is automatically turned on, and when playing the part other than the chorus part of the song, The assist vocal may be automatically turned off. Conversely, for users who know the rust part and want to practice the part other than rust, when playing the part other than the rust part of the song, the assist vocal is automatically turned on, When playing the part, the assist vocal may be automatically turned off.

[1.4] Speech Enhancement Process Next, the speech enhancement process executed in step S5 of the assist vocal process shown in FIG. 2 will be described. The speech emphasis process is a method in which the user distinguishes between speech and vocal and makes it easy to hear, and shows the following several methods.

[1.4.1] Processing when speech and vocal overlap The speech is basically reproduced during the interlude immediately before the corresponding vocal, and preferably does not overlap with the vocal in time. For this purpose, the above-described speech length changing process (step S35) is performed. Depending on the length of the speech and the length of the interlude, the speech may not be completely reproduced during the interlude even if the speech length is shortened. That is, when the length of the speech is longer than the length of the interlude, the speech and vocal are partially overlapped and reproduced. As described above, any of the following processing may be performed instead of reproducing speech and vocals in an overlapping manner.

(1) Adjust the vocal level.

¡If speech and vocals overlap, there is a way to lower the vocal volume level. FIG. 7A shows a case where the rear portion of the speech and the head portion of the vocal overlap and an overlapping portion X occurs. In this case, the volume of the vocal is adjusted in the overlapping portion X. Specifically, the vocal volume is reduced to a level where speech can be heard, or zero. Thereby, in the overlapping part X, the reproduction of the speech is prioritized and the speech is easy to hear.

FIG. 7 (B) shows a case where the speech head portion and the rear portion of the previous vocal overlap, resulting in an overlap portion X. Also in this case, the vocal volume is adjusted in the overlapping portion X. Specifically, the vocal volume is reduced to a level where speech can be heard, or zero. Further, in the overlapping portion X, the volume level of the vocal may not be suddenly lowered, but the volume level may be gradually lowered by fading out the vocal. Thereby, in the overlapping part X, the reproduction of the speech is prioritized and the speech is easy to hear.

Specifically, the above level adjustment may be performed by lowering the volume level of the vocal component when the vocal component and the performance component such as a musical instrument are separated in the music signal. On the other hand, if the vocal part is synthesized with a performance part such as a musical instrument and the volume of the vocal alone cannot be adjusted, the volume level of the entire music signal may be reduced, In particular, the volume level may be lowered only for the component in the frequency band corresponding to vocal (human voice).

(2) Adjust the speech level.

¡If speech and vocals overlap, there is a method to lower the speech volume level. FIG. 7C shows a case where the rear part of the speech and the head part of the vocal overlap and an overlapping part X occurs. In this case, the speech volume is adjusted in the overlapping portion X. Specifically, the volume of speech is reduced or zero. Instead of suddenly reducing the speech volume, the speech may be faded out to gradually decrease the volume. In this case, speech cannot be heard at the overlapping portion X, but generally when listening to a song that the user knows to some extent, the entire lyrics are not remembered, but if the beginning of the lyrics is known, Often, you can sing with the memory of the lyrics. Therefore, as shown in FIG. 7C, if the head portion of the speech can be heard, the rear portion of the speech may be difficult to hear. This technique is effective in such a case.

[1.4.2] Processing to hear speech and vocals from different directions Humans have the ability to distinguish sounds coming from different directions at the same time (so-called cocktail party effect). Using this, it is possible to consider a method that enables the user to distinguish between speech and vocal. This method is executed regardless of whether speech and vocal overlap in time.

(1) Method of adjusting phase with left and right speakers FIG. 8A shows a configuration in which the phase of speech output from left and right speakers is inverted. The music signal of the left (L) channel is supplied to the adder 32, and the music signal of the right (R) channel is supplied to the adder 33. On the other hand, the speech signal is supplied to the adder 33 as it is, and its phase is inverted by the phase inverter 31 and supplied to the adder 32. The output of the adder 32 is supplied to the left speaker 30L, and the output of the adder 33 is supplied to the right speaker 30R.

According to this configuration, the sound image of a song including vocals is localized between the left and right speakers, whereas the sound image of a speech is localized around the user's ears, and the user distinguishes between the speech and the vocals in the song. It becomes easy. In the example of FIG. 8A, only the phase of the speech signal supplied to the left speaker 30L is inverted by the phase inverter 31, but only the phase of the speech signal supplied to the right speaker 30R is inverted. It may be reversed. Also, if there is a certain phase difference between the speech signals supplied to the left and right speakers, the sound image position of the speech and the sound image position of the music can be made different, so the speech signal supplied to one speaker is not necessarily There is no need to reverse (change 180 °). That is, it is only necessary to give a certain phase difference between the speech signal supplied to one speaker and the speech signal supplied to the other speaker.

In the above configuration, the phase inverter 31 is an example of the signal processing means of the present invention, and the

adders

32 and 33 are examples of the adding means and the output means of the present invention.

(2) Method for Controlling Sound Image Localization FIG. 8B shows a configuration in which a sound image of speech can be set at an arbitrary position. The music signal of the left (L) channel is supplied to the adder 32, and the music signal of the right (R) channel is supplied to the adder 33. On the other hand, the speech signal is supplied to the

adders

32 and 33 via the sound image localization control calculation unit 34 and the crosstalk cancellation unit 35. The sound image localization control calculation unit 34 convolves the transfer function between the target speaker position and the listening position (user's position) with the speech signal, and the crosstalk canceling unit 35 sets the speaker outputting the music and the listening position. A process for canceling the transfer function between them is performed. Accordingly, the sound image of the music can be localized between the left and

right speakers

30L and 30R, and the sound image of the speech can be localized at the target speaker position, so that the user can easily distinguish between the speech and the vocal.

In the above configuration, the sound image localization control calculation unit 34 and the crosstalk cancellation unit 35 are examples of signal processing means of the present invention, and the

adders

32 and 33 are examples of addition means and output means of the present invention.

(3) Method of using a headrest speaker When a headrest speaker is mounted on a vehicle seat in addition to a vehicle speaker, music including vocals can be output from the vehicle speaker, and speech can be output from the headrest speaker. . A configuration example in this case is shown in FIG.

The music signals of the left and right channels are supplied to the

vehicle speakers

30L and 30R, respectively. The speech signal is supplied to the right headrest speaker 35R as it is, and the phase is inverted by the phase inverter 31 and is supplied to the left headrest speaker 35L. In this case as well, since the phase difference is given to the speech signals supplied to the two

headrest speakers

35L and 35R, the sound image of the speech is localized at a position different from the sound image of the music, and the user can recognize the speech and the vocals in the music. It becomes easy to distinguish. In this example as well, as in the example of FIG. 8A, a constant phase difference is given between the speech signal supplied to one headrest speaker and the speech signal supplied to the other headrest speaker. Do it.

When using the headrest speaker, the speech may be reproduced using the headrest speaker in the passenger seat instead of the headrest speaker in the driver seat. Further, when headrest speakers are mounted on a plurality of seats of the vehicle, it may be possible to select and set the necessity of speech reproduction for each seat. In this way, it is possible to set so that the speech is reproduced only from the headrest speaker in the seat of the passenger who wants to sing the music while listening to the speech.

Further, instead of providing the phase difference, the sound image of the speech can be placed at an arbitrary position by using the sound image localization control calculation unit 34 and the crosstalk cancellation unit 35 in the same manner as the processing described in FIG. 8B. May be localized. This makes it easy for the user to distinguish between speech and vocals.

[2] System Configuration Next, a configuration example of a music playback system that realizes the above-described assist vocal will be described.

[2.1] First Example In the first example, assist vocal processing is executed mainly on the terminal device side. FIG. 10 shows the overall configuration of the music playback system according to the first embodiment. In the music reproduction system of the first embodiment, a plurality of vehicles 1, a content provider 2, and a gate server 3 can communicate with each other via a network 4. The plurality of vehicles 1 can communicate with the content server 2 and the gate server 3 via the network 4 by wireless communication.

Content provider 2 is a server such as a music distributor and provides music data, music metadata, lyrics data, and the like. The gate server 3 is a server that functions to realize the assist vocal according to the present embodiment, acquires music data, metadata, lyrics data, and the like of necessary music from the content provider 2 and stores them in a database (not shown). ing.

An example of the internal configuration of the vehicle 1 is shown in FIG. The vehicle 1 includes a terminal device 10, a music playback device 20, and a speaker 30.

The terminal device 10 is typically a mobile terminal such as a smartphone, and includes a communication unit 11, a control unit 12, a storage unit 13, a microphone 14, and an operation unit 15. The communication unit 11 communicates with the gate server 3 through the network 4. The control unit 12 includes a CPU and the like, and controls the entire terminal device 10.

The storage unit 13 is a memory such as a ROM or a RAM, and stores a program for the control unit 12 to execute various processes, and also functions as a work memory. When the control unit 12 executes the program stored in the storage unit 13, processing including assist vocal processing is executed. Moreover, the memory | storage part 13 may memorize | store the music data of the music preserve | saved by the user.

The microphone 14 collects sounds such as music being played in the car, singing by the user, conversation, etc., and generates sound data. The operation unit 15 is typically a touch panel or the like, and receives an operation and selection input by a user.

The music playback device 20 is a car audio, for example, and includes an amplifier. The speaker 30 is a speaker mounted on the vehicle. The music playback device 20 plays back music from the speaker 30 based on the music data supplied from the terminal device 10.

Another example of the internal configuration of the vehicle 1 is shown in FIG. In this example, the vehicle 1 includes a terminal device 10x. The terminal device 10x is a device having the functions of the terminal device 10 such as a portable terminal shown in FIG. 11A and the music playback device 20 such as car audio. Similarly to the terminal device 10, the terminal device 10 x includes a communication unit 11, a control unit 12, a storage unit 13, a microphone 14, and an operation unit 15, and a music playback unit 16 that corresponds to the music playback device 20. The terminal device 10x is connected to the speaker 30 and reproduces music from the speaker 30 based on music data.

Next, assist vocal processing by the music reproducing system of the first embodiment will be described. FIG. 12 is a flowchart of the assist vocal process according to the first embodiment. In this process, the assist vocal process is executed mainly by the

terminal device

10 or 10x (hereinafter simply referred to as “terminal device 10”).

First, the gate server 3 is connected to the content provider 2 via the network 4, acquires music data and lyrics data for a plurality of music, and stores them in an internal database (step S 101).

The terminal device 10 receives designation of the music to be played by the operation of the operation unit 15 by the user (step S102), and transmits music designation information for designating the music to the gate server 3 (step S103). The gate server 3 acquires the song data and lyrics data of the song corresponding to the received song designation information from the database, and transmits it to the terminal device 10 (step S104).

Next, the terminal device 10 performs the processing of steps S105 to S109 using the received music data and lyrics data. Here, the processing in steps S105 to S109 is the same as that in steps S3 to S7 in FIG.

Thus, in the music reproducing system of the first embodiment, the terminal device 10 mounted on the vehicle 1 mainly executes the assist vocal process.

In the above example, the gate server 3 acquires the music data from the content provider in step S101. However, when the music data is stored in the terminal device 10, the gate server 3 receives the music data from the terminal device 10. You may get it. Further, when music data is stored in the database in the gate server 3, the music data may be acquired therefrom.

[2.2] Second Embodiment In the second embodiment, a part of the assist vocal process is executed on the gate server 3 side. The overall configuration of the music playback system according to the second embodiment is the same as that of the first embodiment shown in FIG.

Next, assist vocal processing by the music reproducing system of the second embodiment will be described. FIG. 13 is a flowchart of the assist vocal process according to the second embodiment. In this process, the gate server 3 generates speech information, further generates music data with speech, and transmits it to the terminal device 10. The terminal device 10 receives and reproduces the music data with speech. This will be described in detail below.

First, the gate server 3 is connected to the content provider 2 via the network 4, acquires music data and lyrics data for a plurality of music, and stores them in an internal database (step S 201). And the gate server 3 produces | generates speech information about each music based on the acquired music data and lyrics data (step S202). This speech information generation process is the same as step S3 in FIG.

When the speech information is generated, the gate server 3 adds the speech to the music data and generates the music data with speech (step S203). Specifically, the gate server 3 combines the speech signal corresponding to each speech with the music data at the timing calculated by the process of step S36 in FIG. 3 based on the generated speech information, and generates music data with speech. And store it in the database. In other words, the music data with speech is data in which speech is reproduced in addition to the music by reproducing as it is.

The terminal device 10 receives designation of the music to be played by the operation of the operation unit 15 by the user (step S204), and transmits music designation information for designating the music to the gate server 3 (step S205). The gate server 3 transmits the song-attached music data corresponding to the received music designation information to the terminal device 10 (step S206).

Next, the terminal device 10 reproduces the received music data with speech (step S207). Thereby, the speech is reproduced at an appropriate timing during the reproduction of the music. Next, the terminal device 10 determines whether or not the music reproduction should be terminated (step S208). When the music has been played to the end, or when playback should be terminated, such as when the user has stopped playing (step S208: Yes), the terminal device 10 finishes playing. On the other hand, if the reproduction of the music should not be terminated (step S208: No), the process returns to step S207, and the reproduction of the music data with speech is continued.

Thus, in the music reproduction system of the second embodiment, the music data with speech is generated on the gate server 3 side and provided to the terminal device 10. The terminal device 10 can listen to music including speech by reproducing the received music data with speech.

In the above example, the gate server 3 acquires the music data from the content provider in step S201. However, if the music data is stored in the terminal device 10, the gate server 3 receives the music data from the terminal device 10. You may get it. Further, when music data is stored in the database in the gate server 3, the music data may be acquired therefrom.

[3] Assist Vocal that Reproduces Only Speech In the above-described assist vocal process, the music that is being reproduced by the terminal device 10 is reproduced with speech added. However, it is convenient if speech can be added to music reproduced from a source other than the terminal device 10, such as a radio in a car, a CD (hereinafter referred to as “external source”). In this case, the terminal device 10 basically generates the speech information by the above-described method, and only needs to reproduce the speech at a timing corresponding to the reproduction position of the music reproduced from the external source.

FIG. 14 shows a flowchart of assist vocal processing in this case. First, the terminal device 10 collects music reproduced from an external source by the microphone 14 to acquire reproduced music data (step S151), and transmits this to the gate server 3 (step S152).

The gate server 153 receives the reproduction music data from the terminal device 10, and specifies the corresponding music and its reproduction position (step S153). Specifically, the gate server 3 includes a music search unit having the function of the music search server described above, specifies the music based on the reproduced music data, and reproduces the reproduction position corresponding to the reproduced music data portion. Is identified. Then, the gate server 3 transmits the lyrics data and the reproduction position information to the terminal device 10 together with the music name and artist name of the specified music (Step S154).

The terminal device 10 generates speech information using the received lyric data (step S155). Note that the speech information is generated by the same method as described with reference to FIG. In addition, the terminal device 10 can acquire music analysis data by analyzing the reproduction music data acquired with the microphone 14 (process of step S32 of FIG. 3).

Next, the terminal device 10 calculates the current playback position of the music based on the playback position information acquired from the gate server 3 (step S156). This method will be described later. Next, the terminal device 10 performs speech enhancement processing (step S157), and reproduces speech at an appropriate timing according to the music being reproduced by the external source (step S158). As a result, the speech is reproduced in accordance with the music being reproduced from the external source.

Then, the terminal device 10 determines whether or not to end the speech reproduction (step S159), and if not to end, returns to step S156 and continues the process. On the other hand, when the playback of a song from an external source is finished, when the song being played is changed to another song, when there is no more speech to be played, etc. S159: Yes), the process ends.

Next, with reference to FIG. 15, a method for specifying the current reproduction position of the music in step S156 will be described. The reproduced music data transmitted from the terminal device 10 to the gate server 3 is actually data of a plurality of audio frames. That is, the terminal device 10 collects the music reproduced by the external source with the microphone 14 and sequentially transmits it to the gate server 3 as a plurality of audio frames.

In the example of FIG. 15, the terminal device 10 has audio frames n, (n + 1), (n + 2),. . . Are sequentially transmitted to the gate server 3 as reproduced music data. At this time, the terminal device 10 stores the time when the reproduced music data is first transmitted, and the time when the audio frame n is transmitted in the example of FIG. 15 (hereinafter referred to as “reference time t0”).

The music search unit of the gate server 3 refers to information on a large number of music pieces stored in the database, and specifies music pieces based on the received plurality of audio frames. In the example of FIG. 15, it is assumed that the music search unit of the gate server 3 can identify the music based on the audio frames n to (n + 4). In this case, the gate server 3 uses the playback time information (tn) from the beginning of the audio frame n received from the terminal device 10 as the playback position information as the playback position information in addition to the music title, artist name, etc. Transmit to device 10. That is, the reproduction position information transmitted from the gate server 3 to the terminal device 10 in step S154 of FIG. 14 is the elapsed time from the beginning of the music of the audio frame n first transmitted to the gate server 3 by the terminal device 10. It has become. Therefore, in step S156, the terminal apparatus 10 calculates the elapsed time Δt of the elapsed time from the reference time t0 stored in advance to the present, and adds this to the reproduction time tn. That is, the reproduction time tn transmitted from the gate server 3 is the time from the beginning of the music to the audio frame n, and the elapsed time Δt is the time from the audio frame n to the present. Therefore, the current playback position (playback time) Tc is calculated by the following equation.

Tc = tn + Δt (2)
As described above, by providing a music search function in the gate server 3 and specifying the music and its reproduction position based on the reproduction music data, it is possible to reproduce the speech according to the music being reproduced from the external source. . Further, an external music search server may be used instead of providing the gate server 3 with a music search function.

In step S159, the reproduction may be ended when one music is finished. However, when another music is reproduced after one music is finished, the process is continued. Also good. That is, the speech reproduction may be continued while the transmission of the music reproduction data from the terminal device 10 to the gate server 3 is continued. Thereby, even if the music reproduced from the external source changes, it becomes possible to continue the speech reproduction following the song.
[4] Usage Restriction Next, the usage restriction of the assist vocal will be described. When performing assist vocal, the terminal device 10 operates together with a music playback device 20 such as a car audio mounted on the vehicle 1. Here, there are various products as the music playback device 20. However, if the music playback device 20 used for assist vocals is unlimited, there will be a problem in the sound quality and copyright management of the played music. there is a possibility. Therefore, the above-mentioned problem is solved by providing a restriction on the music playback device 20 that can be used when performing assist vocals. Specifically, the assist vocal can be executed only when a product produced by a specific producer is used as the music playback device 20.

FIG. 16 (A) schematically shows a method of restricting the use of unsold products, that is, products that are newly sold in the market. A producer who produces a product that can execute assist vocals (hereinafter also referred to as “usable product”) assigns a device ID to each product when the product is produced in a production factory. This device ID can be a serial number of a product, for example, and is stored in the internal memory 20x of the music playback device 20 before shipment from the factory. The production factory notifies the gate server 3 of the device ID assigned to each shipped product, and the gate server 3 stores the device ID in the internal storage unit 3x. Thereby, the device ID of the use permitted product is stored in the storage unit 3x of the gate server 3 as the use permission information.

The user who purchased the product of the music playback device 20 attaches it to the vehicle. Thereby, as shown in FIG. 16A, the music playback device 20 can communicate with the terminal device 10. Further, the device ID of the product is stored in the memory 20x of the music playback device 20 as described above.

Now, when executing the assist vocal, the user executes the availability check process using the device ID. FIG. 17 shows a flowchart of the availability check process. This availability check process is executed between the gate server 3 and the terminal device 10 in the environment shown in FIG.

First, the terminal device 10 communicates with the music playback device 20 to obtain a device ID (step S301) and transmits it to the gate server 3 (step S302). Based on the received device ID, the gate server 3 determines whether the music playback device 20 is a use-permitted product (step S303). As described above, the device ID of the permitted product is stored in the storage unit 3x of the gate server 3. Therefore, the gate server 3 determines whether or not the received device ID is stored in the storage unit 3x. When the received device ID is stored in the storage unit 3x, the gate server 3 determines that the music playback device 20 is a use-permitted product, and uses the music playback device 20 when it is not stored. It is determined that the product is not a permitted product. Then, the gate server 3 transmits the determination result to the terminal device 10 (step S304).

The terminal device 10 receives the determination result and notifies the user by displaying it on the display unit (step S305). Thus, the availability check process ends.

Next, a method for restricting the use of products already sold will be described. FIG. 16B schematically shows a method of restricting the use of a music playback device 20 that has already been sold. In this case, since the device ID is not notified to the gate server 3 from the production factory or the like, the device ID of the permitted product is not stored in the gate server 3. However, a device ID such as a serial number is usually given to a sold product, and the device ID often includes a code unique to the producer or the like. Therefore, authentication is performed using this unique code as use permission information.

For example, if the unique code of the producer “P company” of the licensed product is “PEC”, a device ID including “PEC” is stored in the memory 20x of the music playback device 20 produced by P company. Yes. Therefore, the storage unit 3x of the gate server 3 stores the unique code “PEC” as a use permission code. When the device ID is transmitted from the terminal device 10, the gate server 3 determines whether or not the usage permission code “PEC” is included in the received device ID. If the usage permission code “PEC” is included, the music playback device 20 is determined to be a usage-permitted product. If the check code “PEC” is not included, the music playback device 20 is determined. It is determined that the product is not a licensed product. In this way, it is possible to restrict the use of the music playback apparatus 20 that has already been sold.

The availability check process for a sold product is performed according to the flowchart shown in FIG. 17 as in the case of an unsold product. However, in the case of a sold product, in step S303, the gate server 3 determines whether or not the product is a use permitted product depending on whether or not the use permission code “PEC” is included in the device ID received from the music playback device 20. Determine.

Next, the timing for executing the availability check process will be described. The availability check process can be performed at the first communication between the terminal device 10 and the gate server 3 after the music playback device 20 is first mounted on the vehicle 1. It can also be performed at the first execution of the assist vocal using the music playback device 20. That is, when an assist vocal is first requested from the terminal device 10, the gate server 3 requests the terminal device 10 to transmit a device ID and performs a usability check process.

Alternatively, the availability check process may be performed every time the user executes the assist vocal. In this case, every time an assist vocal request is made from the terminal device 10, the gate server 3 requests the device ID of the music playback device 20 from the terminal device 10 and determines whether or not it can be used. The gate server 3 continues the assist vocal process thereafter only when it is determined that the music playback device 20 is a use permitted product. The terminal device 10 executes assist vocal by the method shown in FIG. 13 or FIG. Specifically, the terminal device 10 transmits the song data with lyrics voice generated by the gate server 3 or the terminal device 10 to the music playback device 20, and plays back the song data with lyrics voice received by the music playback device 20. . Here, the terminal device 10 receives the identification information of the music playback device 20 again before transmitting the music data with lyrics audio to the music playback device 20, and the music playback device 20 that is about to transmit the music data with lyrics audio transmits the music data. It may be determined again whether or not the product is determined to be a use-permitted product by the gate server 3. As a result of the re-determination, when it is determined that the music playback device 20 that is going to transmit the song data with lyrics voice is a use permitted product, the terminal device 10 sends the song data with lyrics voice to the music playback device 20. To do. As a result of the re-determination, when it is determined that the music playback device 20 that is going to transmit the song data with lyrics voice is not a use-permitted product, the terminal device 10 sends the song data with lyrics voice to the music playback device 20. Do not send to. On the other hand, when the gate server 3 determines that the music playback device 20 is not a use-permitted product, the gate server 3 notifies the terminal device 10 to that effect.

[5] Choral function Next, the choral function using assist vocals will be described. The assist vocal allows the user to sing a song while driving, but the joy of singing tends to be insufficient when there is only one driver. Therefore, the singing voice data of a plurality of users is collected and stored in the gate server 3, and when a certain user performs assist vocals, the singing voice data of other users are simultaneously downloaded from the gate server 3 and reproduced on the vehicle. Thereby, even if the number of users (drivers) is one, pseudo chorus can be realized.

[5.1] Generation of Singing Voice Data In order to realize the chorus function, it is necessary for a plurality of users to generate singing voice data and upload it to the gate server 3. Hereinafter, a method for generating singing voice data will be described.

(1) First Method In the first method, the singing voice data of the user is generated by subtracting the sound when the user is not singing from the sound when the user sings according to music in the same vehicle.

FIG. 18A schematically shows an environment for recording a sound when the user is singing. In the passenger compartment, the music reproducing device 20 reproduces music from the speaker 30 into the passenger compartment based on the source sound source, and the user U sings along with the reproduced music. The sound at that time is collected by the microphone M arranged in the vehicle interior. The recording data generated by the microphone M includes the singing voice of the user in addition to the sound of the music (hereinafter referred to as “recording data with singing voice”). This recorded data includes the acoustic characteristics CH in the passenger compartment. Note that the microphone 14 of the terminal device may be used as the microphone M.

FIG. 18 (B) schematically shows an environment for recording a sound when the user is not singing. In the passenger compartment, the music reproducing device 20 reproduces music from the speaker 30 into the passenger compartment based on the source sound source, and the reproduced sound is collected by a microphone M arranged in the passenger compartment. The recording data generated by the microphone M includes the sound of the music, but does not include the user's singing voice (hereinafter referred to as “recording data without a singing voice”). This recorded data also includes the acoustic characteristic CH in the passenger compartment.

Using the recording data thus obtained, the terminal device 10 generates singing voice data by subtracting the recording data without singing voice from the recording data with singing voice as shown in FIG. Thus, the difference of the recording data when a user sings and the recording data when not singing can be produced | generated as a user's singing voice data.

(2) Second Method In the second method, similarly to the first method, recording data with a singing voice is generated as shown in FIG. On the other hand, the recording data without singing voice is not recorded, but instead, the data without singing voice is generated from the sound characteristics of the source sound source and the passenger compartment. The acoustic characteristic in the vehicle interior specifically refers to an impulse response measured in advance in the vehicle interior.

As can be understood from FIG. 18B, since the recording data without singing voice is obtained by recording the music obtained by reproducing the source sound source under the acoustic characteristics in the vehicle interior, the sound in the vehicle interior is included in the source sound source. By convolving the characteristics, it is possible to generate singing-free data equivalent to recording data without singing. Then, as shown in FIG. 19B, the singing voice data of the user can be generated by subtracting the singing voiceless data thus generated from the recording data with singing voice.

(3) The third method The third method is basically the same as the second method, by generating the singing-free data by convolving the acoustic characteristics of the vehicle interior with the source sound source, and using this from the recorded data with singing voice Subtract and generate singing voice data. However, although the second method is based on the premise that the acoustic characteristics in the passenger compartment do not change, the acoustic characteristics in the passenger compartment actually change according to time and circumstances. Therefore, in the third method, the change in the acoustic characteristics in the passenger compartment is corrected by adaptive signal processing.

FIG. 19C is a block diagram of a configuration for generating singing voice data by the third method. The recorded data with singing voice is corrected by the filter 61 and input to the adder 62. In addition, data without singing voice generated by convolving the acoustic characteristics of the vehicle interior with the source sound source is input to the adder 62. The adder 62 subtracts the data without singing voice from the filtered recording data with the singing voice and outputs it as singing voice data.

The singing voice data is also supplied to the adaptive signal processing unit 63. The adaptive signal processing unit 63 calculates the characteristic (coefficient W) to be set in the filter 61 so as to remove the error included in the singing voice data, that is, the variation due to the change in the acoustic characteristic in the vehicle interior, and supplies the calculated characteristic To do. For example, the adaptive signal processing unit 63 calculates the coefficient W of the filter 61 so that the singing voice data in the period in which the singing voice is not included or the frequency component in which the singing voice is not included becomes zero. Thus, the change in the acoustic characteristics in the passenger compartment is canceled by the filter 61.

[5.2] Singing Voice Data Generation Processing Next, singing voice data generation processing will be described. The singing voice data may be generated on the gate server 3 side or generated on the terminal device 10 side. In the following description, for the convenience of explanation, it is assumed that singing voice data is generated by the first method.

FIG. 20 is a flowchart when the gate server 3 generates singing voice data. First, the terminal device 10 generates recording data with a singing voice as shown in FIG. 18A (step S401), and then generates recording data without a singing voice as shown in FIG. 18B (step S402). ). And the terminal device 10 transmits the recording data with singing voice and the recording data without singing voice to the gate server 3 (step S403). At this time, the terminal device 10 adds music information such as a music code corresponding to the recorded data and transmits the data.

The gate server 3 receives the recording data with singing voice and the recording data without singing voice, generates singing voice data by the calculation shown in FIG. 19A, and stores it in the internal database in association with the music based on the music information ( Step S404). Thus, the singing voice data of a plurality of users is stored in the gate server 3 for each music piece.

FIG. 21 is a flowchart when the terminal device 10 generates singing voice data. First, the terminal device 10 generates recording data with a singing voice as shown in FIG. 18A (step S411), and then generates recording data without a singing voice as shown in FIG. 18B (step S412). ). And the terminal device 10 produces | generates singing voice data from the recording data with a singing voice and recording data without a singing voice by the operation | movement shown to FIG. 19 (A) (step S413), and transmits to the gate server 3 (step S414). At this time, the terminal device 10 adds music information such as a music code corresponding to the singing voice data and transmits it.

The gate server 3 receives the singing voice data and the music information, and stores the singing voice data in the internal database in association with the music based on the music information (step S415). Thus, the singing voice data of a plurality of users is stored in the gate server 3 for each music piece.

[5.3] Choral Process Next, a choral process using singing voice data will be described.

(1) When the speech information generation process is performed by the terminal device FIG. 22 is a flowchart of the choral process when the speech information generation process is performed by the terminal device 10 side. In this example, the terminal device 10 mainly generates data necessary for the choral process.

First, the gate server 3 is connected to the content provider 2 via the network 4, acquires music data and lyrics data for a plurality of music, and stores them in an internal database (step S 501).

The terminal device 10 receives the designation of the music to be played by the operation of the operation unit 15 by the user (step S502), and further receives the designation to use the choral function (step S503). Next, the terminal device 10 transmits music designation information (including designation of the choral function) for designating the music to the gate server 3 (step S504). The gate server 3 acquires the song data, lyrics data, and singing voice data of the song corresponding to the received song designation information from the database, and transmits it to the terminal device 10 (step S505).

Next, the terminal device 10 generates speech information by using the received music data and lyrics data (step S506). And the terminal device 10 reproduces | regenerates a song and speech voice, and also reproduces a singing voice based on singing voice data (step S507).

Next, the terminal device 10 determines whether or not the reproduction of the music has ended (step S508). If the reproduction of the music has not ended, the process returns to step S507, and if the reproduction of the music has ended, the process ends.

In the above example, the gate server 3 acquires the music data from the content provider in step S501. However, when the music data is stored in the terminal device 10, the gate server 3 receives the music data from the terminal device 10. Data may be acquired. Further, when music data is stored in the database in the gate server 3, the music data may be acquired therefrom.

(2) When the speech information generation process is performed by the gate server FIG. 23 is a flowchart of the choral process when the speech information generation process is performed by the gate server 3 side. In this example, the gate server 3 mainly generates data necessary for the choral process.

First, the gate server 3 is connected to the content provider 2 via the network 4, acquires music data and lyrics data for a plurality of music, and stores them in an internal database (step S511). And the gate server 3 produces | generates speech information about each music based on the acquired music data and lyrics data (step S512).

When the speech information is generated, the gate server 3 adds the speech to the music data and generates music data with speech (step S513). Specifically, based on the generated speech information, the gate server 3 synthesizes a speech signal corresponding to each speech with music data at an appropriate timing, generates music data with speech, and stores it in the database.

The terminal device 10 receives the designation of the music to be played by the operation of the operation unit 15 by the user (step S514), and further receives the designation to use the choral function (step S515). Next, the terminal device 10 transmits music designation information (including designation of the choral function) for designating the music to the gate server 3 (step S516).

The gate server 3 reads the music data with speech and singing voice data of the music corresponding to the received music designation information from the database, synthesizes them, and generates music data with singing voice and speech (step S517). (Step S518).

The terminal device 10 reproduces the received song data with speech / speech (step S519). As a result, the speech is reproduced and the singing voice of another user is reproduced at an appropriate timing during the reproduction of the music.

Next, the terminal device 10 determines whether or not the reproduction of the music has ended (step S520). If the reproduction of the music has not ended, the process returns to step S519, and if the reproduction of the music has ended, the process ends.

In the above example, the gate server 3 acquires the music data from the content provider in step S511. However, if the music data is stored in the terminal device 10, the gate server 3 receives the music data from the terminal device 10. Data may be acquired. Further, when music data is stored in the database in the gate server 3, the music data may be acquired therefrom.

[5.4] Modification (Modification 1)
The singing voice to be reproduced when performing the choral function is not limited to one singing voice. For example, a user who wants to execute the choral function may be allowed to specify the number of choruses together. In this case, the gate server 3 should just perform a chorus function using the singing voice data for the designated number of people.

(Modification 2)
In the above example, the gate server 3 stores the singing voice data in the database for each piece of music. In addition, the gate server 3 associates the attribute information of the user who generated the singing voice data, for example, sex, age, and the like. You may remember. In this case, when recording data or singing voice data is uploaded to the gate server 3, the user may add his / her attribute information and transmit it. These pieces of information may be input by the user, but the information stored in the mobile terminal 10 may be automatically read by the terminal device 10 and transmitted to the gate server 3.

Thereby, the user who intends to perform the choral function using the singing voice data stored in the gate server 3 can execute the choral function by specifying the gender, age, etc. of the singing voice to be reproduced at the same time. .

(Modification 3)
In the above example, the singing voice data is basically generated for one entire piece of music, but may be generated for a part of one piece of music. For example, you may produce | generate for every 1st and 2nd of one music, and may produce | generate only about a chorus and a chorus part. In this case, the singing voice data is stored in the database of the gate server 3 together with information indicating such a part (for example, No. 1 of the song) or the reproduction time of the song for each song.

And the user who performs the choral function can perform the choral function by using a plurality of singing voice data of only a part of the music by designating information indicating such a part. For example, it is possible to enjoy chorus with different users by using singing voice data of different users for the first and second songs.

[5.5] Another Use Example of Method for Generating Singing Voice Data Even if the second method for generating singing voice data described above is used for determining whether or not the user is singing in the assist vocal automatic on setting. Good. In the second method, singing voiceless data is generated by convolving the acoustic characteristics in the passenger compartment with the source sound source, and this singing voiceless data is data when the user is not singing. Therefore, the terminal device 10 collects the sound in the passenger compartment with the microphone during the reproduction of the music, and subtracts the singing-free data from the obtained data. And the terminal device 10 determines with the user singing, when the component of a user's singing voice is contained in the data obtained by subtraction, and when not contained, the user is not singing. Can be determined. Whether or not the component of the user's singing voice is included is determined, for example, by determining whether or not the signal level in the general human voice frequency band is equal to or higher than a predetermined value in the data obtained by subtraction. Can do.

The present invention can be used for an apparatus for playing music.

DESCRIPTION OF SYMBOLS 1 Vehicle 2 Content provider 3 Gate server 4

Network

10, 10x terminal device 12 Control part 13 Memory | storage part 14 Microphone 20 Music reproducing apparatus 30 Speaker

Claims

A communication system comprising a server and a terminal device,
The server
A storage unit for storing use permission information for specifying an available playback device;
Determining means for determining whether or not the playback device can be used based on the identification information of the playback device received from the terminal device and the use permission information;
Transmission means for transmitting content data corresponding to information specifying the content received from the terminal device to the terminal device when the determining device determines that the playback device is usable;
The terminal device
Identification information acquisition means for acquiring identification information of the reproduction device from a reproduction device connected to the terminal device, and transmitting the identification information to the server;
First communication means for transmitting information specifying content to be played back to the server and receiving content data corresponding to the content from the server when it is determined that the playback device is usable;
Second communication means for transmitting the content data to the playback device determined to be usable;
A communication system comprising:
A playback system comprising a server and a terminal device,
The server
A storage unit for storing use permission information indicating an available music playback device;
Determination means for determining whether or not the music playback device can be used based on the identification information of the music playback device received from the terminal device and the use permission information;
An acquisition means for acquiring lyrics data of the music;
Transmitting means for transmitting to the terminal device the lyric data corresponding to the information specifying the reproduced music received from the terminal device when the determining device determines that the music reproducing device is usable;
The terminal device
Identification information acquisition means for acquiring identification information of the music playback device from the music playback device connected to the terminal device, and transmitting the identification information to the server;
An input means for selecting a reproduction music that is a music to be reproduced;
Music data acquisition means for acquiring music data of the selected playback music;
First communication means for transmitting information specifying the selected reproduction music piece to the server, and receiving lyrics data corresponding to the reproduction music piece from the server when the music reproduction device is determined to be usable;
Lyrics audio data generating means for generating lyrics audio data based on the lyrics data;
Song data with lyric audio generating means for generating song data with lyric audio by adding the lyric audio data to the song data so as to precede the lyric part in the song;
Second communication means for transmitting the song data with lyrics audio to the music playback device determined to be usable;
A reproduction system comprising:
A playback system comprising a server and a terminal device,
The server
A storage unit for storing use permission information for specifying an available music playback device;
Determination means for determining whether or not the music playback device can be used based on the identification information of the music playback device received from the terminal device and the use permission information;
Obtaining means for obtaining music data of the music and lyrics data of the music;
Lyrics audio data generating means for generating lyrics audio data based on the lyrics data;
Song data with lyric audio generating means for generating song data with lyric audio by adding the lyric audio data to the song data so as to precede the lyric part in the song;
Transmitting means for transmitting, to the terminal device, the song data with lyrics voice corresponding to the information specifying the reproduced music received from the terminal device when the determining device determines that the music reproducing device is usable; With
The terminal device
Identification information acquisition means for acquiring identification information of the music playback device from the music playback device connected to the terminal device, and transmitting the identification information to the server;
An input means for selecting a reproduction music that is a music to be reproduced;
1st communication which transmits the information which designates the selected reproduction | regeneration music to the said server, and receives the music data with lyrics audio | voice corresponding to the said reproduction | regeneration music from the said server, when it determines with the said music reproduction apparatus being usable. Means,
Second communication means for transmitting the song data with lyrics audio to the music playback device determined to be usable;
A reproduction system comprising:
The second communication means receives the identification information of the music playback device before transmitting the music data with lyrics voice to the music playback device, and the music playback device is the server based on the received identification information. When it is re-determined that the music playback device is determined to be usable, the music data with lyrics voice is transmitted to the music playback device that has received the identification information, and the music playback device is configured to receive the identification information based on the received identification information. 4. The music data with lyrics voice is not transmitted to the music playback device that has received the identification information when it is determined again that the music playback device is not usable by the server. Playback system.
The storage unit stores identification information of an available music playback device as the use permission information,
The determination unit determines that the music playback device can be used when the same identification information as the identification information received from the terminal device is stored in the storage unit. The reproduction | regeneration system as described in any one of.
The storage unit stores a predetermined use permission code as the use permission information,
The determination unit determines that the music playback device can be used when the use permission code is included in the identification information received from the terminal device. The reproduction system according to item.
A terminal device capable of communicating with a server,
Identification information acquisition means for acquiring identification information of the reproduction device from a reproduction device connected to the terminal device, and transmitting the identification information to the server;
First communication means for transmitting information specifying content to be played back to the server and receiving content data corresponding to the content from the server when it is determined that the playback device is usable;
Second communication means for transmitting the content data to the playback device determined to be usable;
A terminal device comprising:
A content communication method executed by a terminal device capable of communicating with a server,
An identification information acquisition step of acquiring identification information of the reproduction device from the reproduction device connected to the terminal device, and transmitting the identification information to the server;
A first communication step of transmitting information specifying content to be played back to the server and receiving content data corresponding to the content from the server when it is determined that the playback device is usable;
A second communication step of transmitting the content data to the playback device determined to be usable;
A content communication method comprising:
A program executed by a terminal device capable of communicating with a server,
Identification information acquisition means for acquiring identification information of the reproduction device from the reproduction device connected to the terminal device, and transmitting the identification information to the server;
First communication means for transmitting information specifying content to be reproduced to the server and receiving content data corresponding to the content from the server when it is determined that the reproduction apparatus is usable;
A second communication means for transmitting the content data to the playback device determined to be usable;
A program for causing the terminal device to function as:
A server capable of communicating with the server and the terminal device,
A storage unit for storing use permission information for specifying an available playback device;
Receiving means for receiving, from the terminal device, identification information of a playback device connected to the terminal device, and information specifying the content;
Determining means for determining whether or not the playback device is usable based on the identification information and the use permission information;
A transmission unit configured to transmit content data corresponding to information specifying the content to the terminal device when the determination unit determines that the reproduction apparatus is usable;