WO2009139022A1

WO2009139022A1 - Audio output device and program

Info

Publication number: WO2009139022A1
Application number: PCT/JP2008/001216
Authority: WO
Inventors: 児玉泰輝; 莪山真一
Original assignee: Pioneer Corporation (パイオニア株式会社)
Priority date: 2008-05-15
Filing date: 2008-05-15
Publication date: 2009-11-19
Also published as: JPWO2009139022A1

Abstract

An audio output device is provided that can insert audio information in a manner suited to the music being played, so as to interfere with music listening as little as possible. An audio output device (1) comprises an audio information inserting section (11) for inserting audio information, which is a guidance voice and/or a sound effect, during the playback of music; an audio information adjusting section (30) for adjusting the sound and/or voice elements of the audio information according to the sound and/or voice elements of the music being played back at the time the audio information is inserted; and an audio output section (22) for outputting audio on the basis of the audio information adjusted by the audio information adjusting section (30).

Description

Audio output device and program

The present invention relates to an audio output device and a program for inserting and outputting audio information during reproduction of a song.

Conventionally, a navigation system is known that has a car navigation function and an audio player function and performs route guidance by inserting voice information during playback of a song (see, for example, Patent Document 1). This navigation system determines the priority of each voice guidance message: when the priority is high, it interrupts music playback and inserts the voice guidance; when the priority is low, it inserts the voice guidance after the song being played ends. With this configuration, voice guidance that is not very important to the driver can be given between songs, so the music being played is not interrupted more than necessary.
Patent Document 1: JP 2001-116581 A

In practice, however, most voice guidance in the above navigation system ends up being inserted by interrupting music playback. In car navigation, for example, voice guidance is often given several times before an actual right turn, such as “Turn right 300 meters ahead”, “Shortly, turn right”, and “Turn right”. Because the above navigation system judges all of these to be “high priority”, music playback is interrupted each time. Such guidance may be important to the driver, but it is often unimportant to the passengers, who find the interruptions unpleasant. The driver, too, would ideally be able to enjoy music as comfortably as possible while still hearing the voice guidance.

In view of the above problems, an object of the present invention is to provide an audio output device and a program capable of inserting audio information in a manner suited to the song being played, for example so as to interfere with music listening as little as possible.

The audio output device of the present invention includes: audio information insertion means for inserting audio information, which is a guidance voice and/or a sound effect, during playback of a song; audio information adjustment means for adjusting the sound and/or voice elements of the audio information according to the sound and/or voice elements of the song being played at the time the audio information is inserted; and audio output means for outputting audio based on the audio information adjusted by the audio information adjustment means.

In the audio output device described above, the audio information adjustment means preferably adjusts the sound and/or voice elements of the audio information so that their degree of fit with the sound and/or voice elements of the song is high, or so that it is low.

According to these configurations, the sound and/or voice elements of the audio information are adjusted according to the sound and/or voice elements of the song (musical piece) being played at the time of insertion. Adjusting them for a high degree of fit reduces the likelihood that the audio information will interfere with music listening. Conversely, adjusting them for a low degree of fit keeps the audio information from blending into the song, so it is conveyed clearly to the listener.
The “sound and / or voice element” means “at least one of a sound element and a voice element”.
Further, the “song” and the “audio information” need only include either sound or voice; they need not include both. Nor do the element types on the two sides have to match: for example, a voice element of the “audio information” may be adjusted according to a sound element of the “song”, or a sound element of the “audio information” according to a voice element of the “song”. When the “audio information” includes both sound and voice, the two may be output simultaneously or separated in time, for example with the voice following the sound. The “sound effect” is a concept that includes arousing (attention) sounds and warning sounds.
The means for playing the song may be provided in the audio output device or in an external device. In the latter case, the audio output device may obtain the song playlist from the external device in advance and perform the adjustment based on it, or it may perform the adjustment in real time while obtaining the audio signal of the song being played.

In the audio output device described above, it is preferable that an importance is set for each piece of audio information according to its content, and that the audio information adjustment means adjusts the sound and/or voice elements of high-importance audio information so that their degree of fit with the sound and/or voice elements of the song is low, and adjusts those of low-importance audio information so that the degree of fit is high.

With this configuration, the sound and/or voice elements can be adjusted according to the importance of the audio information (raising or lowering their degree of fit with the song). As a result, important audio information is conveyed clearly to the listener (driver), while less important audio information is less likely to interfere with music listening, giving an adjustment that suits both the driver and the passengers.

The audio output device described above preferably further includes metadata storage means that stores song metadata, which is information on the sound and/or voice elements of songs, and audio information metadata, which is information on the sound and/or voice elements of the audio information, with the audio information adjustment means adjusting the sound and/or voice elements of the audio information by referring to the song metadata and the audio information metadata.

With this configuration, the adjustment is easy to perform because the information on the sound and/or voice elements of the songs and the audio information is stored as metadata.

The audio output device described above preferably further includes audio information storage means that stores multiple variants of audio information with different sound and/or voice elements, and the audio information adjustment means preferably selects the one piece of audio information to be output from the variants stored in the audio information storage means, according to the sound and/or voice elements of the song at the time of insertion.

According to this configuration, it is possible to perform sound adjustment by an easy process of simply selecting one piece of sound information to be output from a plurality of types of sound information.

In the audio output device described above, the audio information adjustment means preferably generates the sound and/or voice of the audio information at the time of insertion, using the sound and/or voice of the song at that moment.

With this configuration, because the audio information is generated at the time of insertion, no storage capacity is needed for multiple variants of audio information. And because the sound and/or voice of the audio information is generated from the sound and/or voice of the song being played, a wide variety of audio information can be output.

In the audio output device described above, the audio information adjustment means preferably adjusts the sound and/or voice elements of the audio information according to the sound and/or voice elements of the song at the start of insertion of the audio information.

With this configuration, when the audio information has a duration, the song's sound and/or voice elements may change partway through its output; because the adjustment is matched to the song at the start of insertion, the case where the duration of the audio information is not defined in advance can also be handled.

In the audio output device described above, the sound elements preferably include at least one of tone, chord, and rhythm, and the voice elements preferably include at least one of pitch, volume, voice quality, and pronunciation.

With this configuration, these elements of the audio information can be adjusted according to the tone, chord, rhythm, vocal pitch, volume, voice quality, and pronunciation of the song. For example, when the song has a quiet tone, inserting a guidance voice with a quiet voice quality reduces the chance of interfering with music listening. Conversely, when the song is quiet, inserting a loud guidance voice conveys the audio information clearly to the listener.

The audio output device described above preferably further includes song playback means for playing the song, with the audio output means outputting the song played by the song playback means together with the sound and/or voice based on the audio information.

According to this configuration, music reproduction and voice information insertion can be realized with a single device.

Another audio output device according to the present invention includes: audio information insertion means for inserting audio information, which is a guidance voice and/or a sound effect, during playback of a song; audio information adjustment means for adjusting the sound source and/or language of the audio information according to the genre of the song being played when the audio information is inserted; and audio output means for outputting audio based on the audio information adjusted by the audio information adjustment means.

With this configuration, the sound source and/or language of the audio information is adjusted according to the genre of the song being played. For example, adjusting for a high degree of fit with the song's genre reduces the chance that the audio information interferes with music listening, while adjusting for a low degree of fit keeps the audio information from blending into the music, so that it is conveyed clearly to the listener.
The “song genre” refers to a category such as Western or Japanese music, classical or jazz, or movie or commercial (CM) music. The “sound source” refers to what produces the sound, such as the musical instrument being played.

The program of the present invention is a program for causing a computer to function as each means in the above-described audio output device.

By using this program, it is possible to realize an audio output device that can insert audio information according to the music being played, for example, so as not to disturb the music appreciation as much as possible.

FIG. 1 is a block diagram showing the control configuration of an audio output device according to an embodiment of the present invention.
FIG. 2 shows an example of a voice guidance list.
FIG. 3 shows an example of a playlist.
FIG. 4 shows an example of guidance voice metadata.
FIG. 5 shows an example of arousing sound metadata.
FIG. 6 shows an example of song metadata.
FIG. 7 is a flowchart showing the audio output processing of the audio output device.

Explanation of symbols

1 ... Audio output device
10 ... Car navigation unit
11 ... Audio information insertion unit
15 ... Voice guidance list
20 ... Player unit
21 ... Song playback unit
22 ... Audio output unit
25 ... Playlist
30 ... Audio information adjustment unit
31 ... Guidance voice adjustment unit
32 ... Arousing sound adjustment unit
41 ... Content metadata DB
42 ... Content DB

Hereinafter, an audio output device and a program according to an embodiment of the present invention will be described in detail with reference to the accompanying drawings. In the present embodiment, an in-vehicle audio output device that has a car navigation function and an audio player function and inserts audio information during the reproduction of a song is exemplified.

FIG. 1 is a block diagram showing the control configuration of the audio output device 1. As shown in the figure, the audio output device 1 includes a car navigation unit 10 that controls the car navigation function; a player unit 20 that controls the audio player function; an audio information adjustment unit 30 that adjusts the audio information used for car navigation; a content metadata database (hereinafter “content metadata DB”) 41 that stores metadata on the audio information and the songs; and a content database (hereinafter “content DB”) 42 that stores the audio information and song content.

Like a general car navigation device, the car navigation unit 10 provides route guidance (road guidance) based on a route or destination set by the user (driver) and on GPS information received from a GPS (Global Positioning System) receiver. It also obtains road traffic information and provides traffic information such as congestion and traffic regulations. For this purpose, although not illustrated, the car navigation unit 10 includes the GPS receiver, a control program for route guidance, and a display for showing the route.

The car navigation unit 10 further includes an audio information insertion unit 11. The audio information insertion unit 11 outputs audio information (guidance voices that give route guidance or traffic information by voice, and arousing sounds output before a guidance voice to attract the driver's attention) to the audio information adjustment unit 30 so that it is inserted into the song being played by the player unit 20. The audio information insertion unit 11 inserts audio information according to a voice guidance list 15 (see FIG. 2) created in advance by the car navigation unit 10. The voice guidance list 15 is updated in real time as conditions change from moment to moment (such as the traveling speed of the vehicle carrying the audio output device 1 and the road conditions).

The player unit 20 includes a song playback unit 21 that plays songs according to a playlist 25 (see FIG. 3) selected by the user, and an audio output unit 22 that outputs audio (sound and voice) based on the song played by the song playback unit 21 and the audio information inserted by the audio information insertion unit 11. Although not illustrated, the player unit 20 also includes an audio control device that performs various audio processing, and a speaker.

The audio information adjustment unit 30 adjusts the sound and voice elements of the audio information according to the sound and voice elements of the song being played by the song playback unit 21 at the moment the audio information insertion unit 11 inserts the audio information. It has a guidance voice adjustment unit 31 and an arousing sound adjustment unit 32. In the present embodiment, the guidance voice adjustment unit 31 adjusts the “voice quality (voice color, tone of voice)”, a voice element of the audio information (guidance voice), according to the “tone (melody)”, a sound element of the song. The arousing sound adjustment unit 32 adjusts the chord, a sound element of the audio information (arousing sound), according to the chord, a sound element of the song. The arousing sound adjustment unit 32 also takes into account the importance of the arousing sound (in the present embodiment, the importance of the guidance voice that follows it). The specific adjustment method is described later.

Next, specific examples of the voice guidance list 15, the playlist 25, and the various content metadata are described with reference to FIGS. 2 to 6. FIG. 2 shows an example of the voice guidance list 15. In the voice guidance list 15, a “transmission time”, an “importance”, and a “group ID” are associated with each guidance voice. FIG. 2 shows four guidance voices: “Shortly, turn right”, “Turn right”, “Follow this road for a while”, and “It is 3 o'clock”. Each guidance voice consists of one or more pieces of transmission information. For example, the guidance voice “Shortly, turn right” consists of two pieces of transmission information, “Shortly” and “turn right”. Each piece of transmission information is associated with a “voice ID”.

The item “transmission time” indicates the time at which transmission of the guidance voice starts. As described above, an arousing sound is output before each guidance voice, so the transmission start time equals the output timing of the arousing sound. The item “importance” has two levels, “importance 1” and “importance 0”, assigned according to the content of the guidance voice. “Importance 1” indicates a guidance voice of high importance: for example, information needed for immediate driving (such as travel-direction guidance given within 500 m before an intersection) is set to importance 1. “Importance 0” indicates a guidance voice of low importance: for example, information not needed for immediate driving (travel-direction guidance given more than 500 m before an intersection, traffic congestion information, route guidance involving no left or right turn, time information, and so on) is set to importance 0. The importance of guidance voices may also be set in three or more levels instead of two.
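The two-level importance rule can be sketched as a small classifier. This is a minimal sketch, assuming that the guidance content has already been categorized; the `Kind` categories, the `importance` function, and its parameters are hypothetical names for illustration. Only the 500 m boundary and the category examples come from the text.

```python
from enum import Enum
from typing import Optional


class Kind(Enum):
    """Hypothetical categories of guidance content, taken from the examples above."""
    TURN_DIRECTION = "turn_direction"  # travel-direction guidance at an intersection
    NO_TURN_ROUTE = "no_turn_route"    # route guidance with no left/right turn
    TRAFFIC_JAM = "traffic_jam"
    TIME = "time"


NEAR_INTERSECTION_M = 500.0  # the 500 m boundary given in the example


def importance(kind: Kind, distance_to_intersection_m: Optional[float] = None) -> int:
    """Return 1 (high) or 0 (low) using the two-level rule described above."""
    if kind is Kind.TURN_DIRECTION and distance_to_intersection_m is not None:
        # Direction guidance within 500 m is needed for immediate driving.
        return 1 if distance_to_intersection_m <= NEAR_INTERSECTION_M else 0
    # Congestion, no-turn route guidance and time information are low importance.
    return 0
```

A three-level scheme, which the text also allows, would replace the single threshold with several distance bands.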

The item “group ID” is set for each guidance voice, and pieces of transmission information with the same group ID are output consecutively. This makes it possible to prevent transmission information with a different group ID from being inserted between them, for example when the voice guidance list 15 is updated. If transmission information with another group ID, such as “It is 3 o'clock”, were inserted in the middle of the guidance voice “Shortly, turn right”, the meaning would be lost.
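The group ID rule amounts to a check that an insertion point does not fall inside a group. The sketch below assumes that the pending transmission information is held in an ordered queue. The `Transmission` record, the `can_insert_at` function, and the IDs used with them are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Transmission:
    voice_id: int
    text: str
    group_id: int


def can_insert_at(queue: List[Transmission], index: int, new: Transmission) -> bool:
    """Return True if `new` may go in front of queue[index] without splitting a group."""
    if index <= 0 or index >= len(queue):
        return True  # the head or tail of the queue never splits a group
    before, after = queue[index - 1], queue[index]
    if before.group_id != after.group_id:
        return True  # the boundary between two different groups is safe
    # Inside a group, only another piece of the same group may be placed.
    return new.group_id == before.group_id
```

With “Shortly” and “turn right” sharing one group ID, `can_insert_at` rejects placing “It is 3 o'clock” between them but allows it after the whole group.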

Next, the playlist 25 is described with reference to FIG. 3. The playlist 25 associates a “song order”, a “song ID”, and a “length” with each piece of song content. The item “song order” indicates the order in which songs are played. The item “song ID” is a code that identifies each piece of song content: an alphanumeric string of the form “M” followed by digits (for example, “M23452”) that does not duplicate any other content. The item “length” indicates the length of the song in seconds.

Next, the guidance voice metadata is described with reference to FIG. 4. In the guidance voice metadata, a “voice ID”, “transmission information”, and a “voice quality” are associated. The item “voice ID” is a code that identifies each piece of guidance voice content: a number beginning with “1” that does not duplicate any other content. The item “transmission information” indicates the content of the guidance voice. The item “voice quality” is one of “normal”, “quiet”, and “bright”, indicated by the last digit of the voice ID: guidance voice content whose voice ID ends in “0” has voice quality “normal”, content ending in “1” has voice quality “quiet”, and content ending in “2” has voice quality “bright”. The guidance voice metadata thus provides three variants of guidance voice content for the same transmission information. From these three, the audio output device 1 selects and outputs the variant whose voice quality suits the tone of the song (that is, fits, harmonizes, and is consistent with it).
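Because the voice quality is carried in the last decimal digit of the voice ID, switching between the three variants is simple arithmetic. This is a minimal sketch, assuming that voice IDs are plain integers. The helper names `voice_quality` and `variant` are hypothetical.

```python
QUALITY_BY_DIGIT = {0: "normal", 1: "quiet", 2: "bright"}
DIGIT_BY_QUALITY = {quality: digit for digit, quality in QUALITY_BY_DIGIT.items()}


def voice_quality(voice_id: int) -> str:
    """Read the voice quality encoded in the last decimal digit of a voice ID."""
    return QUALITY_BY_DIGIT[voice_id % 10]


def variant(voice_id: int, quality: str) -> int:
    """Return the ID of the same transmission information in another voice quality."""
    return voice_id - voice_id % 10 + DIGIT_BY_QUALITY[quality]
```

For example, `variant(15000, "quiet")` gives 15001, the “quiet” version of the “normal” content 15000.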

Which of the three variants will actually be inserted into the song is not decided until the time of insertion, so the car navigation unit 10 creates the voice guidance list 15 assuming the voice quality “normal”. Therefore, in the voice guidance list 15 shown in FIG. 2, the last digit of every voice ID is “0”.

Next, the arousing sound metadata is described with reference to FIG. 5. In the arousing sound metadata, arousing sound IDs for fit levels 0 to 5 are associated with each chord. The item “arousing sound ID” is a code that identifies each piece of arousing sound content: a number beginning with “2” that does not duplicate any other content.

Here, “fit level 0” means that the sound fits the associated chord worst, and “fit level 5” means that it fits the associated chord best. In the example shown in the figure, chord D heard together with arousing sound ID “20917” clearly sounds incongruous, whereas chord D heard together with arousing sound ID “20049” sounds comfortable. Therefore, when high-importance voice guidance is given and the song's chord at that moment is D, sounding the incongruous arousing sound ID “20917” strongly attracts the driver's attention. When low-importance voice guidance is given and the chord is D, sounding arousing sound ID “20049”, which fits the song, reduces the chance of interfering with music listening.

In the present embodiment, arousing sound content of fit level 0 or fit level 5 is used selectively according to the importance of the voice guidance that the arousing sound precedes; alternatively, fit levels 1 and 4, or 2 and 3, may be used as the pair. The user may also be allowed to set which fit levels are used.
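The selection from the arousing sound metadata can be sketched as a two-level table lookup. In this sketch, only the two chord-D IDs named in the text are filled in. A real table would hold an ID for every chord and for every fit level from 0 to 5. The names `AROUSING_SOUNDS` and `select_arousing_sound` are hypothetical, and the `levels` parameter models the user-selectable pair.

```python
from typing import Dict, Tuple

# Arousing sound metadata: chord -> {fit level (0 = worst fit, 5 = best fit): sound ID}.
# Only the chord-D entries mentioned in the text are included here.
AROUSING_SOUNDS: Dict[str, Dict[int, int]] = {
    "D": {0: 20917, 5: 20049},
}


def select_arousing_sound(chord: str, high_importance: bool,
                          levels: Tuple[int, int] = (0, 5)) -> int:
    """Pick a low-fit sound for important guidance and a high-fit one otherwise.

    `levels` is the (low, high) fit-level pair; (1, 4) or (2, 3) are the
    alternatives mentioned in the text, for example chosen by the user.
    """
    low, high = levels
    return AROUSING_SOUNDS[chord][low if high_importance else high]
```

A lookup with a level pair that the table does not contain raises `KeyError`. This is intentional, because it exposes an incomplete metadata table.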

Next, the song metadata is described with reference to FIG. 6. The song metadata is time-series data that associates a “tone” and a “chord” with each point in time; in the example shown, the tone and chord are recorded at 0.1-second intervals. The tone is “quiet” from 0.0 s (start of the song) to 1.4 s and changes to “bright” from 1.5 s; the chord is D from 0.0 to 0.5 s, Dm7 from 0.6 to 1.3 s, and Gm from 1.4 s. Thus, for example, if audio information is inserted within the first 0.5 s of the song and it is high-importance voice guidance, an arousing sound with a low fit to the song's chord D is output first, followed by a guidance voice whose voice quality suits the song's tone.
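Rather than storing one row per 0.1 s sample, the time series can be stored as change points and searched by binary search. This is a minimal sketch of the FIG. 6 example under that storage assumption. The first chord is taken as D, consistent with the worked example for song “M23452”, and the names `TONE`, `CHORD`, and `elements_at` are hypothetical.

```python
import bisect
import math
from typing import List, Tuple

# Change points of the FIG. 6 example as (index of the 0.1 s sample, value).
TONE: List[Tuple[int, str]] = [(0, "quiet"), (15, "bright")]
CHORD: List[Tuple[int, str]] = [(0, "D"), (6, "Dm7"), (14, "Gm")]


def _value_at(changes: List[Tuple[int, str]], step: int) -> str:
    starts = [start for start, _ in changes]
    i = bisect.bisect_right(starts, step) - 1  # last change at or before `step`
    if i < 0:
        raise ValueError("position before the start of the song")
    return changes[i][1]


def elements_at(position_s: float) -> Tuple[str, str]:
    """Return (tone, chord) of the song at a playback position given in seconds."""
    # A small epsilon guards against float rounding just below a 0.1 s boundary.
    step = math.floor(position_s * 10 + 1e-9)
    return _value_at(TONE, step), _value_at(CHORD, step)
```

For example, `elements_at(0.4)` gives `("quiet", "D")` and `elements_at(1.5)` gives `("bright", "Gm")`.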

Next, the flow of the audio output processing performed by the audio output device 1 is described with reference to the flowchart in FIG. 7. While a song is playing, the audio information insertion unit 11 first inserts audio information together with information indicating its importance (S01), and the audio information adjustment unit 30 then determines the chord and tone of the song at that moment (S02). It does this by referring to the song metadata in the content metadata DB 41, using the song ID and the information indicating the playback position (elapsed time from the start of the song) obtained from the player unit 20. The playback position information may be obtained from the player unit 20 periodically. Alternatively, only the start of playback may be obtained, and the playback position then determined by counting the elapsed time.
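The second option for finding the playback position (receiving only the start of playback and counting elapsed time) can be sketched as a small clock object. The `PlaybackClock` class and its methods are hypothetical. The injectable `clock` parameter exists only so that the sketch can be tested. Pausing and seeking are not handled. Calling `on_playback_start` again with an offset models the periodic position updates.

```python
import time
from typing import Callable, Optional, Tuple


class PlaybackClock:
    """Estimate the playback position by counting elapsed time since playback started."""

    def __init__(self, clock: Callable[[], float] = time.monotonic):
        self._clock = clock
        self._song_id: Optional[str] = None
        self._started_at = 0.0

    def on_playback_start(self, song_id: str, offset_s: float = 0.0) -> None:
        # Called when the player reports a song start (or a periodic position update).
        self._song_id = song_id
        self._started_at = self._clock() - offset_s

    def position(self) -> Optional[Tuple[str, float]]:
        """Return (song ID, seconds since the song started), or None if nothing is playing."""
        if self._song_id is None:
            return None
        return self._song_id, self._clock() - self._started_at
```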

Next, the audio information adjustment unit 30 determines the importance of the audio information inserted in S01 (S03). If it determines that the importance is high (S03: Yes), it refers to the arousing sound metadata (see FIG. 5) in the content metadata DB 41 and selects an arousing sound ID with a low fit to the song's chord at the start of insertion of the audio information (S04). If it determines that the importance is low (S03: No), it selects from the arousing sound metadata an arousing sound ID with a high fit to the song's chord at the start of insertion (S05).

Next, the audio information adjustment unit 30 refers to the guidance voice metadata (see FIG. 4) in the content metadata DB 41 and selects the guidance voice IDs corresponding to the song's tone at the start of insertion of the audio information (S06). For guidance voices, IDs suited to the song's tone are selected regardless of importance. The player unit 20 then obtains the arousing sound ID and guidance voice IDs from the audio information adjustment unit 30, reads the corresponding content from the content DB 42, and outputs the arousing sound and the guidance voice (S07). The player unit 20 may fade the song's volume down before, and back up after, outputting the arousing sound and guidance voice, or it may pause the song while they are being output.

A specific example of the above process follows. Suppose the song content with song ID “M23452” (see FIG. 6) is playing, and at playback position 0.4 s the transmission start time “14:56:45” arrives and the guidance voice “Shortly, turn right” (see FIG. 2) is inserted. The song's chord at that moment is D, so the arousing sound with the lowest fit to chord D, ID “20917”, is output (see FIG. 5). It is followed by the guidance voice content with voice IDs “15001” and “14001”, which suits the song's “quiet” tone (see FIG. 4).
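Steps S02 to S06 can be combined into one function that reproduces the example above. This is a sketch, not the device's implementation. The metadata tables are small in-memory stand-ins for the content metadata DB 41. Each song tone is assumed to map to the voice quality of the same name (“quiet” to “quiet”, “bright” to “bright”). `DEFAULT_AROUSING_SOUND` is a hypothetical placeholder for the predetermined ID that is used when no song is playing.

```python
import bisect
import math
from typing import Dict, List, Optional, Tuple

# Minimal stand-ins for the content metadata DB 41, using the values from the example.
SONG_METADATA: Dict[str, Dict[str, List[Tuple[int, str]]]] = {
    "M23452": {"tone": [(0, "quiet"), (15, "bright")],
               "chord": [(0, "D"), (6, "Dm7"), (14, "Gm")]},
}
AROUSING_SOUNDS: Dict[str, Dict[int, int]] = {"D": {0: 20917, 5: 20049}}
DEFAULT_AROUSING_SOUND = 20000  # hypothetical placeholder for the predetermined ID
QUALITY_DIGIT = {"normal": 0, "quiet": 1, "bright": 2}


def _at(changes: List[Tuple[int, str]], step: int) -> str:
    starts = [start for start, _ in changes]
    return changes[bisect.bisect_right(starts, step) - 1][1]


def adjust(voice_ids: List[int], high_importance: bool,
           playing: Optional[Tuple[str, float]]) -> Tuple[int, List[int]]:
    """Return (arousing sound ID, guidance voice IDs) for one inserted guidance voice.

    `playing` is (song ID, playback position in s), or None when no song is playing.
    """
    if playing is None:
        # Player stopped or between songs: no adjustment is made.
        sound_id, quality = DEFAULT_AROUSING_SOUND, "normal"
    else:
        song_id, position_s = playing
        meta = SONG_METADATA[song_id]
        step = math.floor(position_s * 10 + 1e-9)
        tone, chord = _at(meta["tone"], step), _at(meta["chord"], step)  # S02
        fit_level = 0 if high_importance else 5                          # S03
        sound_id = AROUSING_SOUNDS[chord][fit_level]                     # S04 / S05
        quality = tone  # S06, assuming each tone maps to the voice quality of the same name
    digit = QUALITY_DIGIT[quality]
    return sound_id, [v - v % 10 + digit for v in voice_ids]
```

Starting from the list's “normal” IDs 15000 and 14000, `adjust([15000, 14000], True, ("M23452", 0.4))` returns `(20917, [15001, 14001])`. This matches the example above.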

The above flowchart assumes that a song is playing. If audio information is inserted while the audio player is stopped or between songs, no song is playing, so the audio information is not adjusted: a predetermined arousing sound ID is selected for the arousing sound, and the voice IDs corresponding to the voice quality “normal” are selected for the guidance voice. For the arousing sound, one of two predetermined arousing sound IDs may instead be selected according to the importance of the audio information.

As described above, the audio output device 1 of the present embodiment adjusts the chord of the arousing sound and the voice quality of the guidance voice according to the sound elements of the song being played at the time the audio information is inserted. Adjusting for a high degree of fit lets the audio information blend into the song, so comfortable music listening is not disturbed. Adjusting for a low degree of fit keeps the audio information from blending into the music, so it is conveyed clearly to the driver. In addition, because the degree of fit is chosen according to the importance of the audio information, important audio information is conveyed clearly to the driver while less important audio information is less likely to interfere with music listening. The result is an adjustment that suits both the driver and the passengers.

In the above embodiment, the audio information is adjusted at the time it is inserted, but it may instead be adjusted in advance. In this case, the audio information is adjusted based on the voice guidance list 15 generated in advance and the playlist 25 selected in advance, and the voice guidance list 15 is created from the adjustment result before the songs are played. The voice guidance list 15 then preferably lists the voice IDs (with one of the three voice qualities already selected) and the arousing sound IDs. With this configuration, the device only has to output audio according to the voice guidance list 15, with no adjustment needed during playback, so the control load on the audio output device 1 while songs are playing is reduced.
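Adjusting in advance requires knowing, for each transmission time in the voice guidance list, which song will be playing and at what offset. The playlist's song order and lengths make this a running sum. This is a minimal sketch that assumes playback starts at a known time and runs without gaps, pauses, or skips. The function name `locate` and the song ID `"M10001"` used in the test are hypothetical.

```python
from datetime import datetime
from typing import List, Optional, Tuple


def locate(playlist: List[Tuple[str, float]], started: datetime,
           at: datetime) -> Optional[Tuple[str, float]]:
    """Return (song ID, offset in s) playing at `at`, or None outside the playlist.

    `playlist` holds (song ID, length in s) pairs already sorted by song order.
    """
    offset = (at - started).total_seconds()
    if offset < 0:
        return None
    for song_id, length_s in playlist:
        if offset < length_s:
            return song_id, offset
        offset -= length_s  # move past this song
    return None  # after the last song: no adjustment, as when playback is stopped
```

Each located (song ID, offset) pair can then be looked up in the song metadata exactly as at insertion time. The resulting IDs are written into the voice guidance list 15.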

In the above embodiment, the content used for the arousing sound, given as an example of audio information, is changed according to the importance of the voice guidance; the guidance voice content may likewise be changed according to importance. For guidance voices, however, simply choosing low-fit content for high importance can result in a “quiet” guidance voice being played while the song's tone is “bright”. The guidance voice would then be drowned out by the music, so simply choosing the lowest-fit content is not appropriate. It is therefore preferable to prepare, as guidance voice metadata, a table that defines the optimal guidance voice quality for each tone of the song and each importance level.

In the above embodiment, an arousing sound is output before every guidance voice, but it may be output only before high-importance guidance voices. The user may also be allowed to set whether an arousing sound is added to guidance voices, whether this depends on importance, and so on.

In the above embodiment, the audio output device 1 includes the content DB 42, but it may be omitted. In that case, the audio output device 1 obtains content as needed from an external device that holds the content DB 42 and uses it for song playback and voice guidance.

In the above embodiment, the audio output device 1 includes the car navigation unit 10 and the player unit 20, but either or both may be omitted. For example, when both are omitted, the device obtains audio information from an external car navigation device, adjusts it for insertion into the song being played by an external audio player, and outputs it to that audio player.

In the above embodiment, the “song” includes sound elements (tone and chord), and the “audio information” includes a sound element (the chord of the arousing sound) and a voice element (the voice quality of the guidance voice), but the present invention is not limited to this. For example, the “song” may include a voice element, and a sound element of the “audio information” may be adjusted according to it. That is, the element types on the two sides need not match: a voice element of the “audio information” may be adjusted according to a sound element of the “song”, or a sound element of the “audio information” according to a voice element of the “song”. In addition, when the “audio information” includes both sound and voice, the two may be output simultaneously, instead of the voice following the sound as in the present embodiment.

In addition, although the “arousing sound” has been described as an example of voice information, a sound that evokes a warning, such as a repeated “warning sound”, may be used instead. The voice information may also be a sound containing a melody of several measures, such as a train arrival chime. That is, various sound effects can be used as “voice information” that includes sound.

In addition, although “tune” and “chord” have been illustrated as sound elements, other elements such as “rhythm (beat, periodicity)” and “sound source direction” may be added. Likewise, although “voice quality” has been illustrated as a voice element, other elements such as “pitch (voice pitch)”, “volume (loudness, strength, width)”, “pronunciation”, and “voice reverberation” may be added. For example, the voice element of the voice information may be adjusted according to the “rhythm” of the song or the “pitch” of its vocal, or the “rhythm” or “pitch” of the voice information may be adjusted according to the voice element of the song.
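
As one illustration of using “rhythm” as a sound element, the repetition interval of a repeated arousing sound could be derived from the song's tempo. This is a hypothetical sketch, not a method stated in the embodiment: the function name `beep_interval_seconds`, the tempo input in beats per minute, and the `off_beat_factor` of 0.75 are all assumptions.

```python
def beep_interval_seconds(song_bpm: float, high_fitness: bool,
                          off_beat_factor: float = 0.75) -> float:
    """Interval between repeated arousing-sound beeps, derived from the song's tempo.

    High fitness: one beep per beat, so the beeps stay in step with the song.
    Low fitness: beeps every `off_beat_factor` beats (an assumed value), so
    most beeps fall between the song's beats and stand out.
    """
    if song_bpm <= 0:
        raise ValueError("song_bpm must be positive")
    beat = 60.0 / song_bpm  # seconds per beat
    return beat if high_fitness else off_beat_factor * beat
```

At 120 BPM this gives 0.5 s between beeps for high fitness and 0.375 s for low fitness; with the assumed factor, three of every four low-fitness beeps land off the beat.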

In the above embodiment, the voice information is adjusted by selecting one item from a plurality of types of voice information. Alternatively, the sound and/or voice of the voice information may be generated at the time of insertion by using the sound and/or voice of the song at that moment. This configuration reduces the storage capacity needed to hold multiple types of voice information and, because the voice information is generated from the sound and/or voice of the song being played, allows a greater variety of voice information to be output. Examples of using the sound of the song being played include combining the sounds that make up the song to generate an arousing sound with a high fitness, and shifting those sounds by a semitone before combining them to generate an arousing sound with a low fitness.
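
The two generation methods just mentioned can be sketched with notes represented as MIDI note numbers, which is an assumption for illustration; the function names are hypothetical. The sketch converts notes to equal-tempered frequencies using the standard A4 = MIDI note 69 = 440 Hz reference.

```python
def midi_to_hz(note: int) -> float:
    """Equal-tempered frequency of a MIDI note number (A4 = note 69 = 440 Hz)."""
    return 440.0 * 2.0 ** ((note - 69) / 12)


def arousing_chord_from_song(song_notes: list[int], high_fitness: bool) -> list[float]:
    """Build arousing-sound frequencies from the notes currently sounding in the song.

    High fitness: combine the song's own notes unchanged.
    Low fitness: shift every note up by one semitone before combining,
    so the arousing sound clashes with the song's harmony.
    """
    shift = 0 if high_fitness else 1
    return [midi_to_hz(n + shift) for n in sorted(set(song_notes))]
```

For a C major chord `[60, 64, 67]`, high fitness yields C, E, G (about 261.6, 329.6 and 392.0 Hz), while low fitness yields C#, F, G#, a cluster that rubs against the song's harmony. Synthesizing tones at these frequencies is left out of the sketch.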

In the above embodiment, the sound and/or voice element of the voice information is adjusted according to the sound and/or voice element of the song at the start of insertion of the voice information. Alternatively, if the sound and/or voice element of the song changes while the voice information is being played, the sound and/or voice element of the voice information may be adjusted accordingly. Further, when the length of the voice information is known in advance and the sound and/or voice element of the song will change during playback of the voice information, the voice information may be adjusted according to the song's changing sound and/or voice element, or according to the song's sound and/or voice element at the end of insertion of the voice information.
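
When the length of the voice information is known, the chords it will overlap can be read off the song's chord timeline. A minimal sketch, assuming the song metadata is available as a time-sorted list of `(start_time, chord)` pairs; that representation and the helper names are hypothetical.

```python
from bisect import bisect_right


def chord_at(timeline: list[tuple[float, str]], t: float) -> str:
    """Chord sounding at time t (seconds); timeline holds (start_time, chord) pairs sorted by time."""
    starts = [start for start, _ in timeline]
    i = bisect_right(starts, t) - 1
    if i < 0:
        raise ValueError("t is before the first chord")
    return timeline[i][1]


def chords_during(timeline: list[tuple[float, str]], start: float,
                  duration: float) -> list[str]:
    """Chords that sound while voice information of known duration is played, in order."""
    end = start + duration
    chords = [chord_at(timeline, start)]
    # Add every chord change that happens strictly inside the playback window.
    chords += [chord for t, chord in timeline if start < t < end]
    return chords
```

With `tl = [(0.0, "C"), (4.0, "G"), (8.0, "Am")]` and a 2-second insertion at 3.0 s, `chord_at(tl, 3.0)` gives `"C"` (adjust at the start), `chords_during(tl, 3.0, 2.0)` gives `["C", "G"]` (follow the change), and `chord_at(tl, 5.0)` gives `"G"` (adjust to the end of insertion).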

As an application example of the audio output device 1 of the present invention, the sound source and/or language of the voice information may be adjusted according to the genre of the song being played. In this case, adjusting the voice information so that its fitness with the genre of the song is high reduces the possibility that the voice information hinders music appreciation. Conversely, adjusting it so that the fitness is low keeps the voice information from blending into the song, so that it can be conveyed clearly to the listener. The “song genre” refers to a category such as Western or Japanese music, classical or jazz, or movie or commercial (CM) music. The “sound source” refers to whatever generates the sound, such as the musical instrument being played. As specific examples of high fitness, the guidance voice may be switched to English when the song is Western music and to Japanese when the song is Japanese music; for the arousing sound, a “koto” tone may be used when the song is enka and an “electric guitar” tone when the song is rock.
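
The genre pairings given above can be held in small tables. This sketch uses only the text's own examples as table entries; the function `pick_for_genre`, the lowercase genre keys, the default values, and the low-fitness rule (take an entry paired with a different genre) are assumptions for illustration.

```python
# Pairings given as examples in the text.
LANGUAGE_FOR_GENRE = {"western": "English", "japanese": "Japanese"}
INSTRUMENT_FOR_GENRE = {"enka": "koto", "rock": "electric guitar"}


def pick_for_genre(genre: str, table: dict[str, str], high_fitness: bool,
                   default: str) -> str:
    """Pick a guidance language or arousing-sound source for the song's genre.

    High fitness: the entry paired with the genre.
    Low fitness (an assumed rule): the first entry paired with a different genre.
    Unknown genres fall back to `default`.
    """
    if genre not in table:
        return default
    matched = table[genre]
    if high_fitness:
        return matched
    for value in table.values():
        if value != matched:
            return value
    return default
```

For a rock song, `pick_for_genre("rock", INSTRUMENT_FOR_GENRE, True, "chime")` returns `"electric guitar"`, and the low-fitness call returns `"koto"`, an arousing sound that will not blend into the song.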

In the above embodiment, the in-vehicle audio output device 1 has been exemplified. However, the present invention can also be applied to a broadcasting station that plays music continuously, such as a cable broadcaster, when it inserts a time signal or traffic information. In this case, voice information such as the time signal and traffic information can be adjusted according to the tune and chord of the song at the start of its insertion. The present invention can be applied to any device that provides voice guidance while a song is being played.

The audio output device 1 of the present invention may also be applied to video. For example, for one-segment broadcasting, which has attracted attention in recent years, the images may be analyzed and the sound and/or voice elements of the voice information adjusted so as to raise or lower the fitness according to the analysis result. In this case, the elements of the image (video) include brightness, the proportion of each color, resolution, contrast, genre (animation, live action, etc.), and the like.
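
Taking brightness as the image element, a frame could be classified as bright or dark and the guidance voice chosen to match or contrast with it. This is a hypothetical sketch: frames as lists of 8-bit RGB tuples, the threshold of 128, and the "bright"/"quiet" voice labels are assumptions; the luma weights are the ITU-R BT.601 coefficients.

```python
def mean_luma(pixels: list[tuple[int, int, int]]) -> float:
    """Average luma (0-255) of 8-bit RGB pixels using ITU-R BT.601 weights."""
    if not pixels:
        raise ValueError("frame has no pixels")
    return sum(0.299 * r + 0.587 * g + 0.114 * b for r, g, b in pixels) / len(pixels)


def guidance_voice_for_frame(pixels: list[tuple[int, int, int]], high_fitness: bool,
                             threshold: float = 128.0) -> str:
    """Choose a 'bright' or 'quiet' guidance voice from the frame's brightness."""
    bright_frame = mean_luma(pixels) >= threshold
    matching = "bright" if bright_frame else "quiet"
    contrasting = "quiet" if bright_frame else "bright"
    return matching if high_fitness else contrasting
```

A real receiver would analyze decoded frames over time rather than a single pixel list, but the decision step has the same shape.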

Each unit of the audio output device shown in the above embodiment and application examples can also be provided as a program, and the program can be provided stored on a recording medium (not shown). That is, a program for causing a computer to function as each unit of the audio output device, and a recording medium on which the program is recorded, are also included in the scope of rights of the present invention. Other modifications can be made as appropriate without departing from the scope of the present invention.

Claims

Voice information insertion means for inserting voice information which is a guidance voice and / or a sound effect during the reproduction of a song;
Voice information adjusting means for adjusting the sound and/or voice element of the voice information according to the sound and/or voice element of the song being played at the time of insertion of the voice information;
An audio output device comprising: audio output means for outputting audio based on the audio information adjusted by the audio information adjustment means.
The audio output device according to claim 1, wherein the voice information adjusting means adjusts the sound and/or voice element of the voice information so that its fitness with respect to the sound and/or voice element of the song is high or low.
The audio output device according to claim 2, wherein the voice information has an importance set according to its content, and
the voice information adjusting means adjusts the sound and/or voice element of voice information of high importance so that its fitness with respect to the sound and/or voice element of the song is low, and adjusts the sound and/or voice element of voice information of low importance so that its fitness with respect to the sound and/or voice element of the song is high.
The audio output device according to claim 1, further comprising metadata storage means for storing song metadata, which is information relating to the sound and/or voice element of the song, and voice information metadata, which is information relating to the sound and/or voice element of the voice information,
wherein the voice information adjusting means adjusts the sound and/or voice element of the voice information with reference to the song metadata and the voice information metadata.
The audio output device according to claim 1, further comprising voice information storage means for storing a plurality of types of voice information having different sound and/or voice elements,
wherein the voice information adjusting means selects, from among the plurality of types of voice information stored in the voice information storage means, one item of voice information to be output according to the sound and/or voice element of the song at the time of insertion of the voice information.
The audio output device according to claim 1, wherein the voice information adjusting means generates the sound and/or voice of the voice information at the time of insertion of the voice information by using the sound and/or voice of the song at that time.
The audio output device according to claim 1, wherein the voice information adjusting means adjusts the sound and/or voice element of the voice information according to the sound and/or voice element of the song at the start of insertion of the voice information.
The audio output device according to claim 1, wherein the sound element includes one or more of tune, chord, and rhythm, and the voice element includes one or more of pitch, volume, voice quality, and pronunciation.
The audio output device according to claim 1, further comprising song playback means for playing back the song,
wherein the audio output means outputs the song played back by the song playback means together with sound and/or voice based on the voice information.
Voice information insertion means for inserting voice information which is a guidance voice and / or a sound effect during the reproduction of a song;
Audio information adjusting means for adjusting the sound source and / or language of the audio information according to the genre of the song being played when the audio information is inserted;
An audio output device comprising: audio output means for outputting audio based on the audio information adjusted by the audio information adjustment means.
A program for causing a computer to function as each means in the audio output device according to any one of claims 1 to 10.