WO2009139022A1 - Audio output device and program - Google Patents

Audio output device and program

Publication number
WO2009139022A1
Authority
WO
WIPO (PCT)
Prior art keywords
voice
information
sound
audio
song
Application number
PCT/JP2008/001216
Other languages
French (fr)
Japanese (ja)
Inventor
児玉泰輝
莪山真一
Original Assignee
パイオニア株式会社
Application filed by パイオニア株式会社
Priority to JP2010511789A (JPWO2009139022A1)
Priority to PCT/JP2008/001216 (WO2009139022A1)
Publication of WO2009139022A1

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10H ELECTROPHONIC MUSICAL INSTRUMENTS; INSTRUMENTS IN WHICH THE TONES ARE GENERATED BY ELECTROMECHANICAL MEANS OR ELECTRONIC GENERATORS, OR IN WHICH THE TONES ARE SYNTHESISED FROM A DATA STORE
    • G10H1/00 Details of electrophonic musical instruments
    • G10H1/46 Volume control
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10H ELECTROPHONIC MUSICAL INSTRUMENTS; INSTRUMENTS IN WHICH THE TONES ARE GENERATED BY ELECTROMECHANICAL MEANS OR ELECTRONIC GENERATORS, OR IN WHICH THE TONES ARE SYNTHESISED FROM A DATA STORE
    • G10H1/00 Details of electrophonic musical instruments
    • G10H1/36 Accompaniment arrangements
    • G10H1/361 Recording/reproducing of accompaniment for use with an external source, e.g. karaoke systems
    • G10H1/366 Recording/reproducing of accompaniment for use with an external source, e.g. karaoke systems with means for modifying or correcting the external signal, e.g. pitch correction, reverberation, changing a singer's voice
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10H ELECTROPHONIC MUSICAL INSTRUMENTS; INSTRUMENTS IN WHICH THE TONES ARE GENERATED BY ELECTROMECHANICAL MEANS OR ELECTRONIC GENERATORS, OR IN WHICH THE TONES ARE SYNTHESISED FROM A DATA STORE
    • G10H2240/00 Data organisation or data communication aspects, specifically adapted for electrophonic musical tools or instruments
    • G10H2240/075 Musical metadata derived from musical analysis or for use in electrophonic musical instruments
    • G10H2240/081 Genre classification, i.e. descriptive metadata for classification or selection of musical pieces according to style
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10H ELECTROPHONIC MUSICAL INSTRUMENTS; INSTRUMENTS IN WHICH THE TONES ARE GENERATED BY ELECTROMECHANICAL MEANS OR ELECTRONIC GENERATORS, OR IN WHICH THE TONES ARE SYNTHESISED FROM A DATA STORE
    • G10H2250/00 Aspects of algorithms or signal processing methods without intrinsic musical character, yet specifically adapted for or used in electrophonic musical processing
    • G10H2250/025 Envelope processing of music signals in, e.g. time domain, transform domain or cepstrum domain
    • G10H2250/035 Crossfade, i.e. time domain amplitude envelope control of the transition between musical sounds or melodies, obtained for musical purposes, e.g. for ADSR tone generation, articulations, medley, remix
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10H ELECTROPHONIC MUSICAL INSTRUMENTS; INSTRUMENTS IN WHICH THE TONES ARE GENERATED BY ELECTROMECHANICAL MEANS OR ELECTRONIC GENERATORS, OR IN WHICH THE TONES ARE SYNTHESISED FROM A DATA STORE
    • G10H2250/00 Aspects of algorithms or signal processing methods without intrinsic musical character, yet specifically adapted for or used in electrophonic musical processing
    • G10H2250/315 Sound category-dependent sound synthesis processes [Gensound] for musical use; Sound category-specific synthesis-controlling parameters or control means therefor
    • G10H2250/455 Gensound singing voices, i.e. generation of human voices for musical applications, vocal singing sounds or intelligible words at a desired pitch or with desired vocal effects, e.g. by phoneme synthesis

Definitions

  • the present invention relates to an audio output device and a program for inserting and outputting audio information during reproduction of a song.
  • a navigation system having a car navigation function and an audio player function and performing route guidance by inserting voice information during reproduction of a song is known (for example, Patent Document 1).
  • This navigation system discriminates the priority of voice guidance, and when the priority is high, the music reproduction is interrupted and voice guidance is inserted. If the priority is low, voice guidance is inserted after the end of the music being played back. With this configuration, voice guidance that is not so important for the driver can be performed between the songs, and there is an effect that the music being played is not interrupted more than necessary.
  • Patent Document 1: JP 2001-116581 A
  • An object of the present invention is to provide an audio output device and a program that can insert audio information in a manner suited to the song being played, for example so that it disturbs music appreciation as little as possible.
  • The audio output device of the present invention includes: audio information insertion means for inserting audio information, which is a guidance voice and/or a sound effect, during reproduction of a song; audio information adjusting means for adjusting the sound and/or voice elements of the audio information according to the sound and/or voice elements of the song being reproduced at the time the audio information is inserted; and audio output means for outputting audio based on the audio information adjusted by the audio information adjusting means.
  • It is preferable that the audio information adjusting means adjusts the sound and/or voice elements of the audio information so that their fitness with the sound and/or voice elements of the song is either high or low.
  • With this configuration, the sound and/or voice elements of the audio information are adjusted according to the sound and/or voice elements of the song at the time the audio information is inserted. Adjusting them so that the fitness is high, for example, reduces the possibility that the audio information hinders music appreciation. Adjusting them so that the fitness is low keeps the audio information from blending into the song, so that it is conveyed clearly to the listener.
  • the “sound and / or voice element” means “at least one of a sound element and a voice element”. Further, the “song” and “speech information” need only include either sound or voice, and do not necessarily include both.
  • both elements do not necessarily match.
  • For voice information that includes both sound and voice, the two may be output at the same time, or they may be output separated in time, for example with the voice added after the sound.
  • the “sound effect” is a concept including an arousing sound and a warning sound.
  • the means for reproducing the music may be provided in the audio output device or in an external device other than the audio output device. In the latter case, the audio output device may acquire a playlist of songs from an external device in advance, and perform audio adjustment based on the playlist. Further, the sound adjustment may be performed in real time while acquiring the sound signal of the music being reproduced.
  • It is preferable that an importance is set for the audio information according to its content, and that the audio information adjusting means adjusts the sound and/or voice elements of audio information of high importance so that their fitness with the sound and/or voice elements of the song is low, and adjusts those of audio information of low importance so that the fitness is high.
  • It is preferable that the audio output device further includes metadata storage means storing song metadata, which is information on the sound and/or voice elements of songs, and audio information metadata, which is information on the sound and/or voice elements of the audio information, and that the audio information adjusting means adjusts the sound and/or voice elements of the audio information with reference to the song metadata and the audio information metadata.
  • It is preferable that the audio output device further includes audio information storage means storing multiple types of audio information that differ in their sound and/or voice elements, and that the audio information adjusting means selects, from the stored types, the one piece of audio information to be output according to the sound and/or voice elements of the song at the time the audio information is inserted.
  • It is preferable that the audio information adjusting means generates the sound and/or voice of the audio information at the time of insertion, using the sound and/or voice of the song at that time.
  • It is preferable that the audio information adjusting means adjusts the sound and/or voice elements of the audio information according to the sound and/or voice elements of the song at the start of insertion of the audio information.
  • the voice information has a time length
  • the sound and / or voice elements change in the middle of the song. Since the voice adjustment can be performed together, it is possible to cope with the case where the time length of the voice information is not defined in advance.
  • It is preferable that the sound element includes one or more of tune, chord, and rhythm, and that the voice element includes one or more of pitch, volume, voice quality, and pronunciation.
  • With this configuration, those elements of the voice information can be adjusted according to the tune, chord, rhythm, voice pitch, voice volume, voice quality, and pronunciation contained in the song. For example, when the song has a quiet tune, inserting a guidance voice with a quiet voice quality reduces the possibility of hindering music appreciation. Conversely, when the song is quiet, inserting a loud guidance voice conveys the voice information clearly to the listener.
  • the audio output device further includes a music reproducing unit that reproduces the music, and the audio output unit outputs the music reproduced by the music reproducing unit together with the sound and / or voice based on the audio information.
  • Another audio output device of the present invention includes: audio information insertion means for inserting audio information, which is a guidance voice and/or a sound effect, during reproduction of a song; audio information adjusting means for adjusting the sound source and/or language of the audio information according to the genre of the song being reproduced when the audio information is inserted; and audio output means for outputting audio based on the audio information adjusted by the audio information adjusting means.
  • With this configuration, the sound source and/or language of the audio information is adjusted according to the genre of the song being played. Adjusting it so that the fitness with the genre of the song is high, for example, reduces the possibility that the audio information hinders music appreciation. Adjusting it so that the fitness is low keeps the audio information from blending into the song, so that it is conveyed clearly to the listener.
  • the “song genre” indicates a type such as Western music or Japanese music, a type such as classic or jazz, a type such as movie music or CM music.
  • the “sound source” refers to a device that generates sound, such as a musical instrument to be played.
  • the program of the present invention is a program for causing a computer to function as each means in the above-described audio output device.
  • an audio output device and a program according to an embodiment of the present invention will be described in detail with reference to the accompanying drawings.
  • an in-vehicle audio output device that has a car navigation function and an audio player function and inserts audio information during the reproduction of a song is exemplified.
  • FIG. 1 is a block diagram showing a control configuration of the audio output device 1.
  • the audio output device 1 includes a car navigation unit 10 that controls a car navigation function, a player unit 20 that controls an audio player function, and an audio information adjustment unit 30 that adjusts audio information for performing car navigation.
  • The car navigation unit 10 provides route guidance (road guidance) based on a route or destination set by the user (driver) and on GPS information received from a GPS (Global Positioning System) receiver. It also acquires road traffic information and provides traffic guidance on congestion and traffic regulations. To this end, although not specifically illustrated, the car navigation unit 10 includes the GPS receiver, a control program for route guidance, and a display for showing the route.
  • the car navigation unit 10 has a voice information insertion unit 11.
  • The voice information insertion unit 11 outputs voice information (a guidance voice for giving route guidance and traffic guidance by voice, and an arousing sound output before the guidance voice to attract the driver's attention) to the audio information adjustment unit 30 so that it is inserted into the song being reproduced by the player unit 20.
  • the voice information insertion unit 11 inserts voice information according to the voice guidance list 15 (see FIG. 2) created in advance in the car navigation unit 10. Note that the voice guidance list 15 is updated in real time according to a situation that changes every moment (such as a traveling speed of a vehicle on which the voice output device 1 is mounted and a road situation).
  • The player unit 20 includes a song reproduction unit 21 that reproduces songs according to a playlist 25 (see FIG. 3) selected by the user, and a voice output unit 22 that outputs audio (sound and voice) based on the song reproduced by the song reproduction unit 21 and the voice information inserted by the audio information insertion unit 11.
  • the player unit 20 includes an audio control device and a speaker for performing various audio processes.
  • The audio information adjustment unit 30 adjusts the sound and voice elements of the audio information according to the sound and voice elements of the song being reproduced by the song reproduction unit 21 at the time the audio information is inserted by the audio information insertion unit 11. It has a guidance voice adjustment unit 31 and an arousing sound adjustment unit 32.
  • The guidance voice adjustment unit 31 adjusts the “voice quality (timbre, tone of voice)”, a voice element of the audio information (guidance voice), according to the “tune (melody)”, a sound element of the song.
  • The arousing sound adjustment unit 32 adjusts the chord, a sound element of the audio information (arousing sound), according to the chord, a sound element of the song. The arousing sound adjustment unit 32 also takes into account the importance of the arousing sound (in the present embodiment, the importance of the guidance voice that follows it). The specific adjustment method is described later.
  • FIG. 2 is a diagram illustrating an example of the voice guidance list 15.
  • “transmission time”, “importance”, and “group ID” are associated with each guidance voice.
  • In FIG. 2, four guidance voices, “soon to the right”, “to the right”, “this is a road for a while”, and “3 o'clock”, are illustrated.
  • Each guidance voice is composed of one or more pieces of transmission information.
  • the guidance voice “coming soon” is composed of two pieces of transmission information “coming soon” and “coming right”.
  • Each transmission information is associated with a “voice ID”.
  • The item “importance” is classified into two levels, “importance 1” and “importance 0”, according to the content of the guidance voice. “Importance 1” indicates a guidance voice of high importance. For example, information needed for imminent driving (such as guidance on the direction of travel given within 500 m before an intersection) is set as “importance 1”. “Importance 0”, on the other hand, indicates a guidance voice of low importance.
  • Information that is not needed for imminent driving is set as “importance 0”. Note that the importance of the guidance voice may be set at three or more levels instead of two.
  • The item “group ID” is set for each guidance voice and means that the one or more pieces of transmission information assigned the same group ID are output consecutively. This prevents transmission information with a different group ID from being inserted in between, for example when the voice guidance list 15 is updated. If transmission information of another group ID, such as “3 o'clock”, were inserted in the middle of the guidance voice “coming soon,” the meaning would be lost.
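The voice guidance list can be pictured as a small record type plus a rule that keeps each group together. The following is a minimal sketch under stated assumptions, not the patent's implementation: the class name `TransmissionInfo`, its field names, the helper `output_order`, the use of seconds, and every sample value (times, voice IDs, texts) are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TransmissionInfo:
    """One row of a voice guidance list (field names are illustrative)."""
    transmission_time: float  # when to output it; seconds is an assumed unit
    voice_id: str             # identifies the guidance voice content
    text: str                 # the transmission information itself
    importance: int           # 1 = high importance, 0 = low importance
    group_id: int             # rows sharing a group ID form one guidance voice


def output_order(rows):
    """Order rows so that each group is contiguous.

    Groups are ordered by their earliest transmission time; rows inside a
    group keep their time order. A row from another group that was scheduled
    in between is pushed after the group it would have interrupted.
    """
    first_time = {}
    for row in rows:
        earliest = first_time.get(row.group_id, row.transmission_time)
        first_time[row.group_id] = min(earliest, row.transmission_time)
    return sorted(
        rows,
        key=lambda r: (first_time[r.group_id], r.group_id, r.transmission_time),
    )


# Made-up sample: group 2 is scheduled between the two parts of group 1.
rows = [
    TransmissionInfo(10.0, "1010", "soon", 1, 1),
    TransmissionInfo(11.0, "1030", "3 o'clock", 0, 2),
    TransmissionInfo(12.0, "1020", "to the right", 1, 1),
]
ordered_texts = [r.text for r in output_order(rows)]
```

Sorting by the group's earliest time is only one way to honor the "output consecutively" rule; a real device might instead delay or drop the interrupting entry.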
  • the playlist 25 associates “song order”, “song ID”, and “length” for each piece of music content.
  • the item “song order” indicates the order in which songs are played.
  • The item “song ID” is a code for identifying each piece of music content. It is an alphanumeric string of the form “M ******”, chosen so as not to overlap with other content.
  • the item “length” indicates the song length in seconds.
  • the item “voice ID” is a code for identifying each guidance voice content, and is a number represented by “1 ***” so as not to overlap with other content.
  • the item “transmission information” indicates the content of the guidance voice.
  • the item “voice quality” is classified into “normal”, “quiet”, and “bright”, each corresponding to the last digit of “voice ID”. That is, the guidance voice content whose last digit of “voice ID” is “0” corresponds to the voice quality “normal”, and the guidance voice content whose last digit of “voice ID” is “1” is the voice quality “quiet”.
  • the guidance voice metadata provides three types of guidance voice contents for the same “transmission information”. Then, the audio output device 1 selects and outputs a guidance voice content having a voice quality that matches the tone of the music (high compatibility, harmony, and consistency) from among these three types of guidance voice contents.
  • The car navigation unit 10 creates the voice guidance list 15 on the assumption that the voice quality is “normal”. Therefore, in the voice guidance list 15 shown in FIG. 2, the last digit of every “voice ID” is “0”.
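Because the voice quality is encoded in the last digit of the voice ID, choosing a variant amounts to swapping that digit. The sketch below illustrates the idea only. The digits "0" (normal) and "1" (quiet) follow the text; the digit "2" for "bright", the function names, and the sample ID "1010" are assumptions.

```python
# Last digit of the voice ID -> voice quality. "0" and "1" follow the text;
# "2" for "bright" is an assumption made for this sketch.
DIGIT_TO_QUALITY = {"0": "normal", "1": "quiet", "2": "bright"}
QUALITY_TO_DIGIT = {quality: digit for digit, quality in DIGIT_TO_QUALITY.items()}


def voice_quality(voice_id: str) -> str:
    """Read the voice quality encoded in the last digit of a voice ID."""
    return DIGIT_TO_QUALITY[voice_id[-1]]


def voice_id_for_tune(voice_id: str, tune: str) -> str:
    """Swap the last digit so the voice quality matches the tune of the song.

    Tunes without a matching voice quality fall back to "normal".
    """
    digit = QUALITY_TO_DIGIT.get(tune, QUALITY_TO_DIGIT["normal"])
    return voice_id[:-1] + digit
```

For example, `voice_id_for_tune("1010", "quiet")` turns the "normal" entry from the voice guidance list into its "quiet" counterpart.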
  • In the arousing sound metadata, arousing sound IDs with fitness 0 through fitness 5 are associated with each chord.
  • The item “arousing sound ID” is a code for identifying each arousing sound content. It is a number of the form “2 ***”, chosen so as not to overlap with other content.
  • “Fitness 0” means that the fitness with the associated chord is the lowest.
  • “Fitness 5” means that the fitness with the associated chord is the highest.
  • When voice guidance of high importance is given and the chord of the song at that time is “D”, sounding the arousing sound ID “20917”, which feels out of place, attracts the driver's attention strongly.
  • When voice guidance of low importance is given and the chord of the song at that time is “D”, sounding the arousing sound ID “20049”, which matches the song, reduces the possibility of hindering music appreciation.
  • In this embodiment, the arousing sound contents of “fitness 0” and “fitness 5” are used selectively according to the importance of the voice guidance they accompany, but the combination of “fitness 1” and “fitness 4”, or of “fitness 2” and “fitness 3”, may be used instead.
  • the user may be able to set which fitness level is used.
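The chord-to-arousing-sound lookup described above can be sketched as a nested table. This is an illustrative sketch only: the table name, function name, and parameters are hypothetical, and only the two IDs for chord "D" ("20917" at fitness 0, "20049" at fitness 5) come from the text.

```python
# chord -> {fitness level: arousing sound ID}.
# Only the two entries for chord "D" come from the text.
AROUSING_SOUND_IDS = {
    "D": {0: "20917", 5: "20049"},
}


def select_arousing_sound(chord, importance, low_fitness=0, high_fitness=5):
    """Pick a clashing sound for important guidance, a matching one otherwise.

    low_fitness / high_fitness model the option of using another pair of
    fitness levels (for example 1 and 4) or a user setting.
    """
    fitness = low_fitness if importance >= 1 else high_fitness
    return AROUSING_SOUND_IDS[chord][fitness]
```

With the sample table, importance 1 on chord "D" yields "20917" and importance 0 yields "20049".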
  • the song metadata is time-series data in which “tune” and “chord” are associated with each other.
  • “musical tone” and “chord” are recorded at intervals of 0.1 seconds.
  • the tone that was "quiet” from 0.0 (start of song) to 1.4 seconds has changed to "bright” after 1.5 seconds, from 0.0 to 0.5 seconds
  • the chord D of the song is After outputting a low-sounding arousing sound, a guidance voice that matches the tune is output.
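Looking up the tune or chord at a playback position in such time-series metadata can be sketched as follows. This is a minimal sketch under assumptions: the metadata is stored as change points on a 0.1-second grid, the names `TUNE_CHANGES`, `CHORD_CHANGES`, and `value_at` are hypothetical, and the chord row is made up. Only the tune values ("quiet" up to 1.4 s, "bright" from 1.5 s) follow the text.

```python
from bisect import bisect_right

TICKS_PER_SECOND = 10  # metadata is recorded every 0.1 seconds

# (first tick, value) change points. The tune values follow the text
# ("quiet" up to 1.4 s, "bright" from 1.5 s); the chord row is made up.
TUNE_CHANGES = [(0, "quiet"), (15, "bright")]
CHORD_CHANGES = [(0, "D")]


def value_at(changes, position_s):
    """Return the value in effect at a playback position given in seconds."""
    tick = int(position_s * TICKS_PER_SECOND + 1e-9)  # floor to the 0.1 s grid
    starts = [start for start, _ in changes]
    index = bisect_right(starts, tick) - 1
    if index < 0:
        raise ValueError("position is before the first metadata entry")
    return changes[index][1]
```

Storing change points rather than one row per 0.1 seconds is a design choice of this sketch; a dense table indexed by tick would work equally well.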
  • When the voice information insertion unit 11 inserts voice information together with information indicating its importance (S01), the voice information adjustment unit 30 determines the chord and tune of the song being reproduced (S02). This determination is made by referring to the song metadata in the content metadata DB 41, based on the song ID acquired from the player unit 20 and information indicating the playback position (elapsed time from the start of the song). The information indicating the playback position may be acquired periodically from the player unit 20, or only information indicating the start of playback may be acquired, with the playback position thereafter determined by counting the elapsed time.
  • The voice information adjustment unit 30 determines the importance of the voice information inserted in S01 (S03).
  • If the voice information adjustment unit 30 determines that the importance is high (S03: Yes), it refers to the arousing sound metadata (see FIG. 5) in the content metadata DB 41 and selects an arousing sound ID with a low fitness for the chord of the song at the start of voice information insertion (S04).
  • If it determines that the importance is low (S03: No), it selects from the arousing sound metadata an arousing sound ID with a high fitness for the chord of the song at the start of voice information insertion (S05).
  • The voice information adjustment unit 30 then refers to the guidance voice metadata (see FIG. 4) in the content metadata DB 41 and selects a guidance voice ID corresponding to the tune of the song at the start of voice information insertion (S06).
  • a guidance voice ID suitable for the tone of the song is selected regardless of its importance.
  • the player unit 20 acquires the arousing sound ID and the guidance audio ID from the audio information adjustment unit 30, reads the corresponding content from the content DB 42, and outputs the arousing sound and the guidance audio (S07).
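Steps S02 to S06 can be read as one function that turns an inserted guidance voice into the pair of content IDs handed to the player. The sketch below is an illustration under assumptions, not the device's actual control program: the function and parameter names are hypothetical, the metadata lookup is passed in as a callable standing in for the content metadata DB 41, and the sample values (song ID, voice ID, positions) are made up apart from the two arousing sound IDs for chord "D".

```python
def adjust_audio_information(voice_id, importance, song_id, position_s,
                             song_metadata, arousing_sound_ids, quality_digit):
    """Return the content IDs to output, arousing sound first (S02 to S06)."""
    # S02: chord and tune of the song at the start of insertion.
    chord, tune = song_metadata(song_id, position_s)
    # S03 to S05: low fitness for high importance, high fitness otherwise.
    fitness = 0 if importance == 1 else 5
    arousing_sound_id = arousing_sound_ids[chord][fitness]
    # S06: guidance voice whose quality suits the tune, whatever the importance.
    guidance_voice_id = voice_id[:-1] + quality_digit.get(tune, "0")
    return [arousing_sound_id, guidance_voice_id]


def sample_song_metadata(song_id, position_s):
    # Stand-in for the content metadata DB 41 lookup (values made up).
    return ("D", "quiet" if position_s < 1.5 else "bright")


SAMPLE_AROUSING = {"D": {0: "20917", 5: "20049"}}
SAMPLE_DIGITS = {"normal": "0", "quiet": "1"}  # digits stated in the text

high = adjust_audio_information("1010", 1, "M000001", 0.8,
                                sample_song_metadata, SAMPLE_AROUSING,
                                SAMPLE_DIGITS)
low = adjust_audio_information("1010", 0, "M000001", 2.0,
                               sample_song_metadata, SAMPLE_AROUSING,
                               SAMPLE_DIGITS)
```

The returned list is ordered as the player unit would output it in S07: arousing sound first, then the guidance voice. Tunes with no known digit fall back to the "normal" voice here, which is a choice of this sketch.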
  • The player unit 20 may gradually decrease or increase the volume of the song before and after outputting the arousing sound and the guidance voice, or may interrupt reproduction of the song while the arousing sound and the guidance voice are being output.
  • a predetermined arousing sound ID is selected as the arousing sound, and the voice ID corresponding to the voice quality “normal” is selected.
  • For the arousing sound, one of two types of arousing sound IDs may be selected according to the importance of the voice information.
  • With the audio output device 1 of the present embodiment, the chord of the arousing sound and the voice quality of the guidance voice are adjusted according to the sound elements of the song being played at the time the audio information is inserted. Adjusting them so that the fitness is high, for example, blends the audio information into the song so that comfortable music appreciation is not hindered. Adjusting them so that the fitness is low keeps the audio information from blending into the song, so that it is conveyed clearly to the driver. In addition, because the fitness is chosen according to the importance of the audio information, audio information of high importance is conveyed clearly to the driver, while audio information of low importance does not get in the way of listening to music. The audio can thus be adjusted in a way that suits both the driver and the passengers.
  • In the above embodiment, the audio information is adjusted when it is inserted, but it may instead be adjusted in advance.
  • In this case, the voice information is adjusted based on the voice guidance list 15 generated in advance and the playlist 25 selected in advance, and, based on the adjustment result, the voice guidance list 15 is re-created before the song is reproduced.
  • the voice guidance list 15 is preferably listed with a voice ID (one of the three types of “voice quality” selected) and a sounding sound ID. According to this configuration, it is only necessary to perform voice output based on the voice guidance list 15 (no need for voice adjustment), so the control load on the voice output device 1 during music reproduction can be reduced.
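Adjusting in advance requires knowing which song, and which position in it, each guidance voice will land on. The sketch below shows one way to derive that from the playlist's song lengths. It is an assumption-laden illustration: it supposes transmission times are measured in seconds from the start of the playlist and that songs play back to back, and the names `locate`, `pre_adjust`, and `choose_voice_id`, as well as all sample values, are hypothetical.

```python
def locate(playlist, time_s):
    """Map a time from the start of the playlist to (song ID, offset in song)."""
    if time_s < 0:
        raise ValueError("time must not be negative")
    start = 0.0
    for song_id, length_s in playlist:
        if time_s < start + length_s:
            return song_id, time_s - start
        start += length_s
    return None  # the playlist has ended by then


def pre_adjust(guidance_rows, playlist, choose_voice_id):
    """Rewrite (time, voice ID) rows before playback starts.

    choose_voice_id(voice_id, song_id, offset_s) stands in for the metadata
    lookup and selection that would otherwise run at insertion time.
    """
    adjusted = []
    for time_s, voice_id in guidance_rows:
        location = locate(playlist, time_s)
        if location is not None:
            voice_id = choose_voice_id(voice_id, *location)
        adjusted.append((time_s, voice_id))
    return adjusted


# Made-up playlist: (song ID, length in seconds), in song order.
playlist = [("M000001", 180.0), ("M000002", 240.0)]
```

During playback the device then only has to output the listed IDs, which is the reduction in control load the text describes.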
  • In the above embodiment, the content adopted for the arousing sound, given as an example of the voice information, is changed according to the importance of the voice guidance, but the content adopted for the guidance voice may also be changed according to the importance.
  • However, when the importance is high, simply selecting content with a low fitness may result in, for example, a “quiet” guidance voice being played when the tune of the song is “bright”.
  • Since the guidance voice would then be drowned out by the song, simply selecting content with a low fitness is not appropriate. For this reason, it is preferable to prepare, as the guidance voice metadata, a list that defines the optimum voice quality of the guidance voice for each type of tune according to the importance.
  • In the above embodiment, the arousing sound is output before every guidance voice, but it may be output only before guidance voices of high importance.
  • the user may be able to set whether or not to add a rousing sound to the guidance voice, whether or not to add a rousing sound depending on the importance, and the like.
  • the audio output device 1 includes the content DB 42. However, these may be omitted. In this case, the audio output device 1 appropriately acquires content from the external device that stores the content DB 42, and performs music reproduction and audio guidance.
  • The audio output device 1 includes the car navigation unit 10 and the player unit 20, but either one or both may be omitted. For example, when both are omitted, the audio output device 1 acquires audio information from a car navigation device, which is an external device, adjusts the audio information so that it can be inserted into the song being played back by an audio player, which is also an external device, and outputs the adjusted audio information to the audio player.
  • the “song” includes a sound element (musical tone, chord), and the “voice information” includes a sound element (sounding chord), a voice element (guidance of the guidance voice), and
  • a “song” may include a voice element, and the sound element of “voice information” may be adjusted accordingly.
  • That is, the voice element of the “voice information” may be adjusted according to the sound element of the “song”, or the sound element of the “voice information” may be adjusted according to the voice element of the “song”; the two elements do not necessarily have to match.
  • both voice and sound may be output simultaneously instead of a pattern in which voice is added after the sound as in the present embodiment.
  • The “arousing sound” has been described as an example of voice information that includes sound, but a repeating sound such as a “warning sound” may be used. It may also be a sound containing a melody of several measures, such as a train arrival sound. That is, various sound effects can be applied as “voice information” that includes sound.
  • Although “tune” and “chord” have been illustrated as sound elements, other elements such as “rhythm (rhythm, periodicity)” and “sound source direction” may be added.
  • “Voice quality” has been exemplified as a voice element, but “pitch (voice pitch)”, “voice volume (voice volume, strength, width)”, “pronunciation”, “voice reverberation”, etc. Other elements may be added.
  • the voice element of the voice information is adjusted according to the “rhythm” of the song, the “pitch” of the vocal, or the “rhythm” or “pitch” of the voice information is adjusted according to the voice element of the song. May be.
  • In the above embodiment, the audio information is adjusted by selecting one piece of audio information from multiple types of audio information, but the sound and/or voice of the audio information may instead be generated at the time of insertion, using the sound and/or voice of the song at that time.
  • With this configuration, the storage capacity needed for multiple types of audio information can be reduced, and because the sound and/or voice of the audio information is generated from the sound and/or voice of the song being played, a wide variety of audio information can be output.
  • In the above embodiment, the sound and/or voice elements of the audio information are adjusted according to the sound and/or voice elements of the song at the start of insertion of the audio information. However, if the sound and/or voice elements of the song change while the audio information is being played, the sound and/or voice elements of the audio information may be adjusted to follow the change. In addition, if the length of the audio information is known in advance and the sound and/or voice elements of the song change while the audio information is being played, the audio information may be adjusted according to those elements, or according to the sound and/or voice elements of the song at the end of insertion of the audio information.
  • The sound source and/or language of the audio information may be adjusted according to the genre of the song being reproduced.
  • By adjusting so that the fitness with the genre of the song is high, the possibility that the audio information hinders music appreciation can be reduced.
  • By adjusting so that the fitness is low, the audio information is kept from blending into the song and can be conveyed clearly to the listener.
  • As an example of adjusting so that the fitness is high, the guidance voice may be given in English when the song is Western music and in Japanese when the song is Japanese music.
  • As for the arousing sound, it may be given a “koto” timbre when the song is enka and an “electric guitar” timbre when the song is rock.
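The genre-based adjustment can be sketched as a lookup from genre labels to output settings. The four pairings below (Western music with English, Japanese music with Japanese, enka with koto, rock with electric guitar) follow the examples in the text; the table and function names, the dictionary keys, and the idea that a song may carry several genre labels at once are assumptions made for illustration.

```python
# genre label -> settings that give a high fitness with that genre.
GENRE_SETTINGS = {
    "Western music": {"language": "English"},
    "Japanese music": {"language": "Japanese"},
    "enka": {"sound_source": "koto"},
    "rock": {"sound_source": "electric guitar"},
}


def settings_for(genres, default=None):
    """Merge the settings of every known genre label, later labels winning."""
    settings = dict(default or {})
    for genre in genres:
        settings.update(GENRE_SETTINGS.get(genre, {}))
    return settings
```

For a song labeled both "Japanese music" and "enka", this yields a Japanese guidance voice and a koto arousing sound; unknown labels leave the defaults untouched.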
  • the in-vehicle audio output device 1 has been exemplified.
  • a time signal or traffic information may be inserted.
  • the present invention can be applied.
  • the audio information such as the time signal and traffic information can be adjusted according to the tune and chord of the song at the start of insertion of the time signal and traffic information.
  • the present invention can be applied to any device as long as it is a device that provides voice guidance under the circumstances where a song is being reproduced.
  • the audio output device 1 of the present invention may be applied to video.
  • video. For example, one-segment broadcasting has been attracting attention in recent years; the images of such broadcasts may be analyzed, and the sound and/or voice elements of the audio information adjusted so as to raise or lower the degree of fit according to the analysis result.
  • the elements of the image (video) include brightness, occupancy of each color, resolution, contrast, genre (animation, live action, etc.) and the like.
  • each unit in the audio output device shown in the above embodiment and application examples can be provided as a program.
  • the program can be provided by being stored in a recording medium (not shown). That is, a program for causing a computer to function as each unit of the audio output device and a recording medium recording the program are also included in the scope of the right of the present invention. Other modifications can be made as appropriate without departing from the scope of the present invention.

Abstract

An audio output device is provided that can insert audio information suited to the song being played back, for example so as to obstruct music appreciation as little as possible. The audio output device (1) comprises an audio information inserting section (11) that inserts audio information, which is a guidance voice and/or a sound effect, during playback of a song; an audio information adjusting section (30) that adjusts the sound and/or voice elements of the audio information according to the sound and/or voice elements of the song being played back at the time the audio information is inserted; and an audio output section (22) that outputs audio on the basis of the audio information adjusted by the audio information adjusting section (30).

Description

Audio output device and program
The present invention relates to an audio output device and a program for inserting and outputting audio information during playback of a song.
Conventionally, a navigation system is known that has a car navigation function and an audio player function and provides route guidance by inserting audio information during playback of a song (for example, Patent Document 1). This navigation system determines the priority of each piece of voice guidance. When the priority is high, it interrupts song playback and inserts the voice guidance. When the priority is low, it waits for the song being played back to end and then inserts the voice guidance. With this configuration, voice guidance that is not very important to the driver can be given between songs, so the song being played back is not interrupted more than necessary.
JP 2001-116581 A
However, in practical use, the above navigation system ends up inserting most voice guidance by interrupting song playback. For example, car navigation often gives voice guidance several times before an actual right turn, such as “Turn right in 300 meters.”, “Shortly, turn right.”, and “Turn right.” In the above navigation system, all of these are judged to be “high priority”, so song playback is interrupted each time. Such voice guidance may be important to the driver, but it is often unimportant to passengers and makes them uncomfortable. It is also desirable for the driver to be able to enjoy music as comfortably as possible while still checking the voice guidance.
In view of the above problems, an object of the present invention is to provide an audio output device and a program capable of inserting audio information suited to the song being played back, for example so as to disturb music appreciation as little as possible.
The audio output device of the present invention comprises: audio information insertion means for inserting audio information, which is a guidance voice and/or a sound effect, during playback of a song; audio information adjustment means for adjusting the sound and/or voice elements of the audio information according to the sound and/or voice elements of the song being played back at the time the audio information is inserted; and audio output means for outputting audio based on the audio information adjusted by the audio information adjustment means.
In the audio output device described above, the audio information adjustment means preferably adjusts the sound and/or voice elements of the audio information so that their degree of fit with the sound and/or voice elements of the song becomes high, or so that it becomes low.
According to these configurations, the sound and/or voice elements of the audio information are adjusted according to the sound and/or voice elements of the song being played back at the time the audio information is inserted. For example, adjusting them so that the degree of fit with the song's sound and/or voice elements is high reduces the possibility that the audio information hinders music appreciation. Conversely, adjusting them so that the degree of fit is low keeps the audio information from blending into the song, so that it is conveyed clearly to the listener.
Note that “sound and/or voice elements” means “at least one of sound elements and voice elements”.
The “song” and the “audio information” each need only contain either sound or voice, and need not contain both. The elements of the two need not correspond: for example, a voice element of the “audio information” may be adjusted according to a sound element of the “song”, or a sound element of the “audio information” according to a voice element of the “song”. In the case of “audio information” that contains both sound and voice, the two may be output simultaneously or separated in time, for example with the voice added after the sound. The term “sound effect” is a concept that includes arousal sounds and warning sounds.
The means for playing back the song may be provided in the audio output device or in an external device. In the latter case, the audio output device may acquire the song playlist from the external device in advance and perform audio adjustment based on that playlist, or it may perform audio adjustment in real time while acquiring the audio signal of the song being played back.
In the audio output device described above, it is preferable that an importance is set for the audio information according to its content, and that the audio information adjustment means adjusts the sound and/or voice elements of the audio information so that the degree of fit with the song's sound and/or voice elements is low for audio information of high importance, and high for audio information of low importance.
According to this configuration, the sound and/or voice elements can be adjusted (the degree of fit with the song raised or lowered) according to the importance of the audio information. Audio information of high importance can thus be conveyed clearly to the listener (driver), while audio information of low importance is less likely to hinder music appreciation, so the audio adjustment suits both the driver and the passengers.
It is preferable that the audio output device described above further comprises metadata storage means for storing song metadata, which is information on the sound and/or voice elements of the song, and audio information metadata, which is information on the sound and/or voice elements of the audio information, and that the audio information adjustment means adjusts the sound and/or voice elements of the audio information with reference to the song metadata and the audio information metadata.
According to this configuration, storing the information on the sound and/or voice elements of the song and of the audio information as metadata makes the audio adjustment easy to perform.
It is preferable that the audio output device described above further comprises audio information storage means for storing a plurality of types of audio information that differ in their sound and/or voice elements, and that the audio information adjustment means selects, from among the plurality of types of audio information stored in the audio information storage means, one piece of audio information to be output, according to the sound and/or voice elements of the song at the time the audio information is inserted.
According to this configuration, audio adjustment can be performed by the simple process of selecting one piece of audio information to be output from among a plurality of types.
In the audio output device described above, the audio information adjustment means preferably generates the sound and/or voice of the audio information at the time of its insertion, using the sound and/or voice of the song at that time.
According to this configuration, because the audio adjustment is performed when the audio information is inserted, no storage capacity is needed for holding a plurality of types of audio information. In addition, because the sound and/or voice of the audio information is generated from the sound and/or voice of the song being played back, a wide variety of audio information can be output.
In the audio output device described above, the audio information adjustment means preferably adjusts the sound and/or voice elements of the audio information according to the sound and/or voice elements of the song at the start of insertion of the audio information.
When the audio information has a duration, the sound and/or voice elements of the song may change partway through it. According to this configuration, the audio adjustment is performed to match the start of insertion of the audio information even in such a case, so the configuration also works when the duration of the audio information is not specified in advance.
In the audio output device described above, the sound elements preferably include one or more of tune, chord, and rhythm, and the voice elements preferably include one or more of pitch, volume, voice quality, and pronunciation.
According to this configuration, those elements of the audio information can be adjusted according to the tune, chord, and rhythm of the song and the pitch, volume, voice quality, and pronunciation of its voice. For example, inserting a guidance voice with a quiet voice quality when the song has a quiet tune reduces the possibility of hindering music appreciation. Conversely, inserting a guidance voice with a loud volume when the song has a quiet tune conveys the audio information clearly to the listener.
It is preferable that the audio output device described above further comprises song playback means for playing back the song, and that the audio output means outputs the song played back by the song playback means together with the sound and/or voice based on the audio information.
According to this configuration, song playback and audio information insertion can be realized in a single device.
Another audio output device of the present invention comprises: audio information insertion means for inserting audio information, which is a guidance voice and/or a sound effect, during playback of a song; audio information adjustment means for adjusting the sound source and/or language of the audio information according to the genre of the song being played back when the audio information is inserted; and audio output means for outputting audio based on the audio information adjusted by the audio information adjustment means.
According to this configuration, the sound source and/or language of the audio information is adjusted according to the genre of the song being played back. For example, adjusting them so that the degree of fit with the song's genre is high reduces the possibility that the audio information hinders music appreciation. Conversely, adjusting them so that the degree of fit is low keeps the audio information from blending into the song (music), so that it is conveyed clearly to the listener.
The “genre of a song” refers to a category such as Western or Japanese music, a category such as classical or jazz, or a category such as film music or commercial music. The “sound source” refers to a device that generates sound, such as the musical instrument being played.
The program of the present invention causes a computer to function as each of the means in the audio output device described above.
Using this program makes it possible to realize an audio output device that can insert audio information suited to the song being played back, for example so as to disturb music appreciation as little as possible.
FIG. 1 is a block diagram showing the control configuration of an audio output device according to an embodiment of the present invention.
FIG. 2 is a diagram showing an example of a voice guidance list.
FIG. 3 is a diagram showing an example of a playlist.
FIG. 4 is a diagram showing an example of guidance voice metadata.
FIG. 5 is a diagram showing an example of arousal sound metadata.
FIG. 6 is a diagram showing an example of song metadata.
FIG. 7 is a flowchart showing the audio output process of the audio output device.
Explanation of symbols
1: audio output device; 10: car navigation unit; 11: audio information insertion unit; 15: voice guidance list; 20: player unit; 21: song playback unit; 22: audio output unit; 25: playlist; 30: audio information adjustment unit; 31: guidance voice adjustment unit; 32: arousal sound adjustment unit; 41: content metadata DB; 42: content DB
An audio output device and a program according to an embodiment of the present invention are described below in detail with reference to the accompanying drawings. This embodiment takes as an example an in-vehicle audio output device that has a car navigation function and an audio player function and inserts audio information during playback of a song.
FIG. 1 is a block diagram showing the control configuration of the audio output device 1. As shown in the figure, the audio output device 1 includes a car navigation unit 10 that handles the car navigation function, a player unit 20 that handles the audio player function, an audio information adjustment unit 30 that adjusts the audio information used for car navigation, a content metadata database (hereinafter “content metadata DB”) 41 that stores metadata on the audio information and songs, and a content database (hereinafter “content DB”) 42 that stores the audio information and song content.
Like a general car navigation device, the car navigation unit 10 provides route guidance based on the route and destination set by the user (driver) and on GPS information received from a GPS (Global Positioning System) receiver. It also acquires road traffic information and provides traffic guidance on congestion and traffic restrictions. Accordingly, although not shown, the car navigation unit 10 also includes the GPS receiver, a control program for route guidance, and a display for showing the route.
The car navigation unit 10 further has an audio information insertion unit 11. The audio information insertion unit 11 outputs audio information to the audio information adjustment unit 30 so that it is inserted into the song being played back by the player unit 20. The audio information consists of a guidance voice, which delivers route guidance or traffic guidance by voice, and an arousal sound, which is output before the guidance voice to attract the driver's attention. The audio information insertion unit 11 inserts the audio information according to a voice guidance list 15 (see FIG. 2) created in advance in the car navigation unit 10. The voice guidance list 15 is updated in real time as conditions change from moment to moment (such as the speed of the vehicle carrying the audio output device 1 and road conditions).
The player unit 20 has a song playback unit 21 that plays back songs according to a playlist 25 (see FIG. 3) selected by the user, and an audio output unit 22 that outputs the song played back by the song playback unit 21 together with audio (sound and voice) based on the audio information inserted by the audio information insertion unit 11. Although not shown, the player unit 20 also includes an audio control device for various kinds of audio processing and a speaker.
The audio information adjustment unit 30 adjusts the sound and voice elements of the audio information according to the sound and voice elements of the song being played back by the song playback unit 21 at the point when the audio information insertion unit 11 inserts the audio information. It has a guidance voice adjustment unit 31 and an arousal sound adjustment unit 32. In this embodiment, the guidance voice adjustment unit 31 adjusts the “voice quality (timbre, tone of voice)”, a voice element of the audio information (guidance voice), according to the “tune (melody)”, a sound element of the song. The arousal sound adjustment unit 32 adjusts the chord, a sound element of the audio information (arousal sound), according to the chord (harmony), a sound element of the song. The arousal sound adjustment unit 32 also takes into account the importance of the arousal sound (in this embodiment, the importance of the guidance voice that follows it) when adjusting the arousal sound. The specific adjustment methods are described later.
Next, specific examples of the voice guidance list 15, the playlist 25, and the various content metadata are described with reference to FIGS. 2 to 6. FIG. 2 shows an example of the voice guidance list 15. In the voice guidance list 15, a “transmission time”, an “importance”, and a “group ID” are associated with each guidance voice. FIG. 2 illustrates four guidance voices: “Shortly, turn right.”, “Turn right.”, “Continue along this road for a while.”, and “It is 3 o'clock.” Each guidance voice consists of one or more pieces of transmission information. For example, the guidance voice “Shortly, turn right.” consists of two pieces of transmission information, “Shortly,” and “turn right.” A “voice ID” is associated with each piece of transmission information.
The item “transmission time” indicates the time at which transmission of the guidance voice starts. As described above, an arousal sound is output before each guidance voice, so the transmission start time equals the output timing of the arousal sound. The item “importance” is classified into two levels, “importance 1” and “importance 0”, according to the content of the guidance voice. “Importance 1” denotes a guidance voice of high importance: information needed for the immediate driving situation, such as guidance on the direction of travel given within 500 m before an intersection. “Importance 0” denotes a guidance voice of low importance: information not needed for the immediate driving situation, such as guidance on the direction of travel given more than 500 m before an intersection, congestion information, route guidance that requires no left or right turn, and time information. The importance of guidance voices may also be set in three or more levels instead of two.
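The two-level classification above can be sketched as a small function. This is only an illustration under stated assumptions: the `Guidance` type, its `kind` labels, and the choice to treat exactly 500 m as "within 500 m" are hypothetical and do not come from the specification.

```python
from dataclasses import dataclass
from typing import Optional

# Threshold from the example in the text: direction guidance given within
# 500 m before an intersection is needed for the immediate driving situation.
NEAR_INTERSECTION_M = 500.0


@dataclass
class Guidance:
    # Hypothetical labels: "turn", "traffic_jam", "straight", "time".
    kind: str
    # Distance to the intersection in meters; None when not applicable.
    distance_to_intersection_m: Optional[float] = None


def importance(guidance: Guidance) -> int:
    """Return 1 for high importance and 0 for low importance."""
    if guidance.kind == "turn" and guidance.distance_to_intersection_m is not None:
        # Assumption: a distance of exactly 500 m counts as "within 500 m".
        return 1 if guidance.distance_to_intersection_m <= NEAR_INTERSECTION_M else 0
    # Congestion information, guidance needing no turn, and time information
    # are all low importance in the example given in the text.
    return 0
```

Extending this to three or more levels, as the text allows, would only require returning a wider range of integers.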
The item “group ID” is set for each guidance voice, and means that the one or more pieces of transmission information given the same group ID are output consecutively. This prevents transmission information with a different group ID from being inserted between them, for example when the voice guidance list 15 is updated. If transmission information from another group, such as “3 o'clock”, were inserted in the middle of the guidance voice “Shortly, turn right.”, the guidance would no longer make sense.
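One way to honor the group ID constraint when the list is updated is to push a new entry forward until it no longer falls inside a run of entries sharing one group ID. The sketch below assumes a list of dictionaries with hypothetical `group` and `text` keys; the specification does not describe how the list is stored or updated.

```python
def insert_without_splitting(queue, new_item, index):
    """Insert new_item at or after index, never between two entries of one group.

    queue is a list of dicts with a "group" key (hypothetical layout).
    A new list is returned; the input list is left unchanged.
    """
    pos = max(0, min(index, len(queue)))
    # Advance while the insertion point sits inside a run of the same group.
    while 0 < pos < len(queue) and queue[pos - 1]["group"] == queue[pos]["group"]:
        pos += 1
    return queue[:pos] + [new_item] + queue[pos:]


guidance_list = [
    {"group": 1, "text": "Shortly,"},
    {"group": 1, "text": "turn right."},
    {"group": 2, "text": "Turn right."},
]
time_signal = {"group": 9, "text": "It is 3 o'clock."}
updated = insert_without_splitting(guidance_list, time_signal, 1)
```

Here the requested position 1 lies between the two pieces of group 1, so the time signal lands after “turn right.” instead.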
Next, the playlist 25 is described with reference to FIG. 3. The playlist 25 associates a “song order”, a “song ID”, and a “length” with each piece of song content. The item “song order” is the order in which the songs are played back. The item “song ID” is a code identifying each piece of song content: an alphanumeric string of the form “M*****”, chosen so as not to duplicate the ID of any other content. The item “length” is the length of the song in seconds.
Next, the guidance voice metadata is described with reference to FIG. 4. The guidance voice metadata associates a “voice ID”, “transmission information”, and a “voice quality” with one another. The item “voice ID” is a code identifying each piece of guidance voice content: a number of the form “1****”, chosen so as not to duplicate the ID of any other content. The item “transmission information” is the content of the guidance voice. The item “voice quality” is one of three categories, “normal”, “quiet”, and “bright”, each corresponding to the last digit of the voice ID. That is, guidance voice content whose voice ID ends in “0” has the voice quality “normal”, content whose voice ID ends in “1” has the voice quality “quiet”, and content whose voice ID ends in “2” has the voice quality “bright”. In this way, three types of guidance voice content are prepared for each piece of “transmission information” with the same content. From these three types, the audio output device 1 selects and outputs the guidance voice content whose voice quality matches the tune of the song (that is, has a high degree of fit, harmony, and consistency with it).
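Because the voice quality is encoded in the last digit of the voice ID, selecting the matching variant can be reduced to rewriting that digit. The following is a minimal sketch, assuming that a tune label maps to the voice quality of the same name and that unknown tunes fall back to “normal”; the ID "10010" is a hypothetical value that merely follows the “1****” pattern.

```python
# Last digit of the voice ID for each voice quality, as described in the text.
VOICE_QUALITY_DIGIT = {"normal": 0, "quiet": 1, "bright": 2}


def variant_voice_id(base_voice_id: str, tune: str) -> str:
    """Return the voice ID of the variant whose voice quality matches the tune.

    Assumption: the tune label ("quiet", "bright") selects the voice quality
    of the same name, and any other tune selects "normal".
    """
    digit = VOICE_QUALITY_DIGIT.get(tune, VOICE_QUALITY_DIGIT["normal"])
    return base_voice_id[:-1] + str(digit)


quiet_id = variant_voice_id("10010", "quiet")
```

A device that generates the voice at insertion time, as another configuration in the text allows, would not need this lookup at all.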
Which of the three types of guidance voice content is actually inserted into the song is not decided until the time of insertion, so the car navigation unit 10 creates the voice guidance list 15 on the assumption that the voice quality is “normal”. Consequently, in the voice guidance list 15 shown in FIG. 2, the last digit of every “voice ID” is “0”.
Next, the arousal sound metadata is described with reference to FIG. 5. In the arousal sound metadata, “arousal sound IDs” for fit level 0 through fit level 5 are associated with each chord. The item “arousal sound ID” is a code identifying each piece of arousal sound content: a number of the form “2****”, chosen so as not to duplicate the ID of any other content.
Here, “fit level 0” means the lowest degree of fit with the associated chord, and “fit level 5” means the highest. In the example in the figure, hearing chord D together with the arousal sound with ID “20917” sounds clearly dissonant, whereas hearing chord D together with the arousal sound with ID “20049” sounds well matched and pleasant. Therefore, when voice guidance of high importance is given while the chord of the song is “D”, sounding the dissonant arousal sound “20917” strongly attracts the driver's attention. When voice guidance of low importance is given while the chord of the song is “D”, sounding the matching arousal sound “20049” reduces the possibility of hindering music appreciation.
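The selection rule for the arousal sound can be sketched as a table lookup keyed by chord and fit level. Only the two IDs quoted in the text for chord D are filled in; the table layout and function name are assumptions made for illustration, and entries for other chords or fit levels are deliberately left out rather than guessed.

```python
# Arousal sound IDs per chord and fit level. Only the values stated in the
# text are listed (chord D, fit levels 0 and 5); the layout is hypothetical.
AROUSAL_SOUND_TABLE = {
    "D": {0: "20917", 5: "20049"},
}

LOWEST_FIT = 0
HIGHEST_FIT = 5


def select_arousal_sound(chord: str, importance: int) -> str:
    """Pick a dissonant sound for high importance, a matching one otherwise."""
    fit_level = LOWEST_FIT if importance == 1 else HIGHEST_FIT
    # Raises KeyError for chords or fit levels not present in the table.
    return AROUSAL_SOUND_TABLE[chord][fit_level]


high = select_arousal_sound("D", importance=1)
low = select_arousal_sound("D", importance=0)
```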
In this embodiment, arousal sound content of “fit level 0” and “fit level 5” is used selectively according to the importance of the voice guidance for which the arousal sound is used, but other pairs, such as “fit level 1” and “fit level 4” or “fit level 2” and “fit level 3”, may be used instead. The user may also be allowed to set which fit levels are used.
Next, the song metadata will be described with reference to FIG. 6. The song metadata is time-series data in which "mood" and "chord" are associated with each other. In the example in the figure, the mood and chord are recorded at 0.1-second intervals. The data show that the mood, which is "quiet" from 0.0 seconds (start of the song) to 1.4 seconds, changes to "bright" from 1.5 seconds onward, and that the chord is "C" from 0.0 to 0.5 seconds, "Dm7" from 0.6 to 1.3 seconds, and "Gm" from 1.4 seconds onward. Thus, for example, if insertion of audio information starts within 0.5 seconds of the start of the song and that audio information is "highly important voice guidance", an alert sound with low compatibility with the song's chord D is output, followed by a guidance voice that matches the mood.
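The song metadata lookup can be sketched as follows. This is an illustrative assumption rather than the device's actual data format: the time series is stored as change points (start time, value) instead of one row per 0.1 seconds, and the names `MOOD_CHANGES`, `CHORD_CHANGES`, `value_at`, and `song_state_at` are hypothetical. The intervals themselves are the ones stated for FIG. 6.

```python
from bisect import bisect_right

# Change points taken from the intervals stated for FIG. 6:
# each entry is (start time in seconds, value in effect from that time).
MOOD_CHANGES = [(0.0, "quiet"), (1.5, "bright")]
CHORD_CHANGES = [(0.0, "C"), (0.6, "Dm7"), (1.4, "Gm")]


def value_at(changes, position):
    """Return the value in effect at the given playback position (seconds)."""
    starts = [start for start, _ in changes]
    index = bisect_right(starts, position) - 1
    if index < 0:
        raise ValueError("playback position precedes the first entry")
    return changes[index][1]


def song_state_at(position):
    """Return (mood, chord) of the song at the given playback position."""
    return value_at(MOOD_CHANGES, position), value_at(CHORD_CHANGES, position)


print(song_state_at(0.4))  # ('quiet', 'C')
print(song_state_at(1.5))  # ('bright', 'Gm')
```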
Next, the sequence of audio output processing performed by the audio output device 1 will be described with reference to the flowchart of FIG. 7. While a song is being played, the audio information insertion unit 11 first inserts audio information together with information indicating its importance (S01), whereupon the audio information adjustment unit 30 determines the chord and mood of the song currently being played (S02). This determination is made by referring to the song metadata in the content metadata DB 41, based on the song ID obtained from the player unit 20 and information indicating the playback position (elapsed time from the start of the song). The playback position information may be obtained from the player unit 20 periodically, or only information indicating the start of playback may be obtained, after which the elapsed time is counted to determine the playback position.
Next, the audio information adjustment unit 30 determines the importance of the audio information inserted in S01 (S03). If it determines that the importance is high (S03: Yes), it refers to the alert sound metadata (see FIG. 5) in the content metadata DB 41 and selects an alert sound ID with low compatibility with the chord of the song at the time insertion of the audio information starts (S04). If it determines that the importance is low (S03: No), it selects from the alert sound metadata an alert sound ID with high compatibility with the chord of the song at that time (S05).
Next, the audio information adjustment unit 30 refers to the guidance voice metadata (see FIG. 4) in the content metadata DB 41 and selects a guidance voice ID according to the mood of the song at the time insertion of the audio information starts (S06). For the guidance voice, a guidance voice ID suited to the mood of the song is selected regardless of importance. The player unit 20 then obtains the alert sound ID and the guidance voice ID from the audio information adjustment unit 30, reads the corresponding content from the content DB 42, and outputs the alert sound and the guidance voice (S07). The player unit 20 may gradually lower and raise the volume of the song before and after outputting the alert sound and guidance voice, or may suspend playback of the song while they are being output.
A concrete example following the above processing is now described. As shown in FIG. 6, suppose that the song content with song ID "M23452" is being played and that, when the playback position is at "0.4" seconds, the voice guidance delivery start time "14:56:45" arrives and the guidance voice "You will turn right shortly." with "importance 1" is inserted (see FIG. 2). Because the chord of the song at that time is "chord D", alert sound ID "20917", which has low compatibility ("compatibility level 0") with "chord D", is output (see FIG. 5), followed by the guidance voice suited to the mood "quiet" of the song (voice ID "15001" and voice ID "14001") (see FIG. 4).
The above flowchart assumes that a song is being played. If audio information is inserted while no song is being played, such as when the audio player is stopped or between songs, the audio information is not adjusted. That is, a predetermined alert sound ID is selected as the alert sound, and a voice ID corresponding to the voice quality "normal" is selected. For the alert sound, one of two alert sound IDs may also be selected according to the importance of the audio information.
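The decision flow of steps S02 to S06, together with the fallback used when no song is playing, can be sketched as below. This is only an illustration under stated assumptions: the function name, the table layouts, the placeholder `DEFAULT_ALERT_ID`, and the mapping from mood "quiet" to a voice quality labeled "quiet" are all hypothetical; the alert sound IDs for chord D and the voice quality "normal" are the only values taken from the text.

```python
# Hypothetical tables; only the chord "D" alert sound IDs come from the text.
ALERT_TABLE = {"D": {0: "20917", 5: "20049"}}
VOICE_QUALITY_BY_MOOD = {"quiet": "quiet"}  # assumed mapping, not from FIG. 4
DEFAULT_ALERT_ID = "DEFAULT_ALERT_ID"       # placeholder for the predetermined ID


def adjust_audio_information(song_state, high_importance):
    """Return (alert sound ID, voice quality) for one piece of audio information.

    song_state is None when no song is playing, otherwise a (mood, chord) pair
    describing the song at the moment insertion starts (S02).
    """
    if song_state is None:
        # No song playing: no adjustment, use the predetermined defaults.
        return DEFAULT_ALERT_ID, "normal"
    mood, chord = song_state
    # S03 to S05: important guidance gets the clashing (level 0) alert sound,
    # less important guidance gets the matching (level 5) one.
    level = 0 if high_importance else 5
    alert_id = ALERT_TABLE[chord][level]
    # S06: the voice quality follows the mood regardless of importance.
    return alert_id, VOICE_QUALITY_BY_MOOD[mood]


print(adjust_audio_information(("quiet", "D"), high_importance=True))
print(adjust_audio_information(None, high_importance=True))
```

The returned pair stands in for what the audio information adjustment unit hands to the player unit for output in S07.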
As described above, the audio output device 1 of this embodiment adjusts the chord of the alert sound and the voice quality of the guidance voice according to the sound elements of the song being played at the time the audio information is inserted. Adjusting for high compatibility, for example, lets the audio information blend into the song so that it does not interfere with comfortable music listening. Adjusting for low compatibility keeps the audio information from being lost in the song (music), so that it can be conveyed clearly to the driver. Furthermore, because the compatibility is determined according to the importance of the audio information, highly important audio information can be conveyed clearly to the driver while less important audio information is less likely to interfere with music listening, giving an audio adjustment that suits both the driver and the passengers.
In the above embodiment, the audio information is adjusted when it is inserted, but it may instead be adjusted in advance. In this case, the audio information is adjusted based on a voice guidance list 15 generated in advance and a playlist 25 selected in advance, and, based on the adjustment result, the voice guidance list 15 is created before the songs are played. The voice guidance list 15 then preferably lists the voice IDs (each with one of the three types of "voice quality" selected) and the alert sound IDs. With this configuration, audio only needs to be output according to the voice guidance list 15 (no audio adjustment is needed), which reduces the control load on the audio output device 1 during song playback.
In the above embodiment, for the alert sound given as an example of audio information, the content used is changed according to the importance of the voice guidance; the content used for the guidance voice may likewise be varied according to its importance. However, if low-compatibility content is simply selected for a highly important guidance voice, a combination such as playing a "quiet" guidance voice when the mood of the song is "bright" may result, in which case the guidance voice is drowned out by the song. Simply selecting low-compatibility content is therefore not sufficient. For this reason, it is preferable to prepare, as guidance voice metadata, a table that specifies the optimal guidance voice quality for each type of song mood according to importance.
In the above embodiment, an alert sound is output before every guidance voice, but it may be output only before highly important guidance voices. The user may also be allowed to set whether an alert sound is added to the guidance voice, whether adding it depends on importance, and so on.
In the above embodiment, the audio output device 1 includes the content DB 42, but this may be omitted. In that case, the audio output device 1 obtains content as needed from an external device that stores the content DB 42 and performs song playback and voice guidance.
In the above embodiment, the audio output device 1 includes the car navigation unit 10 and the player unit 20, but either or both may be omitted. If both are omitted, for example, the device obtains audio information from a car navigation device that is an external device, adjusts that audio information for insertion into the song being played on an audio player that is an external device, and outputs the adjusted audio to the audio player.
In the above embodiment, the "song" contains sound elements (mood, chord) and the "audio information" contains a sound element (the chord of the alert sound) and a voice element (the voice quality of the guidance voice), but this is not a limitation. For example, the "song" may contain a voice element, and the sound element of the "audio information" may be adjusted accordingly. In other words, the voice element of the "audio information" may be adjusted according to the sound element of the "song", or the sound element of the "audio information" according to the voice element of the "song"; the two elements need not be of the same kind. In the case of "audio information" that includes both sound and voice, the voice and the sound may be output simultaneously, rather than the voice following the sound as in this embodiment.
Although an "alert sound" was given as an example of audio information, it may be a sound associated with repeated sounding, such as a "warning sound". It may also be a sound that includes a melody of several measures, such as a train arrival chime. That is, various sound effects can be used as "audio information" that includes sound.
Although "mood" and "chord" were given as examples of sound elements, other elements such as "rhythm (periodicity)" and "sound source direction" may be added. Likewise, although "voice quality" was given as an example of a voice element, other elements such as "pitch (height of the voice)", "voice volume (loudness, strength, and range of the voice)", "pronunciation", and "resonance of the voice" may be added. That is, the audio elements of the audio information may be adjusted according to the "rhythm" of the song, the "pitch" of the vocals, and so on, or the "rhythm", "pitch", and so on of the audio information may be adjusted according to the audio elements of the song.
In the above embodiment, the audio information is adjusted by selecting one piece of audio information from among multiple types, but the sound and/or voice of the audio information may instead be generated at the time of insertion, using the sound and/or voice of the song at that time. This configuration reduces the storage capacity needed to hold multiple types of audio information and, because the sound and/or voice of the audio information is generated from the sound and/or voice of the song being played, allows a wide variety of audio information to be output. Examples of using the sound of the song being played include combining the notes that make up the song to generate a high-compatibility alert sound, and combining notes shifted by a semitone from those that make up the song to generate a low-compatibility alert sound.
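The semitone-shift idea can be illustrated with a short sketch. It assumes, purely for illustration, that the notes currently sounding in the song are available as MIDI note numbers (where adjacent integers are one semitone apart); the function name `generate_alert_notes` is hypothetical, and the text does not say how the notes are obtained or synthesized.

```python
def generate_alert_notes(song_notes, high_compatibility):
    """Derive the notes of an alert sound from the notes sounding in the song.

    song_notes: MIDI note numbers of the notes that make up the song right now.
    A high-compatibility alert reuses those notes as they are; a
    low-compatibility alert shifts every note up by one semitone so that it
    clashes with the song.
    """
    shift = 0 if high_compatibility else 1
    return [note + shift for note in song_notes]


d_major = [62, 66, 69]  # D4, F#4, A4
print(generate_alert_notes(d_major, high_compatibility=True))   # [62, 66, 69]
print(generate_alert_notes(d_major, high_compatibility=False))  # [63, 67, 70]
```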
In the above embodiment, the sound and/or voice elements of the audio information are adjusted according to the sound and/or voice elements of the song at the start of insertion. If the audio information has a temporal length and the sound and/or voice elements of the song change while it is being played, the elements of the audio information may be adjusted to follow the change. Furthermore, if the length of the audio information is known in advance and the elements of the song change partway through its playback, the audio information may be adjusted according to whichever song element is played simultaneously with it for the longer time, or according to the elements of the song at the time insertion of the audio information ends.
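When the length of the audio information is known in advance, choosing the song element that overlaps it for the longest time might look like the sketch below. The change-point representation and the name `dominant_value` are assumptions for illustration; the chord intervals are those stated for FIG. 6.

```python
# Chord change points as (start time in seconds, chord), from the FIG. 6 example.
CHORD_CHANGES = [(0.0, "C"), (0.6, "Dm7"), (1.4, "Gm")]


def dominant_value(changes, start, duration):
    """Return the value that overlaps [start, start + duration) the longest."""
    if duration <= 0:
        raise ValueError("duration must be positive")
    end = start + duration
    overlap = {}
    for i, (segment_start, value) in enumerate(changes):
        # A segment lasts until the next change point (the last one is open-ended).
        segment_end = changes[i + 1][0] if i + 1 < len(changes) else float("inf")
        length = min(end, segment_end) - max(start, segment_start)
        if length > 0:
            overlap[value] = overlap.get(value, 0.0) + length
    if not overlap:
        raise ValueError("playback interval lies before the first change point")
    return max(overlap, key=overlap.get)


# Guidance lasting 1.0 s from 0.4 s overlaps "Dm7" the longest.
print(dominant_value(CHORD_CHANGES, start=0.4, duration=1.0))  # Dm7
```

Using the element in effect at the end of insertion instead would amount to looking up the value at `start + duration`.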
As an application of the audio output device 1 of the present invention, the sound source and/or language of the audio information may be adjusted according to the genre of the song being played. In this case, adjusting for high compatibility with the genre of the song, for example, reduces the likelihood that the audio information interferes with music listening. Adjusting for low compatibility keeps the audio information from being lost in the song (music), so that it can be conveyed clearly to listeners. Here, "song genre" refers to classifications such as Western or Japanese music, classical or jazz, and film or commercial music. "Sound source" refers to a device that produces sound, such as the instrument being played. As a concrete example of high compatibility, the guidance voice may be in English when the song is Western music and in Japanese when the song is Japanese music. For the alert sound, the timbre may be that of a koto when the song is enka and that of an electric guitar when the song is rock.
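The high-compatibility pairings named in this application example can be written as two small lookup tables. The key labels, the table names, and the function `adjust_for_genre` are hypothetical; the pairings themselves are the ones given in the text, and genres with no stated pairing simply yield `None` here.

```python
# Pairings taken from the examples in the text; the key labels are assumed.
GUIDANCE_LANGUAGE_BY_ORIGIN = {"western": "English", "japanese": "Japanese"}
ALERT_TIMBRE_BY_STYLE = {"enka": "koto", "rock": "electric guitar"}


def adjust_for_genre(origin, style):
    """Return (guidance language, alert sound timbre) for a song's genre.

    origin: "western" or "japanese"; style: e.g. "enka" or "rock".
    None is returned for any part the text gives no pairing for.
    """
    language = GUIDANCE_LANGUAGE_BY_ORIGIN.get(origin)
    timbre = ALERT_TIMBRE_BY_STYLE.get(style)
    return language, timbre


print(adjust_for_genre("japanese", "enka"))  # ('Japanese', 'koto')
print(adjust_for_genre("western", "rock"))   # ('English', 'electric guitar')
```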
Although the above embodiment illustrates a vehicle-mounted audio output device 1, the present invention is also applicable where a broadcasting station that plays songs (music) continuously, such as a cable broadcaster, inserts time signals, traffic information, and the like. In this case, audio information such as the time signal or traffic information can be adjusted according to the mood, chord, and so on of the song at the time insertion starts. More generally, the present invention is applicable to any device, regardless of type, that provides voice guidance while a song is being played.
The audio output device 1 of the present invention may also be applied to video. For example, One-Seg broadcasting has attracted attention in recent years; such video may be subjected to image analysis, and the sound and/or voice elements of the audio information adjusted for higher or lower compatibility according to the analysis result. In this case, elements of the image (video) include brightness, the proportion occupied by each color, resolution, contrast, and genre (animation, live action, etc.).
Each unit of the audio output device described in the above embodiment and application examples may also be provided as a program. The program may also be provided stored on a recording medium (not shown). That is, a program for causing a computer to function as each unit of the audio output device, and a recording medium on which that program is recorded, are also within the scope of rights of the present invention. Other modifications may be made as appropriate without departing from the gist of the present invention.

Claims (11)

  1.  An audio output device comprising:
      audio information insertion means for inserting audio information, which is a guidance voice and/or a sound effect, during playback of a song;
      audio information adjustment means for adjusting a sound and/or voice element of the audio information according to a sound and/or voice element of the song being played at the time the audio information is inserted; and
      audio output means for outputting audio based on the audio information as adjusted by the audio information adjustment means.
  2.  The audio output device according to claim 1, wherein the audio information adjustment means adjusts the sound and/or voice element of the audio information so that its compatibility with the sound and/or voice element of the song becomes higher or becomes lower.
  3.  The audio output device according to claim 2, wherein an importance is set for the audio information according to its content, and
      the audio information adjustment means adjusts the sound and/or voice element of audio information of high importance so that its compatibility with the sound and/or voice element of the song becomes lower, and adjusts the sound and/or voice element of audio information of low importance so that its compatibility with the sound and/or voice element of the song becomes higher.
  4.  The audio output device according to claim 1, further comprising metadata storage means for storing song metadata, which is information on the sound and/or voice elements of the song, and audio information metadata, which is information on the sound and/or voice elements of the audio information,
      wherein the audio information adjustment means adjusts the sound and/or voice element of the audio information by referring to the song metadata and the audio information metadata.
  5.  The audio output device according to claim 1, further comprising audio information storage means for storing multiple types of the audio information that differ in sound and/or voice elements,
      wherein the audio information adjustment means selects, from among the multiple types of the audio information stored in the audio information storage means, one piece of audio information to be output, according to the sound and/or voice element of the song at the time the audio information is inserted.
  6.  The audio output device according to claim 1, wherein the audio information adjustment means generates the sound and/or voice of the audio information at the time the audio information is inserted, using the sound and/or voice of the song at that time.
  7.  The audio output device according to claim 1, wherein the audio information adjustment means adjusts the sound and/or voice element of the audio information according to the sound and/or voice element of the song at the time insertion of the audio information starts.
  8.  The audio output device according to claim 1, wherein the sound element includes one or more of mood, chord, and rhythm, and the voice element includes one or more of pitch, voice volume, voice quality, and pronunciation.
  9.  The audio output device according to claim 1, further comprising song playback means for playing the song,
      wherein the audio output means outputs the song played by the song playback means together with the sound and/or voice based on the audio information.
  10.  An audio output device comprising:
      audio information insertion means for inserting audio information, which is a guidance voice and/or a sound effect, during playback of a song;
      audio information adjustment means for adjusting the sound source and/or language of the audio information according to the genre of the song being played when the audio information is inserted; and
      audio output means for outputting audio based on the audio information as adjusted by the audio information adjustment means.
  11.  A program for causing a computer to function as each means of the audio output device according to any one of claims 1 to 10.
PCT/JP2008/001216 2008-05-15 2008-05-15 Audio output device and program WO2009139022A1 (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
JP2010511789A JPWO2009139022A1 (en) 2008-05-15 2008-05-15 Audio output device and program
PCT/JP2008/001216 WO2009139022A1 (en) 2008-05-15 2008-05-15 Audio output device and program

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/JP2008/001216 WO2009139022A1 (en) 2008-05-15 2008-05-15 Audio output device and program

Publications (1)

Publication Number Publication Date
WO2009139022A1 true WO2009139022A1 (en) 2009-11-19

Family

ID=41318406

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/JP2008/001216 WO2009139022A1 (en) 2008-05-15 2008-05-15 Audio output device and program

Country Status (2)

Country Link
JP (1) JPWO2009139022A1 (en)
WO (1) WO2009139022A1 (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2002116045A (en) * 2000-10-11 2002-04-19 Clarion Co Ltd Sound volume controller
JP2007086316A (en) * 2005-09-21 2007-04-05 Mitsubishi Electric Corp Speech synthesizer, speech synthesizing method, speech synthesizing program, and computer readable recording medium with speech synthesizing program stored therein
WO2007091475A1 (en) * 2006-02-08 2007-08-16 Nec Corporation Speech synthesizing device, speech synthesizing method, and program
JP2008096483A (en) * 2006-10-06 2008-04-24 Matsushita Electric Ind Co Ltd Sound output control device and sound output control method

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP4700904B2 (en) * 2003-12-08 2011-06-15 パイオニア株式会社 Information processing apparatus and travel information voice guidance method
JP2006258699A (en) * 2005-03-18 2006-09-28 Aisin Aw Co Ltd On-vehicle system
JP2007127599A (en) * 2005-11-07 2007-05-24 Matsushita Electric Ind Co Ltd Navigation system

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2010079091A (en) * 2008-09-26 2010-04-08 Toshiba Corp Sound output device, method and program for outputting sound
JPWO2014049719A1 (en) * 2012-09-26 2016-08-22 三菱電機株式会社 Audio output device
JP2019117324A (en) * 2017-12-27 2019-07-18 トヨタ自動車株式会社 Device, method, and program for outputting voice
JP2019164228A (en) * 2018-03-19 2019-09-26 本田技研工業株式会社 Controller for sound output device, control method, program, and vehicle
JP7068875B2 (en) 2018-03-19 2022-05-17 本田技研工業株式会社 Sound output device control device, control method, program and vehicle

Also Published As

Publication number Publication date
JPWO2009139022A1 (en) 2011-09-08

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application (Ref document number: 08751734; Country of ref document: EP; Kind code of ref document: A1)
DPE1 Request for preliminary examination filed after expiration of 19th month from priority date (pct application filed from 20040101)
WWE Wipo information: entry into national phase (Ref document number: 2010511789; Country of ref document: JP)
NENP Non-entry into the national phase (Ref country code: DE)
122 Ep: pct application non-entry in european phase (Ref document number: 08751734; Country of ref document: EP; Kind code of ref document: A1)