WO2013172179A1 - Voice information presentation device and voice information presentation method - Google Patents


Info

Publication number
WO2013172179A1
Authority
WO
WIPO (PCT)
Prior art keywords
voice
converted
information presentation
generation unit
speech
Application number
PCT/JP2013/062326
Other languages
English (en)
Japanese (ja)
Inventor
充伸 神沼
健太 南
早苗 平井
Original Assignee
日産自動車株式会社 (Nissan Motor Co., Ltd.)
学校法人同志社 (Doshisha)
Priority date
2012-05-18
Filing date
2013-04-26
Publication date
2013-11-21
Application filed by 日産自動車株式会社 (Nissan Motor Co., Ltd.) and 学校法人同志社 (Doshisha)

Classifications

    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00: Processing of the speech or voice signal to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/003: Changing voice quality, e.g. pitch or formants

Definitions

  • The present invention relates to a voice information presentation device and a voice information presentation method for presenting voice information that can be understood without attracting too much of the driver's attention, even when the device is mounted on a vehicle.
  • Patent Document 1 discloses an example of such a voice guidance system, in which voice guidance is provided according to the user's past information-provision history and preferences.
  • The present invention has been proposed in view of the above circumstances, and its object is to provide a voice information presentation device and a voice information presentation method that can present voice information without attracting too much of the driver's attention.
  • the voice information presentation device generates a reference voice that represents the language information to be presented as a voice, and converts the generated reference voice to generate a converted voice having a lower clarity than the reference voice. Then, the generated converted voice is output.
  • FIG. 1 is a block diagram showing the configuration of the audio information presentation apparatus according to the first embodiment to which the present invention is applied.
  • FIG. 2 is a flowchart showing a processing procedure of voice information presentation processing by the voice information presentation device according to the first embodiment to which the present invention is applied.
  • FIG. 3 is a flowchart showing a processing procedure of pitch processing by the audio information presentation device according to the first embodiment to which the present invention is applied.
  • FIG. 4 is a diagram for explaining pitch frequency conversion by the audio information presentation device according to the first embodiment to which the present invention is applied.
  • FIG. 5 is a diagram for explaining pitch frequency conversion by the audio information presentation device according to the first embodiment to which the present invention is applied.
  • FIG. 6 is a diagram for explaining pitch frequency conversion by the audio information presentation device according to the first embodiment to which the present invention is applied.
  • FIG. 7 is a diagram for explaining pitch processing by the audio information presentation device according to the first embodiment to which the present invention is applied.
  • FIG. 8 is a diagram for explaining the result of pitch processing by the audio information presentation apparatus according to the first embodiment to which the present invention is applied.
  • FIG. 9 is a diagram for explaining the result of pitch processing by the audio information presentation device according to the first embodiment to which the present invention is applied.
  • FIG. 10 is a diagram for explaining the envelope processing by the audio information presentation device according to the first embodiment to which the present invention is applied.
  • FIG. 11 is a diagram for explaining the result of the envelope processing by the audio information presentation device according to the first embodiment to which the present invention is applied.
  • FIG. 12 is a diagram for explaining amplitude processing by the audio information presentation apparatus according to the first embodiment to which the present invention is applied.
  • FIG. 13 is a diagram for explaining the result of amplitude processing by the audio information presentation device according to the first embodiment to which the present invention is applied.
  • FIG. 14 is a diagram for explaining the amplitude processing by the audio information presentation device according to the first embodiment to which the present invention is applied.
  • FIG. 15 is a diagram for explaining the result of amplitude processing by the audio information presentation device according to the first embodiment to which the present invention is applied.
  • FIG. 16 is a diagram for explaining audio information presentation processing by the audio information presentation apparatus according to the first embodiment to which the present invention is applied.
  • FIG. 17 is a diagram for explaining voice information presentation processing by the voice information presentation apparatus according to the first embodiment to which the present invention is applied.
  • FIG. 18 is a diagram for explaining audio information presentation processing by the audio information presentation apparatus according to the first embodiment to which the present invention is applied.
  • FIG. 19 is a diagram for explaining the result of the voice information presentation processing by the voice information presentation device according to the first embodiment to which the present invention is applied.
  • FIG. 20 is a diagram for explaining the effect of the voice information presentation processing by the voice information presentation device according to the first embodiment to which the present invention is applied.
  • FIG. 21 is a diagram for explaining the effect of the voice information presentation processing by the voice information presentation device according to the first embodiment to which the present invention is applied.
  • FIG. 22 is a diagram for explaining audio information presentation processing by the audio information presentation apparatus according to the second embodiment to which the present invention is applied.
  • FIG. 1 is a block diagram showing the configuration of the audio information presentation apparatus according to this embodiment.
  • The voice information presentation device 1 includes: a reference voice generation unit 2 that generates a reference voice expressing the language information to be presented as voice; a converted voice generation unit 3 that converts the reference voice to generate a converted voice with lower clarity than the reference voice; a voice output unit 4 that outputs the reference voice or the converted voice; and a storage unit 5 that stores the reference voice and the information needed for the voice presentation processing.
  • The voice information presentation device 1 is mounted on a vehicle, for example in a navigation device, and converts the voice guidance provided during route guidance into converted voice before outputting it.
  • the converted voice can convey the language information that the original voice guidance intends to present, so the driver can easily understand the meaning of the language information.
  • The voice information presentation device 1 is implemented on a general-purpose electronic circuit including a microcomputer, a microprocessor, and a CPU, which executes a specific program and thereby operates as the reference voice generation unit 2, the converted voice generation unit 3, and the voice output unit 4. It can also be realized as hardware including a dedicated electronic circuit.
  • the reference voice generation unit 2 generates, for example, voice guidance provided at the time of route guidance of the navigation device as reference voice.
  • The reference voice may be synthesized as needed, or it may be stored in advance in the storage unit 5 and retrieved from there.
  • the converted voice generation unit 3 generates a converted voice having a lower clarity than the reference voice by executing a conversion process on the reference voice.
  • Intelligibility is one measure of the quality of an audio signal such as telephone speech, and various indices have been proposed to evaluate it, for example the articulation index (AI) and the speech transmission index (STI).
  • the conversion processing for reducing the intelligibility by the converted speech generation unit 3 includes pitch processing, envelope processing, and amplitude processing.
  • The pitch process converts the pitch frequency, i.e. the frequency of the vocal-cord vibration in the reference voice, to a specific frequency specified in advance, or changes the pitch frequency according to a specific function.
  • the envelope process is a process for alleviating a sharp change in amplitude in the spectrum envelope of the reference speech.
  • the amplitude process is a process for modulating the amplitude in the time waveform of the reference sound.
  • the audio output unit 4 includes a DA converter 21 that converts reference audio or converted audio into an analog signal, an amplifier 22 that amplifies the audio converted into the analog signal, and a speaker 23 that outputs the amplified audio.
  • the reference voice or converted voice is output to the outside.
  • the storage unit 5 stores reference voice corresponding to preset voice guidance and the like, and also stores information necessary for performing voice information presentation processing. Also, the converted voice may be stored in advance.
  • In step S101, the reference voice generation unit 2 generates a reference voice.
  • the reference voice is voice information similar to normal voice guidance output from the navigation device, and expresses language information to be presented as voice.
  • the reference sound generation unit 2 may generate the reference sound by acquiring the reference sound from the storage unit 5, or may generate the reference sound by synthesizing it.
  • In step S102, primary design processing is performed.
  • the primary design process is a process that lowers the intelligibility of speech using various sound effectors according to the designer's sense and subjectivity.
  • the converted voice generation unit 3 performs pitch processing on the generated reference voice to convert the pitch frequency in order to reduce the clarity.
  • the specific procedure of the pitch processing can be realized by three processes of separation of pitch and envelope, pitch conversion, and recombination of the converted pitch and envelope.
  • the processing can be performed by using general pitch correction software such as “Auto-Tune”.
  • The converted pitch frequency 33 is obtained by setting a target 31 for the frequency to be converted to, and converting the original pitch frequency 32, indicated by “+”, to the target 31.
  • a plurality of target frequencies may be set, and the first target 34 and the second target 35 may be set as shown in FIG.
  • The converted pitch frequency 37 is obtained by converting the original pitch frequency 36 to whichever of the two targets 34 and 35 is closer to it.
  • The target may also be a specific function, such as the sine function 38 shown in FIG. 6, so that the pitch frequency varies according to that function.
  • Pitch processing is performed on the reference voice 45 shown in FIG. 8 to convert the pitch frequency to a single frequency, and the converted voice 44 is output.
  • Here the pitch frequency is converted to C4 (about 262 Hz). Because the converted voice 44 has a higher frequency than the reference voice 45, its waveform is compressed in the time direction.
  • Examples of processing using pitch adjustment, a spectral gate, an equalizer, a compressor, and the like are shown in FIGS.
  • With this processing, the check items of steps S106 to S109 described later may not be satisfied. It is therefore necessary to confirm, after the correction processing of steps S104 and S105 described later, whether the converted voice shown in the waveforms of FIGS. 16 and 18 passes these checks.
  • The amplitude may also be converted by superimposing the waveform shown in FIG. 14B on the reference voice, as shown in FIG. This convolution operation yields the waveform shown in FIG. 15.
  • the clarity can be reduced by performing the amplitude processing.
  • the language information intended to be presented with the reference voice is expressed in the same manner even after the amplitude processing, so that the meaning can be easily understood.
  • In step S103, it is determined whether the voice converted by the primary design process has lower intelligibility than the reference voice. If the intelligibility has not decreased, the process returns to step S102 and the primary design process is performed again. If it has decreased, the process proceeds to step S104, and processing to reduce annoyance is performed in steps S104 and S105. Reducing the annoyance of the voice can also lower its intelligibility.
  • In step S104, the converted voice generation unit 3 performs envelope processing to reduce annoyance.
  • Annoyance is removed from the voice by suppressing the narrow-band peak 51 shown in FIG.
  • For example, a low-pass filter such as the one shown in FIG. 10B may be applied to the spectrum envelope 52 in FIG.
  • This suppresses the narrow-band peak 51 and converts the spectrum into a gentle spectral envelope 52, as shown in FIG.
  • Because the envelope processing suppresses the narrow-band peak in this way, annoyance can be removed from the voice, which prevents the voice from drawing too much of the driver's attention.
  • Because the language information intended to be presented by the reference voice is still expressed in the same way after the envelope processing, its meaning remains easy to understand.
  • In step S105, the converted voice generation unit 3 performs amplitude processing to remove annoyance.
  • The amplitude is converted by correcting amplitude distortion and by modulating the amplitude. For example, as shown in FIG. 12, when the reference waveform contains a rectangular-wave portion as amplitude distortion, a low-pass filter is applied to convert it into a smooth waveform, as shown in FIG. This processing is applied not only to rectangular waves but also to other non-smooth waveforms such as triangular waves.
  • Steep rises and falls may also be smoothed.
  • The rising edge at A1 is steep and is therefore corrected, whereas the rising edge at A2 and the falling edge at A3 are gentle and are not corrected.
  • A rise or fall that takes less than 0.01 seconds is corrected.
  • As shown in FIG. 9, the rise of the converted voice is relaxed so that it takes 0.01 seconds or longer.
  • Through steps S104 and S105, a voice with the annoyance removed can be generated from the converted voice.
  • Although FIG. 2 shows the case where both steps S104 and S105 are performed, only one of them may be performed; even then, annoyance can be sufficiently removed from the converted voice. Once the converted voice has been generated, the converted voice generation unit 3 checks whether annoyance has been removed by executing the following processing.
  • In step S106, the converted voice generation unit 3 determines whether a narrow-band peak exists. For example, it determines whether the frequency spectrum of the previously generated converted voice, shown in FIG. 16, contains a narrow-band peak (about 100 to 300 Hz) in the high-frequency region. If a narrow-band peak such as that shown in FIG. exists, the converted voice is annoying, so the process returns to step S104 to perform the annoyance-removal processing again. If no such peak exists, the process proceeds to step S107.
  • In step S107, the converted voice generation unit 3 determines whether there is an energy peak in the mid-to-high frequency band above about 0.5 to 0.8 kHz. If such a peak exists, the process returns to step S104 to perform the annoyance-removal processing; if not, the process proceeds to step S108. For example, when there is an energy peak in the region of 6 kHz or higher, as shown in FIG. 17C, the converted voice is annoying, so the process returns to step S104 and that energy is reduced with a low-pass filter.
  • In step S108, the converted voice generation unit 3 determines whether the time waveform contains steep rising or falling edges, for example a rise or fall that takes less than 0.01 seconds, such as A1 in FIG. If there is a steep rise or fall, the converted voice is annoying, so the process returns to step S104 and the annoyance-removal processing is performed again. If there is none, the process proceeds to step S109.
  • In step S109, the converted voice generation unit 3 determines whether the time waveform contains nonlinear distortion. For example, if nonlinear distortion exists in the converted voice, as in FIG. 18, which shows part of the time waveform of the previously generated converted voice, the converted voice is annoying, so the process returns to step S104 to perform the annoyance-removal processing again. If no nonlinear distortion exists, the process proceeds to step S110.
  • FIG. 19A is a time waveform of a reference sound before conversion
  • FIG. 19B is a frequency spectrum of the reference sound
  • FIG. 19C is a time waveform of converted sound after conversion
  • FIG. 19D is the frequency spectrum of the converted sound.
  • The time waveform of the converted voice has fewer steep changes and therefore does not attract the driver's attention.
  • From FIG. 19(d), it can be seen that the energy in the high-frequency region …
  • Next, in step S110, the voice output unit 4 outputs the converted voice, and the voice information presentation processing by the voice information presentation device 1 according to the present embodiment ends.
  • The proportion of responses that took more than 1 second from the lighting of the LED was 16.42% for the notification sound, lower than for the converted voice and the voice guidance, both of which exceeded 20%. This shows that the notification sound did not draw the driver's attention.
  • Comparing the converted voice with the voice guidance, the proportion was 23.18% for the converted voice and 25.00% for the voice guidance, so the converted voice is lower. In other words, the converted voice has the functionality of a symbolic sound such as a notification sound and is less likely than the voice guidance to attract the driver's attention, which reduces the probability that a response takes 1 second or more. The voice converted by this method therefore tends not to draw attention.
  • the converted voice according to the present invention is a voice that draws less attention than normal voice guidance.
  • Next, the results of evaluation experiment 2 for the voice information presentation device 1 according to the present embodiment are described with reference to FIG.
  • Before the experiment, the meanings of the normal voice guidance, the converted voice according to the present invention, and the sine sound used as the notification sound were explained to the subjects.
  • For the voice guidance, the messages “right caution” and “left caution” were used.
  • The converted voice is the voice guidance with reduced clarity.
  • For the notification sound, a sound consisting of five discrete melody tones indicates right, and a continuously changing sound centered at … indicates left.
  • Reaction times were measured while the subjects drove a driving simulator and the voice guidance, the converted voice, and the notification sound were presented; the results are shown in FIG.
  • The average reaction time was shortest for the voice guidance: subjects responded 1.22 seconds after the stimulus began.
  • The reaction time for the converted voice was about 1.38 seconds, a delay of only 0.16 seconds relative to the voice guidance.
  • The reaction time for the notification sound was 1.81 seconds, 0.59 seconds slower than the voice guidance and 0.43 seconds slower than the converted voice.
  • Although the converted voice is a symbolic sound, its reaction time is comparable to that of the voice guidance, which suggests that it is advantageous for understanding information.
  • the converted voice according to the present invention can convey the meaning of the voice information to the same extent as the voice guidance.
  • With the voice information presentation device 1, a reference voice expressing the language information to be presented as voice is generated, and a converted voice with lower clarity than the reference voice is generated and output. Therefore, even when it is provided as the voice guidance of a vehicle, the driver's attention is not drawn too much, and the information to be conveyed can be easily understood.
  • Because the clarity of the voice is reduced by converting the frequency related to the vocal-cord vibration of the reference voice to a specific frequency specified in advance, it is possible to generate a converted voice that does not draw too much of the driver's attention.
  • Because the intelligibility is lowered by changing the frequency related to the vocal-cord vibration of the reference voice according to a specific function, the information to be conveyed can be easily understood while the driver's attention is not drawn too much.
  • In the voice information presentation device 1, annoyance is removed from the voice by suppressing sharp changes in amplitude in the spectrum envelope of the reference voice, so a converted voice that does not excessively attract the driver's attention can be generated.
  • Because clarity is lowered by modulating the amplitude in the time waveform of the reference voice, amplitude distortion can be eliminated and clarity can be reliably reduced.
  • In the second embodiment, the converted voice generation processing by the converted voice generation unit 3 differs from that of the first embodiment.
  • The converted voice generation unit 3 of the present embodiment generates a signal by shifting the reference voice in the time direction and adds this signal to the reference voice to reduce clarity.
  • A signal 72 is generated by shifting the reference sound 71 in the time direction, and the converted sound is generated by adding the two signals. This applies an echo effect to the converted voice, which lowers its intelligibility.
  • Although FIG. 22 shows the case where the reference sound is delayed in the time direction, a signal advanced in the time direction may instead be generated and added to the reference sound.
  • A signal with less energy than the reference sound, or a signal slightly more distorted than the reference sound, may also be generated and added to the reference sound.
  • Because clarity is lowered by adding a signal obtained by shifting the reference voice in the time direction to the reference voice, clarity can be reliably lowered.
  • A reference voice expressing the language information to be presented as voice is generated, and a converted voice with lower clarity than the reference voice is generated and output. As a result, even when it is provided as the voice guidance of a vehicle, the driver's attention is not drawn too much and the information to be conveyed can be easily understood. The voice information presentation device and voice information presentation method according to one aspect of the present invention are therefore industrially applicable.
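The overall flow of FIG. 2 (steps S101 to S110), with its loop back to step S102 when clarity has not dropped and its loop back to step S104 when any annoyance check fails, can be summarized as a control-flow sketch. This is a hypothetical illustration, not the patented implementation: every processing stage is passed in as a callable, and the function name, the `max_rounds` limit, and the convention that each check returns True when it finds an annoying feature are assumptions.

```python
from typing import Callable, List, Sequence

Signal = List[float]  # hypothetical representation: a list of samples


def present_voice(
    reference: Signal,
    primary_design: Callable[[Signal, int], Signal],
    clarity: Callable[[Signal], float],
    remove_annoyance: Callable[[Signal], Signal],
    annoyance_checks: Sequence[Callable[[Signal], bool]],
    output: Callable[[Signal], None],
    max_rounds: int = 10,
) -> Signal:
    """Hypothetical control flow mirroring steps S101 to S110 of FIG. 2.

    `reference` is the voice from step S101. Each entry of `annoyance_checks`
    (narrow-band peak, high-band energy peak, steep edge, nonlinear
    distortion) returns True when it finds an annoying feature.
    """
    reference_clarity = clarity(reference)

    # S102/S103: redo the primary design until clarity is below the reference.
    for attempt in range(max_rounds):
        converted = primary_design(reference, attempt)
        if clarity(converted) < reference_clarity:
            break
    else:
        raise RuntimeError("primary design never lowered clarity")

    # S104/S105, then checks S106-S109; any failing check returns to S104.
    for _ in range(max_rounds):
        converted = remove_annoyance(converted)
        if not any(check(converted) for check in annoyance_checks):
            output(converted)  # S110
            return converted
    raise RuntimeError("annoyance checks still failing after max_rounds")
```

The `max_rounds` guard is only there so the sketch terminates; in the described procedure a designer would keep adjusting the processing until the checks pass.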
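The pitch conversion described for step S102 (converting each pitch value to the nearest of one or more target frequencies, or letting the target follow a sine function) can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patented implementation: it assumes the pitch contour has already been separated from the spectral envelope and is given as one value in Hz per frame, with 0 marking unvoiced frames; "nearest" is taken as the smallest absolute difference in Hz; and the function names and sine parameters (`depth_semitones`, `rate_hz`) are hypothetical.

```python
import math
from typing import List, Sequence


def note_to_hz(midi_note: int) -> float:
    """Equal-tempered frequency of a MIDI note number (A4 = note 69 = 440 Hz)."""
    return 440.0 * 2.0 ** ((midi_note - 69) / 12.0)


C4_HZ = note_to_hz(60)  # about 261.6 Hz, the single target of the C4 example


def snap_to_nearest_target(pitch_hz: Sequence[float],
                           targets_hz: Sequence[float]) -> List[float]:
    """Replace each voiced pitch value with the closest target frequency.

    Unvoiced frames (pitch <= 0) pass through unchanged. "Closest" means the
    smallest absolute difference in Hz, which is an assumption of this sketch.
    """
    snapped = []
    for f in pitch_hz:
        if f <= 0:
            snapped.append(f)
        else:
            snapped.append(min(targets_hz, key=lambda t: abs(f - t)))
    return snapped


def sine_target(frame_times_s: Sequence[float], center_hz: float,
                depth_semitones: float, rate_hz: float) -> List[float]:
    """Target pitch that follows a sine function of time.

    The pitch swings +/- depth_semitones around center_hz at rate_hz
    (hypothetical parameters; the source only names "a sine function").
    """
    return [center_hz * 2.0 ** (depth_semitones
                                * math.sin(2.0 * math.pi * rate_hz * t) / 12.0)
            for t in frame_times_s]
```

Passing a single target such as `[C4_HZ]` corresponds to the one-frequency case, and two targets correspond to the targets 34 and 35. Recombining the converted pitch with the envelope (resynthesis) is outside this sketch.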
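The envelope processing of step S104 applies a low-pass filter along the frequency axis of the spectral envelope so that narrow-band peaks are flattened. The sketch below assumes the envelope is given as a list of dB values, one per frequency bin, and uses a simple moving average as the low-pass filter; the function name and the window length `width_bins` are hypothetical and not taken from the source.

```python
from typing import List, Sequence


def smooth_envelope(envelope_db: Sequence[float], width_bins: int = 5) -> List[float]:
    """Moving-average (low-pass) filter applied along the frequency axis.

    envelope_db: spectral envelope magnitudes in dB, one value per bin.
    width_bins: odd window length; a larger window flattens wider peaks.
    At the edges only the bins that exist are averaged (shrinking window).
    """
    if width_bins < 1 or width_bins % 2 == 0:
        raise ValueError("width_bins must be a positive odd integer")
    half = width_bins // 2
    n = len(envelope_db)
    smoothed = []
    for i in range(n):
        lo = max(0, i - half)
        hi = min(n, i + half + 1)
        window = envelope_db[lo:hi]
        smoothed.append(sum(window) / len(window))
    return smoothed
```

A one-bin spike of 30 dB on a flat envelope becomes a 6 dB bump spread over five bins with `width_bins=5`, which is the kind of gentle envelope the text describes.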
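The 0.01-second rule for rises and falls (the correction in step S105 and the check in step S108) can be illustrated with a slew-rate limiter. This sketch assumes the rule is applied to an amplitude envelope sampled at `fs_hz`, and it interprets "a rise or fall of less than 0.01 seconds" as any sample-to-sample change faster than a full 0-to-`full_scale` transition lasting 0.01 seconds; both the interpretation and the function names are assumptions.

```python
from typing import List, Optional, Sequence


def _step_limit(envelope: Sequence[float], fs_hz: float, min_edge_s: float,
                full_scale: Optional[float]) -> float:
    """Largest allowed change per sample for a full transition of min_edge_s."""
    if full_scale is None:
        full_scale = max((abs(v) for v in envelope), default=0.0)
    return full_scale / (min_edge_s * fs_hz)


def has_steep_edge(envelope: Sequence[float], fs_hz: float, min_edge_s: float = 0.01,
                   full_scale: Optional[float] = None) -> bool:
    """Step S108-style check: True if any rise or fall is faster than allowed."""
    limit = _step_limit(envelope, fs_hz, min_edge_s, full_scale)
    if limit == 0.0:
        return False
    tol = limit * 1e-9  # absorbs floating-point rounding in already-limited data
    return any(abs(b - a) > limit + tol for a, b in zip(envelope, envelope[1:]))


def relax_edges(envelope: Sequence[float], fs_hz: float, min_edge_s: float = 0.01,
                full_scale: Optional[float] = None) -> List[float]:
    """Slew-rate limiter: clip each step so a full transition takes >= min_edge_s."""
    if not envelope:
        return []
    limit = _step_limit(envelope, fs_hz, min_edge_s, full_scale)
    relaxed = [float(envelope[0])]
    for target in envelope[1:]:
        prev = relaxed[-1]
        step = max(-limit, min(limit, target - prev))
        relaxed.append(prev + step)
    return relaxed
```

A slew-rate limiter lengthens only the steep edges (like A1) and leaves gentle ones (like A2 and A3) untouched. How the relaxed envelope is re-applied to the time waveform is outside this sketch.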
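The spectral checks of steps S106 and S107 can be approximated as below. The sketch assumes a magnitude spectrum given as parallel lists of bin frequencies and magnitudes. It reads the "about 100 to 300 Hz" of step S106 as the width of the peak, measured between the outermost bins at or above half power (-3 dB), and treats step S107 as "the strongest bin lies at or above a cutoff". These readings, the default thresholds, and the function names are illustrative assumptions.

```python
import math
from typing import List, Sequence


def dominant_frequency(freqs_hz: Sequence[float], mags: Sequence[float]) -> float:
    """Frequency of the strongest spectral bin."""
    i = max(range(len(mags)), key=lambda k: mags[k])
    return freqs_hz[i]


def has_high_band_peak(freqs_hz: Sequence[float], mags: Sequence[float],
                       cutoff_hz: float) -> bool:
    """Step S107-style check: True if the strongest bin is at or above cutoff_hz."""
    return dominant_frequency(freqs_hz, mags) >= cutoff_hz


def narrow_band_peaks(freqs_hz: Sequence[float], mags: Sequence[float],
                      max_width_hz: float = 300.0,
                      min_freq_hz: float = 0.0) -> List[float]:
    """Step S106-style check: local maxima whose half-power width is at most
    max_width_hz. Width resolution is limited by the bin spacing."""
    peaks = []
    n = len(mags)
    for i in range(1, n - 1):
        if not (mags[i] > mags[i - 1] and mags[i] >= mags[i + 1]):
            continue
        if freqs_hz[i] < min_freq_hz:
            continue
        half_power = mags[i] / math.sqrt(2.0)  # -3 dB on a magnitude scale
        lo = i
        while lo > 0 and mags[lo - 1] >= half_power:
            lo -= 1
        hi = i
        while hi < n - 1 and mags[hi + 1] >= half_power:
            hi += 1
        if freqs_hz[hi] - freqs_hz[lo] <= max_width_hz:
            peaks.append(freqs_hz[i])
    return peaks
```

A non-empty result from `narrow_band_peaks`, or True from `has_high_band_peak`, would correspond to returning to step S104 in the flow described above.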
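The conversion of the second embodiment, adding a time-shifted copy of the reference voice to the reference voice itself, is a feed-forward echo, y[n] = x[n] + g·x[n − d]. The sketch assumes the voice is a list of samples, uses a hypothetical `gain` below 1 (a copy with less energy than the reference, as the text allows), and keeps the output the same length as the input, so any part of the shifted copy that falls outside the original span is discarded. The function name is hypothetical.

```python
from typing import List, Sequence


def add_time_shifted_copy(x: Sequence[float], shift_samples: int,
                          gain: float = 0.5) -> List[float]:
    """Return y[n] = x[n] + gain * x[n - shift_samples].

    shift_samples > 0 delays the copy (an echo, as in FIG. 22);
    shift_samples < 0 advances it. The output has len(x) samples.
    """
    n = len(x)
    y = [float(v) for v in x]
    for i in range(n):
        j = i - shift_samples
        if 0 <= j < n:
            y[i] += gain * x[j]
    return y
```

A slightly distorted copy, which the text also allows, could be produced by passing a processed version of `x` as the copy; that variant is not shown here.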

Abstract

The present invention relates to a voice information presentation device (1) comprising: a reference voice generation unit (2) for generating a reference voice expressing the linguistic information to be presented as voice; a converted voice generation unit (3) for generating a converted voice by converting the reference voice and reducing its clarity compared with the reference voice; and a voice output unit (4) for outputting the converted voice.
PCT/JP2013/062326 2012-05-18 2013-04-26 Voice information presentation device and voice information presentation method WO2013172179A1 (fr)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
JP2012114280 2012-05-18
JP2012-114280 2012-05-18

Publications (1)

Publication Number Publication Date
WO2013172179A1 (fr) 2013-11-21

Family

ID=49583588

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/JP2013/062326 WO2013172179A1 (fr) 2012-05-18 2013-04-26 Voice information presentation device and voice information presentation method

Country Status (1)

Country Link
WO (1) WO2013172179A1 (fr)

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2002116045A (ja) * 2000-10-11 2002-04-19 Clarion Co Ltd Volume control device
WO2006008871A1 (fr) * 2004-07-21 2006-01-26 Matsushita Electric Industrial Co., Ltd. Voice synthesizer

Similar Documents

Publication Publication Date Title
US9580010B2 (en) Vehicle approach notification apparatus
US20190389376A1 (en) Apparatus for providing environmental noise compensation for a synthesized vehicle sound
US20130038435A1 (en) Vehicle running warning device
EP1865494B1 (fr) Engine noise processing device
JP4173891B2 (ja) Sound effect generation device for moving body
CN103253185B (zh) Active sound effect generation device for vehicle
US9073477B2 (en) Vehicle approach notification device
EP3757986B1 (fr) Adaptive noise masking method and system
WO2014112110A1 (fr) Speech synthesizer, electronic watermark information detection device, speech synthesis method, electronic watermark information detection method, speech synthesis program, and electronic watermark information detection program
US9050925B2 (en) Vehicle having an electric drive
JP2016134662A (ja) Alarm device
JP4983694B2 (ja) Audio reproduction device
JP5454432B2 (ja) Vehicle approach notification device
WO2013172179A1 (fr) Voice information presentation device and voice information presentation method
JP7454119B2 (ja) Vehicle sound generation device
JP5985306B2 (ja) Noise reduction device and noise reduction method
JP5704022B2 (ja) Vehicle approach notification device
JP5533795B2 (ja) Vehicle approach notification device
US11257477B2 (en) Motor noise masking
JP2007256838A (ja) Vehicle sound effect generation device
JP4784343B2 (ja) Vehicle noise sound quality control device
JP5699920B2 (ja) Vehicle acoustic device
CN112037805A (zh) Sound mixing device and sound mixing method
JP2008227681A (ja) Acoustic characteristic correction system
JP2010156725A (ja) Noise removal method and circuit

Legal Events

Date Code Title Description
121 EP: The EPO has been informed by WIPO that EP was designated in this application

Ref document number: 13791579

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 EP: PCT application non-entry in European phase

Ref document number: 13791579

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: JP