WO2023073949A1 - Voice output device, server device, voice output method, control method, program, and storage medium - Google Patents


Info

Publication number
WO2023073949A1
Authority
WO
WIPO (PCT)
Prior art keywords
character string
voice
external device
control
control unit
Prior art date
Application number
PCT/JP2021/040103
Other languages
French (fr)
Japanese (ja)
Inventor
匡弘 岩田
Original Assignee
Pioneer Corporation (パイオニア株式会社)
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Pioneer Corporation
Priority to PCT/JP2021/040103 priority Critical patent/WO2023073949A1/en
Priority to JP2023556054A priority patent/JPWO2023073949A1/ja
Publication of WO2023073949A1 publication Critical patent/WO2023073949A1/en

Links

Images

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L13/00 Speech synthesis; Text to speech systems
    • G10L13/02 Methods for producing synthetic speech; Speech synthesisers
    • G10L13/04 Details of speech synthesis systems, e.g. synthesiser structure or memory management
    • G10L13/047 Architecture of speech synthesisers
    • G10L13/08 Text analysis or generation of parameters for speech synthesis out of text, e.g. grapheme to phoneme translation, prosody generation or stress or intonation determination

Definitions

  • the present invention relates to technology that can be used in voice guidance that accompanies communication.
  • a configuration disclosed in Patent Literature 1 is conventionally known as a configuration for guiding a route to a destination of a vehicle by voice.
  • Patent Document 1 discloses a voice guidance system comprising an in-vehicle device that is mounted in a vehicle and has a voice guidance function, and a server device that can communicate with the in-vehicle device via a communication network.
  • Patent Document 1 does not particularly disclose a solution to the above-mentioned problem. Therefore, according to the configuration disclosed in Patent Document 1, the problem described above still remains.
  • the present invention has been made to solve the above problem, and a main object of the present invention is to provide a voice output device capable of suppressing an increase in the amount of communication according to the frequency of utterances in voice guidance involving communication.
  • the claimed invention is a voice output device comprising: a character string generation unit that generates a character string for performing voice guidance; a communication unit that, when the character string is a first character string including a proper noun, transmits the first character string to an external device and receives first voice data corresponding to the first character string from the external device; and a control unit that performs control for outputting a voice corresponding to the first voice data when the character string is the first character string, and performs control for outputting a voice corresponding to second voice data stored in a storage unit as the voice data corresponding to the second character string when the character string is a second character string that does not contain a proper noun.
  • the invention described in the claims is a server device comprising: a receiving unit that receives, from an external device, a character string generated for performing voice guidance; and a control unit that performs control for transmitting first voice data corresponding to the first character string to the external device when the character string is a first character string including a proper noun, and controls the external device to output a voice corresponding to second voice data stored in the external device as the voice data corresponding to the second character string when the character string is a second character string that does not contain a proper noun.
  • the claimed invention is a voice output method in which a character string for performing voice guidance is generated; when the character string is a first character string including a proper noun, the first character string is transmitted to an external device and first voice data corresponding to the first character string is received from the external device; when the character string is the first character string, control is performed to output a voice corresponding to the first voice data, while when the character string is a second character string that does not include a proper noun, control is performed to output the voice corresponding to second voice data stored in a storage unit as the voice data corresponding to the second character string.
  • the claimed invention is a control method in which a character string generated for performing voice guidance is received from an external device; when the character string is a first character string including a proper noun, control is performed for transmitting first voice data corresponding to the first character string to the external device, while when the character string is a second character string that does not include a proper noun, control is performed to cause the external device to output a voice corresponding to second voice data stored in the external device as the voice data corresponding to the second character string.
  • the claimed invention is a program executed by a voice output device having a computer, the program causing the computer to function as: a character string generation unit that generates a character string for performing voice guidance; a communication unit that transmits the first character string to an external device and receives first voice data corresponding to the first character string from the external device when the character string is a first character string including a proper noun; and a control unit that performs control for outputting a voice corresponding to the first voice data when the character string is the first character string, and performs control for outputting the voice corresponding to second voice data stored in a storage unit as the voice data corresponding to the second character string when the character string is a second character string that does not contain a proper noun.
  • the claimed invention is a program executed by a server device having a computer, the program causing the computer to function as: a receiving unit that receives, from an external device, a character string generated for performing voice guidance; and a control unit that performs control for transmitting first voice data corresponding to the first character string to the external device when the character string is a first character string including a proper noun, and performs control for causing the external device to output a voice corresponding to second voice data stored in the external device as the voice data corresponding to the second character string when the character string is a second character string that does not include a proper noun.
  • FIG. 1 is a diagram showing a configuration example of an audio output system according to an embodiment.
  • FIG. 2 is a block diagram showing a schematic configuration of an audio output device.
  • FIG. 3 is a diagram showing an example of a schematic configuration of a server device.
  • FIG. 4 is a flowchart for explaining processing performed in the audio output system according to the first embodiment.
  • FIG. 5 is a flowchart for explaining processing performed in the audio output system according to the second embodiment.
  • the voice output device includes: a character string generation unit that generates a character string for providing voice guidance; a communication unit that, when the character string is a first character string including a proper noun, transmits the first character string to an external device and receives first voice data corresponding to the first character string from the external device; and a control unit that performs control for outputting a voice corresponding to the first voice data when the character string is the first character string, and performs control for outputting a voice corresponding to second voice data stored in a storage unit as the voice data corresponding to the second character string when the character string is a second character string that does not contain a proper noun.
  • the above voice output device has a character string generation unit, a communication unit, and a control unit.
  • the character string generator generates a character string for voice guidance.
  • the communication unit, when the character string is a first character string including a proper noun, transmits the first character string to the external device and receives first voice data corresponding to the first character string from the external device.
  • the control unit performs control for outputting a voice corresponding to the first voice data when the character string is the first character string; when the character string is a second character string that does not contain a proper noun, control is performed to output the voice corresponding to the second voice data stored in the storage unit as the voice data corresponding to the second character string. This makes it possible to suppress an increase in the amount of communication according to the frequency of speech in voice guidance involving communication.
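The branching described above can be sketched in a few lines. This is a hedged illustration only: the function names (`contains_proper_noun`, `output_guidance`) and the dictionary-based voice store are assumptions made for the example, not details of the publication.

```python
# Illustrative sketch of the device-side branching: scripts with a proper noun
# ("first character string") are sent to the external device for synthesis;
# scripts without one ("second character string") reuse stored voice data.

def contains_proper_noun(script, proper_nouns):
    """Rough stand-in for deciding whether a script is a 'first character string'."""
    return any(name in script for name in proper_nouns)

def output_guidance(script, proper_nouns, stored_voice_db, request_tts, play):
    if contains_proper_noun(script, proper_nouns):
        # First character string: obtain first voice data from the external device.
        voice = request_tts(script)
    else:
        # Second character string: use second voice data already in the storage
        # unit, so no communication is needed.
        voice = stored_voice_db[script]
    play(voice)
```

Only the proper-noun branch ever touches the network, which is the mechanism by which the communication amount is kept from growing with utterance frequency.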
  • the control unit, when the character string is the first character string, stores the first voice data received from the external device in the storage unit as cache data.
  • the control unit, when the cache data corresponding to the first character string is stored in the storage unit, performs control to output the voice corresponding to the cache data without communicating with the external device.
  • the character string is a script including at least one sentence.
  • a server device in another embodiment includes: a receiving unit that receives, from an external device, a character string generated for performing voice guidance; and a control unit that performs control for transmitting first voice data corresponding to the first character string to the external device when the character string is a first character string including a proper noun, and controls the external device to output a voice corresponding to the second voice data stored in the external device as the voice data corresponding to the second character string when the character string is a second character string that does not include a proper noun.
  • the control unit stores the first voice data in the storage unit as cache data when the character string is the first character string.
  • the control unit performs control to cause the cache data to be transmitted to the external device when the cache data corresponding to the first character string is stored in the storage unit.
  • the character string is a script containing at least one sentence.
  • the voice output method generates a character string for performing voice guidance; when the character string is a first character string including a proper noun, the first character string is transmitted to an external device and first voice data corresponding to the first character string is received from the external device; when the character string is the first character string, control is performed to output a voice corresponding to the first voice data, while when the character string is a second character string that does not contain a proper noun, control is performed to output the voice corresponding to the second voice data stored in the storage unit as the voice data corresponding to the second character string.
  • This makes it possible to suppress an increase in the amount of communication according to the frequency of speech in voice guidance involving communication.
  • the control method receives, from an external device, a character string generated for providing voice guidance; when the character string is a first character string including a proper noun, control is performed for transmitting first voice data corresponding to the first character string to the external device, while when the character string is a second character string that does not include a proper noun, control is performed to cause the external device to output the voice corresponding to the second voice data stored in the external device as the voice data corresponding to the second character string. This makes it possible to suppress an increase in the amount of communication according to the frequency of speech in voice guidance involving communication.
  • a program executed by a voice output device provided with a computer causes the computer to function as: a character string generation unit that generates a character string for providing voice guidance; a communication unit that transmits the first character string to an external device and receives first voice data corresponding to the first character string from the external device when the character string is a first character string including a proper noun; and a control unit that performs control for outputting a voice corresponding to the first voice data when the character string is the first character string, and performs control for outputting the voice corresponding to the second voice data stored in the storage unit as the voice data corresponding to the second character string when the character string is a second character string that does not contain a proper noun.
  • a program executed by a server device having a computer causes the computer to function as: a receiving unit that receives, from an external device, a character string generated for providing voice guidance; and a control unit that performs control for transmitting first voice data corresponding to the first character string to the external device when the character string is a first character string containing a proper noun, and performs control for causing the external device to output a voice corresponding to the second voice data stored in the external device as the voice data corresponding to the second character string when the character string is a second character string that does not contain a proper noun.
  • FIG. 1 is a diagram illustrating a configuration example of an audio output system according to an embodiment.
  • a voice output system 1 according to this embodiment includes a voice output device 100 and a server device 200.
  • the audio output device 100 is mounted on the vehicle Ve.
  • the server device 200 communicates with a plurality of audio output devices 100 mounted on a plurality of vehicles Ve.
  • the voice output device 100 basically performs route search processing, route guidance processing, and the like for the user, who is a passenger of the vehicle Ve. For example, when the user inputs a destination or the like, the voice output device 100 transmits an upload signal S1 including position information of the vehicle Ve and information on the designated destination to the server device 200. The server device 200 calculates the route to the destination by referring to the map data, and transmits a control signal S2 indicating the route to the destination to the voice output device 100. The voice output device 100 provides route guidance to the user by voice output based on the received control signal S2.
  • the voice output device 100 provides various types of information to the user through interaction with the user.
  • the audio output device 100 supplies the server device 200 with an upload signal S1 including information indicating the content or type of the information request and information about the running state of the vehicle Ve.
  • the server device 200 acquires and generates information requested by the user, and transmits it to the audio output device 100 as a control signal S2.
  • the audio output device 100 provides the received information to the user by audio output.
  • the voice output device 100 moves together with the vehicle Ve and performs route guidance mainly by voice so that the vehicle Ve travels along the guidance route.
  • route guidance based mainly on voice refers to route guidance in which the user can grasp the information necessary for driving the vehicle Ve along the guidance route from voice alone; it does not exclude the voice output device 100 auxiliarily displaying a map of the area around the current position or the like.
  • the voice output device 100 outputs at least various information related to driving, such as points on the route that require guidance (also referred to as “guidance points”), by voice.
  • the guidance point corresponds to, for example, an intersection at which the vehicle Ve turns right or left, or other passing points important for the vehicle Ve to travel along the guidance route.
  • the voice output device 100 provides voice guidance regarding guidance points such as, for example, the distance from the vehicle Ve to the next guidance point and the traveling direction at the guidance point.
  • the voice regarding the guidance for the guidance route is also referred to as "route voice guidance”.
  • the audio output device 100 is installed, for example, on the upper part of the windshield of the vehicle Ve or on the dashboard. Note that the audio output device 100 may be incorporated in the vehicle Ve.
  • FIG. 2 is a block diagram showing a schematic configuration of the audio output device 100.
  • the audio output device 100 mainly includes a communication unit 111, a storage unit 112, an input unit 113, a control unit 114, a sensor group 115, a display unit 116, a microphone 117, a speaker 118, an exterior camera 119, and an in-vehicle camera 120.
  • Each element in the audio output device 100 is interconnected via a bus line 110 .
  • the communication unit 111 performs data communication with the server device 200 under the control of the control unit 114 .
  • the communication unit 111 may receive, for example, map data for updating a map DB (DataBase) 4 to be described later from the server device 200 .
  • the storage unit 112 is composed of various memories such as RAM (Random Access Memory), ROM (Read Only Memory), and non-volatile memory (including hard disk drive, flash memory, etc.).
  • the storage unit 112 stores a program for the audio output device 100 to execute predetermined processing.
  • the above programs may include an application program for providing route guidance by voice, an application program for playing back music, an application program for outputting content other than music (such as television), and the like.
  • Storage unit 112 is also used as a working memory for control unit 114 . Note that the program executed by the audio output device 100 may be stored in a storage medium other than the storage unit 112 .
  • the storage unit 112 also stores a map database (hereinafter, the database is referred to as "DB") 4. Various data required for route guidance are recorded in the map DB 4 .
  • the map DB 4 stores, for example, road data representing a road network by a combination of nodes and links, and facility data indicating facilities that are candidates for destinations, stop-off points, or landmarks.
  • the map DB 4 may be updated based on the map information received by the communication section 111 from the map management server under the control of the control section 114 .
  • the storage unit 112 stores voice data corresponding to a pre-generated script (character string) that does not contain proper nouns and contains at least one sentence.
  • the general-purpose voice data DB 112a of the storage unit 112 stores, in advance, voice data corresponding to scripts such as "Go along the road" and "Soon, turn left at the traffic light. Please stay in the leftmost lane."
  • voice data corresponding to a script (character string) generated to include a proper noun and at least one sentence can be stored in the voice cache data DB 112b of the storage unit 112 as cache data.
  • Proper nouns include, for example, place names, interchange names, intersection names, road names, and landmark names.
  • the input unit 113 is a button, touch panel, remote controller, etc. for user operation.
  • the display unit 116 is a display or the like that displays based on the control of the control unit 114 .
  • the microphone 117 collects sounds inside the vehicle Ve, particularly the driver's utterances.
  • a speaker 118 outputs audio for route guidance to the driver or the like.
  • the sensor group 115 includes an external sensor 121 and an internal sensor 122 .
  • the external sensor 121 is, for example, one or more sensors for recognizing the surrounding environment of the vehicle Ve, such as a lidar, radar, ultrasonic sensor, infrared sensor, and sonar.
  • the internal sensor 122 is a sensor that performs positioning of the vehicle Ve, and is, for example, a GNSS (Global Navigation Satellite System) receiver, a gyro sensor, an IMU (Inertial Measurement Unit), a vehicle speed sensor, or a combination thereof.
  • the sensor group 115 may have a sensor that allows the control unit 114 to directly or indirectly derive the position of the vehicle Ve from the output of the sensor group 115 (that is, by performing estimation processing).
  • the vehicle exterior camera 119 is a camera that captures the exterior of the vehicle Ve.
  • the exterior camera 119 may be only a front camera that captures the front of the vehicle, or may include a rear camera that captures the rear of the vehicle in addition to the front camera.
  • the in-vehicle camera 120 is a camera for photographing the interior of the vehicle Ve, and is provided at a position capable of photographing at least the vicinity of the driver's seat.
  • the control unit 114 includes a CPU (Central Processing Unit), a GPU (Graphics Processing Unit), and the like, and controls the audio output device 100 as a whole. For example, the control unit 114 estimates the position (including the traveling direction) of the vehicle Ve based on the outputs of one or more sensors in the sensor group 115. Further, when a destination is specified via the input unit 113 or the microphone 117, the control unit 114 generates route information indicating a guidance route to the destination, and provides route guidance based on the estimated position information and the map DB 4. In this case, the control unit 114 causes the speaker 118 to output route voice guidance. Further, the control unit 114 controls the display unit 116 to display information about the music being played, video content, a map of the vicinity of the current position, or the like.
  • control unit 114 is not limited to being implemented by program-based software, and may be implemented by any combination of hardware, firmware, and software. Also, the processing executed by the control unit 114 may be implemented using a user-programmable integrated circuit such as an FPGA (field-programmable gate array) or a microcomputer. In this case, this integrated circuit may be used to implement the program executed by the control unit 114 in this embodiment. Thus, the control unit 114 may be realized by hardware other than the processor.
  • the configuration of the audio output device 100 shown in FIG. 2 is an example, and various changes may be made to the configuration shown in FIG.
  • the control unit 114 may receive information necessary for route guidance from the server device 200 via the communication unit 111 .
  • the audio output device 100 may be connected, electrically or by known communication means, to an audio output unit configured separately from the audio output device 100, and the audio may be output from that audio output unit.
  • the audio output unit may be a speaker provided in the vehicle Ve.
  • the audio output device 100 does not have to include the display section 116 .
  • the audio output device 100 need not perform any display-related control itself; such control may be executed by a separate device.
  • the audio output device 100 may acquire, from the vehicle Ve, information output by sensors installed in the vehicle Ve based on a communication protocol such as CAN (Controller Area Network).
  • the server device 200 generates route information indicating a guidance route that the vehicle Ve should travel based on the upload signal S1 including the destination and the like received from the voice output device 100 .
  • the server device 200 then generates a control signal S2 relating to information output in response to the user's information request based on the user's information request indicated by the upload signal S1 transmitted by the audio output device 100 and the running state of the vehicle Ve.
  • the server device 200 then transmits the generated control signal S2 to the audio output device 100.
  • FIG. 3 is a diagram showing an example of a schematic configuration of the server device 200.
  • the server device 200 mainly has a communication section 211 , a storage section 212 and a control section 214 .
  • Each element in the server device 200 is interconnected via a bus line 210 .
  • the communication unit 211 performs data communication with an external device such as the audio output device 100 under the control of the control unit 214 .
  • the storage unit 212 is composed of various types of memory such as RAM, ROM, and non-volatile memory (including hard disk drives, flash memory, etc.). The storage unit 212 stores a program for the server device 200 to execute predetermined processing.
  • the control unit 214 includes a CPU, GPU, etc., and controls the server device 200 as a whole. Further, the control unit 214 operates together with the audio output device 100 by executing a program stored in the storage unit 212, and executes route guidance processing, information provision processing, and the like for the user. For example, based on the upload signal S1 received from the audio output device 100 via the communication unit 211, the control unit 214 generates route information indicating a guidance route or a control signal S2 relating to information output in response to a user's information request. Then, the control unit 214 transmits the generated control signal S2 to the audio output device 100 through the communication unit 211 .
  • FIG. 4 is a flowchart for explaining processing performed in the audio output system according to the first embodiment.
  • control unit 114 of the voice output device 100 acquires driving situation information including information indicating the driving situation of the vehicle Ve, for example, at any timing during route guidance (step S11).
  • the driving situation information may include at least one piece of information that can be acquired based on the functions of each unit of the voice output device 100, such as the direction of the vehicle Ve, the speed of the vehicle Ve, traffic information around the position of the vehicle Ve (including speed regulation and traffic congestion information, etc.), and the current time. Further, the driving situation information may include any of the voice obtained by the microphone 117, the image captured by the exterior camera 119, and the image captured by the in-vehicle camera 120. The driving situation information may also include information received from the server device 200 through the communication unit 111.
  • control unit 114 generates a script SC1 for providing voice guidance to passengers of the vehicle Ve based on the driving situation information acquired in step S11 (step S12).
  • when a script SC1 that does not contain a proper noun is generated in step S12, the control unit 114 acquires voice data SD1 corresponding to the script SC1 from the storage unit 112 (step S20) and performs control to output the corresponding voice from the speaker 118 (step S21).
  • when a script SC2 including a proper noun is generated in step S12, the control unit 114 checks whether or not voice data corresponding to the script SC2 is stored in the storage unit 112 as cache data (step S13).
  • when the control unit 114 detects that the voice data SD2 corresponding to the script SC2 is stored in the storage unit 112 as the cache data CD2, the control unit 114 acquires the cache data CD2 from the storage unit 112 (step S20) and performs control to output the voice corresponding to the cache data CD2 from the speaker 118 (step S21). That is, if the cache data CD2 corresponding to the character string containing the proper noun generated in step S12 is stored in the storage unit 112, the control unit 114 performs control for outputting the voice corresponding to the cache data CD2 without communicating with the server device 200.
  • when the control unit 114 detects that the voice data SD2 corresponding to the script SC2 is not stored in the storage unit 112 as cache data, the control unit 114 controls the communication unit 111 to transmit the script SC2 to the server device 200. According to such control, the communication unit 111 transmits the script SC2 to the server device 200 (step S14).
  • the communication unit 211 of the server device 200 receives the script SC2 transmitted from the audio output device 100 (step S15).
  • the control unit 214 of the server device 200 performs processing for generating the voice data SD2 corresponding to the script SC2 received by the communication unit 211 (step S16), and then controls the communication unit 211 to transmit the voice data SD2 to the audio output device 100. According to such control, the communication unit 211 transmits the voice data SD2 to the audio output device 100 (step S17).
  • the processing related to the generation of the voice data SD2 corresponding to the script SC2 is not limited to being performed in the control unit 214; it may be performed on an external server having a speech synthesis function such as TTS (Text To Speech). In that case, processing for requesting the external server to generate the voice data SD2 corresponding to the script SC2 and processing for acquiring the voice data SD2 from the external server may be performed in step S16.
  • the communication unit 111 receives the audio data SD2 transmitted from the server device 200 (step S18).
  • the control unit 114 stores the audio data SD2 received by the communication unit 111 as the cache data CD2 in the storage unit 112 (step S19), and then controls the speaker 118 to output audio corresponding to the audio data SD2. (Step S21).
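The device-side flow of FIG. 4 (steps S12 through S21) can be summarized in a short sketch. This is a hedged illustration: the class name, the `request_server_tts` callable standing in for steps S14 to S18, and the dictionary-based cache are assumptions made for the example, not details of the embodiment.

```python
# Sketch of the device-side flow of FIG. 4; names are illustrative only.

class VoiceOutputDeviceSketch:
    def __init__(self, general_db, request_server_tts, speaker):
        self.general_db = general_db                  # general-purpose voice data DB 112a
        self.cache = {}                               # voice cache data DB 112b
        self.request_server_tts = request_server_tts  # stands in for steps S14 to S18
        self.speaker = speaker

    def guide(self, script, has_proper_noun):
        if not has_proper_noun:                      # script SC1: no proper noun
            voice = self.general_db[script]          # step S20
        elif script in self.cache:                   # cache hit: no communication
            voice = self.cache[script]               # step S20
        else:
            voice = self.request_server_tts(script)  # steps S14 to S18
            self.cache[script] = voice               # step S19
        self.speaker(voice)                          # step S21
```

Requesting the same proper-noun script twice reaches the server only once; the second request is served from the cache data stored in step S19.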
  • in the present embodiment, the control unit 114 functions as a character string generation unit.
  • as described above, the voice output device 100 can provide voice guidance without communicating with the server device 200 both when voice data corresponding to a script (character string) that does not contain proper nouns is stored in the storage unit 112 and when cache data corresponding to a script that contains proper nouns is stored in the storage unit 112. Therefore, according to the present embodiment, it is possible to suppress an increase in the amount of communication according to the frequency of speech in voice guidance involving communication.
  • since the voice corresponding to the entire script is output, the voice is easier to hear than a voice generated by inserting different proper nouns into a sentence depending on the situation, so audibility can be improved.
  • Modification 1: The control unit 114 may set an upper limit value CMV on the amount of cache data that can be stored in the storage unit 112. When the amount of cache data in the storage unit 112 exceeds the upper limit value CMV due to the storage of new cache data, the control unit 114 may delete from the storage unit 112 the cache data whose voice output is the oldest. In addition, the control unit 114 may associate a use count with the cache data of scripts that include proper nouns, so that scripts with a large number of uses are not deleted from the storage unit 112.
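One way to realize this modification is a size-bounded cache that evicts the entries whose voice output is oldest while protecting frequently used scripts. The sketch below is an illustration under assumptions: the `OrderedDict` recency scheme and the `protect_after_uses` threshold are choices made for the example, not details of the publication.

```python
from collections import OrderedDict

class VoiceCacheSketch:
    """Size-bounded voice cache: oldest-output entries evicted first,
    heavily used entries protected from deletion."""

    def __init__(self, max_bytes, protect_after_uses=5):
        self.max_bytes = max_bytes              # upper limit value CMV
        self.protect_after_uses = protect_after_uses
        self.entries = OrderedDict()            # script -> (voice_bytes, use_count)

    def put(self, script, voice):
        self.entries[script] = (voice, 0)
        self._evict()

    def get(self, script):
        voice, uses = self.entries.pop(script)
        self.entries[script] = (voice, uses + 1)  # re-insert at most-recent end
        return voice

    def _size(self):
        return sum(len(v) for v, _ in self.entries.values())

    def _evict(self):
        # Walk entries from oldest to newest, dropping unprotected ones
        # until the total size fits under the limit.
        for script in list(self.entries):
            if self._size() <= self.max_bytes:
                break
            if self.entries[script][1] < self.protect_after_uses:
                del self.entries[script]
```

With a 5-byte limit, storing two 3-byte entries evicts the older one, matching the "delete oldest first" behaviour described above.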
  • Modification 2: when the storage capacity of the storage unit 112 is limited, for example, the voice output device 100 (control unit 114) may acquire from the storage unit 112 the voice data corresponding to scripts that are output with medium to high frequency among the many scripts that do not contain proper nouns, while acquiring (receiving) from the server device 200 the voice data corresponding to scripts whose voice output frequency is low.
  • Modification 3: for example, voice data corresponding to scripts that do not contain proper nouns but contain variable values, such as "To the destination, x kilometers; the required time is y minutes.", may also be stored in the storage unit 112.
  • FIG. 5 is a flow chart for explaining the processing performed in the audio output system according to the second embodiment.
  • the control unit 114 acquires driving situation information indicating the driving situation of the vehicle Ve, for example at an arbitrary timing during route guidance (step S31).
  • the control unit 114 generates a script for providing voice guidance to the passengers of the vehicle Ve based on the driving situation information acquired in step S31 (step S32), and performs control for transmitting the generated script from the communication unit 111 to the server device 200.
  • the communication unit 111 transmits the script generated in step S32 to the server device 200 (step S33).
  • the communication unit 211 receives the script transmitted from the audio output device 100 (step S34).
  • when the control unit 214 receives, in step S34, the script SC3 that does not include a proper noun, it performs control for transmitting, from the communication unit 211 to the audio output device 100, a control signal CS for outputting the voice corresponding to the script SC3. According to this control, the communication unit 211 transmits the control signal CS to the audio output device 100 (step S39). That is, when the character string received by the communication unit 211 is a character string that does not contain a proper noun, the control unit 214 controls the audio output device 100 so as to output the audio corresponding to the audio data stored in the audio output device 100 (storage unit 112) as the audio data corresponding to that character string.
  • on the other hand, when the script SC4 including a proper noun is received in step S34, the control unit 214 checks whether the voice data corresponding to the script SC4 is stored in the storage unit 212 as cache data (step S35).
  • when the control unit 214 detects that the voice data SD4 corresponding to the script SC4 is stored in the storage unit 212 as the cache data CD4, it acquires the cache data CD4 from the storage unit 212 (step S38) and performs control for transmitting the cache data CD4 from the communication unit 211 to the audio output device 100. According to this control, the communication unit 211 transmits the cache data CD4 to the audio output device 100 (step S39).
  • when the control unit 214 detects that the audio data SD4 corresponding to the script SC4 is not stored in the storage unit 212 as cache data, it performs processing for generating the audio data SD4 (step S36), and then stores the audio data SD4 in the storage unit 212 as the cache data CD4 (step S37). The control unit 214 then performs control for transmitting the audio data SD4 from the communication unit 211 to the audio output device 100. According to this control, the communication unit 211 transmits the audio data SD4 to the audio output device 100 (step S39).
  • the communication unit 111 receives the audio data SD4, the cache data CD4, or the control signal CS transmitted from the server device 200 (step S40).
  • when the communication unit 111 receives the audio data SD4, the control unit 114 performs control to output the audio corresponding to the audio data SD4 from the speaker 118 (step S41).
  • when the communication unit 111 receives the cache data CD4, the control unit 114 performs control to output the sound corresponding to the cache data CD4 from the speaker 118 (step S41).
  • when the communication unit 111 receives the control signal CS, the control unit 114 acquires the voice data SD3 corresponding to the script SC3 generated in step S32 from the storage unit 112, and then performs control to output the sound corresponding to the voice data SD3 from the speaker 118 (step S41).
  • the communication unit 211 has a function as a receiving unit.
  • as described above, when the server device 200 according to the present embodiment receives, from the voice output device 100, a script (character string) that does not include a proper noun as the script (character string) for voice guidance, it transmits, instead of voice data, a control signal for outputting the sound corresponding to the sound data stored in the storage unit 112. Therefore, according to the present embodiment, it is possible to suppress an increase in the amount of communication according to the frequency of utterances in voice guidance involving communication. Moreover, according to the present embodiment, the use of cache data makes it possible to reduce the server load related to voice generation.
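The server-side decision flow of steps S34 to S39 can be summarized in code. This is a minimal sketch; the function name, the returned `(kind, payload)` convention, and the `contains_proper_noun` and `synthesize` callbacks are illustrative assumptions standing in for the proper-noun detection and speech synthesis that the patent leaves unspecified.

```python
def handle_script(script, cache, contains_proper_noun, synthesize):
    """Server-side dispatch for a received guidance script (steps S34-S39).

    Returns a (kind, payload) pair representing what the server transmits:
    - ("control", script): the script has no proper noun, so the device is
      told to play its locally stored audio (control signal CS).
    - ("audio", data): the script has a proper noun; the audio comes from
      the server cache (step S38) or is newly synthesized and cached
      (steps S36-S37).
    """
    if not contains_proper_noun(script):
        return ("control", script)          # step S39: control signal CS only
    if script in cache:                     # step S35: cache check
        return ("audio", cache[script])     # step S38: reuse cached audio
    audio = synthesize(script)              # step S36: generate voice data
    cache[script] = audio                   # step S37: store as cache data
    return ("audio", audio)                 # step S39: transmit audio data
```

A second request for the same proper-noun script is then served from the cache without invoking synthesis again, which is how the cache reduces server load.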
  • Non-transitory computer readable media include various types of tangible storage media.
  • Examples of non-transitory computer-readable media include magnetic storage media (e.g., floppy disks, magnetic tapes, hard disk drives), magneto-optical storage media (e.g., magneto-optical discs), CD-ROM (Read Only Memory), CD-R, CD-R/W, and semiconductor memory (e.g., mask ROM, PROM (Programmable ROM), EPROM (Erasable PROM), flash ROM, RAM (Random Access Memory)).
  • 100 audio output device, 200 server device, 111, 211 communication unit, 112, 212 storage unit, 113 input unit, 114, 214 control unit, 115 sensor group, 116 display unit, 117 microphone, 118 speaker, 119 exterior camera, 120 interior camera


Abstract

A voice output device comprising a character string generation unit, a communication unit, and a control unit. The character string generation unit generates a character string for performing voice guidance. When the character string is a first character string including a proper noun, the communication unit transmits the first character string to an external device and receives first voice data corresponding to the first character string from the external device. When the character string is the first character string, the control unit performs control for causing a voice corresponding to the first voice data to be output. Meanwhile, when the character string is a second character string that does not include a proper noun, the control unit performs control for causing a voice corresponding to second voice data to be output, the second voice data being stored in a storage unit as voice data corresponding to the second character string.

Description

Audio output device, server device, audio output method, control method, program, and storage medium

The present invention relates to technology that can be used in voice guidance involving communication.

As a configuration for guiding a vehicle along a route to its destination by voice, a configuration such as that disclosed in Patent Literature 1, for example, is conventionally known.

Specifically, Patent Literature 1 discloses a voice guidance system comprising an in-vehicle device that is mounted in a vehicle and has a voice guidance function, and a server device capable of communicating with the in-vehicle device via a communication network.

JP 2012-173702 A

In voice guidance involving communication, as disclosed in Patent Literature 1, there is the problem that the amount of communication between the in-vehicle device and the server device increases according to the frequency of utterances from the in-vehicle device.

However, Patent Literature 1 does not disclose anything regarding this problem. Therefore, the configuration disclosed in Patent Literature 1 still leaves an issue corresponding to it.

The present invention has been made to solve the above issue, and its main object is to provide a voice output device capable of suppressing an increase in the amount of communication according to the frequency of utterances in voice guidance involving communication.
The claimed invention is a voice output device comprising: a character string generation unit that generates a character string for performing voice guidance; a communication unit that, when the character string is a first character string including a proper noun, transmits the first character string to an external device and receives first voice data corresponding to the first character string from the external device; and a control unit that, when the character string is the first character string, performs control for outputting a voice corresponding to the first voice data, and, when the character string is a second character string that does not include a proper noun, performs control for outputting a voice corresponding to second voice data stored in a storage unit as the voice data corresponding to the second character string.

The claimed invention is also a server device comprising: a receiving unit that receives, from an external device, a character string generated for performing voice guidance; and a control unit that, when the character string is a first character string including a proper noun, performs control for transmitting first voice data corresponding to the first character string to the external device, and, when the character string is a second character string that does not include a proper noun, performs control for causing the external device to output a voice corresponding to second voice data stored in the external device as the voice data corresponding to the second character string.

The claimed invention is also a voice output method comprising: generating a character string for performing voice guidance; when the character string is a first character string including a proper noun, transmitting the first character string to an external device and receiving first voice data corresponding to the first character string from the external device; and, when the character string is the first character string, performing control for outputting a voice corresponding to the first voice data, while, when the character string is a second character string that does not include a proper noun, performing control for outputting a voice corresponding to second voice data stored in a storage unit as the voice data corresponding to the second character string.

The claimed invention is also a control method comprising: receiving, from an external device, a character string generated for performing voice guidance; and, when the character string is a first character string including a proper noun, performing control for transmitting first voice data corresponding to the first character string to the external device, while, when the character string is a second character string that does not include a proper noun, performing control for causing the external device to output a voice corresponding to second voice data stored in the external device as the voice data corresponding to the second character string.

The claimed invention is also a program executed by a voice output device comprising a computer, the program causing the computer to function as: a character string generation unit that generates a character string for performing voice guidance; a communication unit that, when the character string is a first character string including a proper noun, transmits the first character string to an external device and receives first voice data corresponding to the first character string from the external device; and a control unit that, when the character string is the first character string, performs control for outputting a voice corresponding to the first voice data, and, when the character string is a second character string that does not include a proper noun, performs control for outputting a voice corresponding to second voice data stored in a storage unit as the voice data corresponding to the second character string.

The claimed invention is also a program executed by a server device comprising a computer, the program causing the computer to function as: a receiving unit that receives, from an external device, a character string generated for performing voice guidance; and a control unit that, when the character string is a first character string including a proper noun, performs control for transmitting first voice data corresponding to the first character string to the external device, and, when the character string is a second character string that does not include a proper noun, performs control for causing the external device to output a voice corresponding to second voice data stored in the external device as the voice data corresponding to the second character string.
FIG. 1 is a diagram showing a configuration example of an audio output system according to an embodiment. FIG. 2 is a block diagram showing a schematic configuration of an audio output device. FIG. 3 is a diagram showing an example of the schematic configuration of a server device. FIG. 4 is a flowchart for explaining the processing performed in the audio output system according to the first embodiment. FIG. 5 is a flowchart for explaining the processing performed in the audio output system according to the second embodiment.
In one preferred embodiment of the present invention, the voice output device comprises: a character string generation unit that generates a character string for performing voice guidance; a communication unit that, when the character string is a first character string including a proper noun, transmits the first character string to an external device and receives first voice data corresponding to the first character string from the external device; and a control unit that, when the character string is the first character string, performs control for outputting a voice corresponding to the first voice data, and, when the character string is a second character string that does not include a proper noun, performs control for outputting a voice corresponding to second voice data stored in a storage unit as the voice data corresponding to the second character string.

The above voice output device has a character string generation unit, a communication unit, and a control unit. The character string generation unit generates a character string for performing voice guidance. When the character string is a first character string including a proper noun, the communication unit transmits the first character string to an external device and receives first voice data corresponding to the first character string from the external device. When the character string is the first character string, the control unit performs control for outputting a voice corresponding to the first voice data; when the character string is a second character string that does not include a proper noun, the control unit performs control for outputting a voice corresponding to second voice data stored in a storage unit as the voice data corresponding to the second character string. This makes it possible to suppress an increase in the amount of communication according to the frequency of utterances in voice guidance involving communication.

In one aspect of the above voice output device, when the character string is the first character string, the control unit stores the first voice data received from the external device in the storage unit as cache data.

In one aspect of the above voice output device, when the cache data corresponding to the first character string is stored in the storage unit, the control unit performs control for outputting a voice corresponding to the cache data without communicating with the external device.

In one aspect of the above voice output device, the character string is a script including at least one sentence.
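The device-side control flow described in the above aspects — pre-stored audio for strings without proper nouns, and a server round-trip with local caching otherwise — can be sketched as follows. This is illustrative only; the function name and the `request_audio` callback are assumptions, not the patent's API.

```python
def resolve_audio(script, local_db, cache, contains_proper_noun, request_audio):
    """Device-side selection of voice data for a guidance script.

    - No proper noun: use the second voice data pre-stored in the storage
      unit; no communication with the external device occurs.
    - Proper noun present: use cached first voice data if available;
      otherwise transmit the script to the external device, receive the
      voice data, and store it as cache data for reuse.
    """
    if not contains_proper_noun(script):
        return local_db[script]        # pre-stored general-purpose audio
    if script in cache:
        return cache[script]           # cached audio: no communication needed
    audio = request_audio(script)      # transmit script, receive voice data
    cache[script] = audio              # store as cache data
    return audio
```

With this flow, communication with the external device happens only the first time a given proper-noun script is voiced, which is the mechanism by which utterance frequency stops driving communication volume.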
In another embodiment of the present invention, the server device comprises: a receiving unit that receives, from an external device, a character string generated for performing voice guidance; and a control unit that, when the character string is a first character string including a proper noun, performs control for transmitting first voice data corresponding to the first character string to the external device, and, when the character string is a second character string that does not include a proper noun, performs control for causing the external device to output a voice corresponding to second voice data stored in the external device as the voice data corresponding to the second character string. This makes it possible to suppress an increase in the amount of communication according to the frequency of utterances in voice guidance involving communication.

In one aspect of the above server device, when the character string is the first character string, the control unit stores the first voice data in a storage unit as cache data.

In one aspect of the above server device, when the cache data corresponding to the first character string is stored in the storage unit, the control unit performs control for transmitting the cache data to the external device.

In one aspect of the above server device, the character string is a script including at least one sentence.
In still another embodiment of the present invention, the voice output method generates a character string for performing voice guidance; when the character string is a first character string including a proper noun, transmits the first character string to an external device and receives first voice data corresponding to the first character string from the external device; and, when the character string is the first character string, performs control for outputting a voice corresponding to the first voice data, while, when the character string is a second character string that does not include a proper noun, performs control for outputting a voice corresponding to second voice data stored in a storage unit as the voice data corresponding to the second character string. This makes it possible to suppress an increase in the amount of communication according to the frequency of utterances in voice guidance involving communication.

In still another embodiment of the present invention, the control method receives, from an external device, a character string generated for performing voice guidance; when the character string is a first character string including a proper noun, performs control for transmitting first voice data corresponding to the first character string to the external device; and, when the character string is a second character string that does not include a proper noun, performs control for causing the external device to output a voice corresponding to second voice data stored in the external device as the voice data corresponding to the second character string. This makes it possible to suppress an increase in the amount of communication according to the frequency of utterances in voice guidance involving communication.

In still another embodiment of the present invention, a program executed by a voice output device comprising a computer causes the computer to function as: a character string generation unit that generates a character string for performing voice guidance; a communication unit that, when the character string is a first character string including a proper noun, transmits the first character string to an external device and receives first voice data corresponding to the first character string from the external device; and a control unit that, when the character string is the first character string, performs control for outputting a voice corresponding to the first voice data, and, when the character string is a second character string that does not include a proper noun, performs control for outputting a voice corresponding to second voice data stored in a storage unit as the voice data corresponding to the second character string. By executing this program on a computer, the above voice output device can be realized. This program can be stored in a storage medium and used.

In still another embodiment of the present invention, a program executed by a server device comprising a computer causes the computer to function as: a receiving unit that receives, from an external device, a character string generated for performing voice guidance; and a control unit that, when the character string is a first character string including a proper noun, performs control for transmitting first voice data corresponding to the first character string to the external device, and, when the character string is a second character string that does not include a proper noun, performs control for causing the external device to output a voice corresponding to second voice data stored in the external device as the voice data corresponding to the second character string. By executing this program on a computer, the above server device can be realized. This program can be stored in a storage medium and used.
Preferred embodiments of the present invention will be described below with reference to the drawings.
<First embodiment>

First, the first embodiment will be described.

[System configuration]

(Overall configuration)

FIG. 1 is a diagram illustrating a configuration example of an audio output system according to an embodiment. The voice output system 1 according to this embodiment includes a voice output device 100 and a server device 200. The audio output device 100 is mounted on a vehicle Ve. The server device 200 communicates with a plurality of audio output devices 100 mounted on a plurality of vehicles Ve.
The voice output device 100 basically performs route search processing, route guidance processing, and the like for the user who is a passenger of the vehicle Ve. For example, when the user inputs a destination or the like, the voice output device 100 transmits to the server device 200 an upload signal S1 including the position information of the vehicle Ve and information on the designated destination. The server device 200 calculates the route to the destination by referring to map data, and transmits a control signal S2 indicating the route to the destination to the voice output device 100. The voice output device 100 provides route guidance to the user by voice output based on the received control signal S2.
The voice output device 100 also provides various kinds of information to the user through dialogue with the user. For example, when the user makes an information request, the voice output device 100 supplies the server device 200 with an upload signal S1 including information indicating the content or type of the information request and information on the running state of the vehicle Ve. The server device 200 acquires or generates the information requested by the user and transmits it to the voice output device 100 as a control signal S2. The voice output device 100 provides the received information to the user by voice output.
(Voice output device)

The voice output device 100 moves together with the vehicle Ve and provides route guidance mainly by voice so that the vehicle Ve travels along the guidance route. Note that "route guidance mainly by voice" refers to route guidance in which the user can grasp, at least from voice alone, the information necessary for driving the vehicle Ve along the guidance route; it does not exclude the voice output device 100 additionally displaying a map of the surroundings of the current position or the like. In this embodiment, the voice output device 100 outputs by voice at least various kinds of information related to driving, such as points on the route that require guidance (also referred to as "guidance points"). Here, a guidance point corresponds to, for example, an intersection at which the vehicle Ve turns right or left, or another passing point that is important for the vehicle Ve to travel along the guidance route. The voice output device 100 provides voice guidance regarding guidance points, such as the distance from the vehicle Ve to the next guidance point and the traveling direction at that guidance point. Hereinafter, the voice related to guidance along the guidance route is also referred to as "route voice guidance".
The voice output device 100 is attached, for example, to the upper part of the windshield of the vehicle Ve or on the dashboard. Note that the voice output device 100 may be built into the vehicle Ve.
FIG. 2 is a block diagram showing the schematic configuration of the voice output device 100. The voice output device 100 mainly includes a communication unit 111, a storage unit 112, an input unit 113, a control unit 114, a sensor group 115, a display unit 116, a microphone 117, a speaker 118, an exterior camera 119, and an interior camera 120. The elements in the voice output device 100 are interconnected via a bus line 110.
The communication unit 111 performs data communication with the server device 200 under the control of the control unit 114. The communication unit 111 may receive, for example, map data for updating a map DB (DataBase) 4, which will be described later, from the server device 200.
The storage unit 112 is composed of various memories such as a RAM (Random Access Memory), a ROM (Read Only Memory), and nonvolatile memory (including a hard disk drive, flash memory, and the like). The storage unit 112 stores a program for the voice output device 100 to execute predetermined processing. This program may include an application program for providing route guidance by voice, an application program for playing music, an application program for outputting content other than music (such as television), and the like. The storage unit 112 is also used as a working memory of the control unit 114. Note that the program executed by the voice output device 100 may be stored in a storage medium other than the storage unit 112.
 また、記憶部112は、地図データベース(以下、データベースを「DB」と記す。)4を記憶する。地図DB4には、経路案内に必要な種々のデータが記録されている。地図DB4は、例えば、道路網をノードとリンクの組合せにより表した道路データ、及び、目的地、立寄地、又はランドマークの候補となる施設を示す施設データなどを記憶している。地図DB4は、制御部114の制御に基づき、通信部111が地図管理サーバから受信する地図情報に基づき更新されてもよい。 The storage unit 112 also stores a map database (hereinafter, the database is referred to as "DB") 4. Various data required for route guidance are recorded in the map DB 4 . The map DB 4 stores, for example, road data representing a road network by a combination of nodes and links, and facility data indicating facilities that are candidates for destinations, stop-off points, or landmarks. The map DB 4 may be updated based on the map information received by the communication section 111 from the map management server under the control of the control section 114 .
 また、記憶部112には、固有名詞を含まずかつ少なくとも1つの文を含むように予め生成されたスクリプト(文字列)に対応する音声データが格納されている。具体的には、記憶部112の汎用音声データDB112aには、例えば、「このまま道なりに進みます。」、及び、「まもなく信号を左です。左端の車線を進んでください。」等のようなスクリプトに対応する音声データが予め格納されている。また、記憶部112の音声キャッシュデータDB112bには、固有名詞を含みかつ少なくとも1つの文を含むように生成されたスクリプト(文字列)に対応する音声データをキャッシュデータとして格納することができる。固有名詞としては、例えば、地名、インターチェンジ名、交差点名、道路名、及び、ランドマーク名等が挙げられる。 In addition, the storage unit 112 stores voice data corresponding to a pre-generated script (character string) that does not contain proper nouns and contains at least one sentence. Specifically, the general-purpose voice data DB 112a of the storage unit 112 stores, for example, "Go along the road" and "Soon turn left at the traffic light. Please stay in the leftmost lane." Voice data corresponding to the script is stored in advance. In addition, voice data corresponding to a script (character string) generated to include a proper noun and at least one sentence can be stored in the voice cache data DB 112b of the storage unit 112 as cache data. Proper nouns include, for example, place names, interchange names, intersection names, road names, and landmark names.
 The input unit 113 includes buttons, a touch panel, a remote controller, and the like for user operation. The display unit 116 is a display or the like that performs display under the control of the control unit 114. The microphone 117 collects sound inside the vehicle Ve, in particular the driver's utterances. The speaker 118 outputs voice for route guidance to the driver and other occupants.
 The sensor group 115 includes an external sensor 121 and an internal sensor 122. The external sensor 121 is one or more sensors for recognizing the surrounding environment of the vehicle Ve, such as a lidar, a radar, an ultrasonic sensor, an infrared sensor, or a sonar. The internal sensor 122 is a sensor for positioning the vehicle Ve, and is, for example, a GNSS (Global Navigation Satellite System) receiver, a gyro sensor, an IMU (Inertial Measurement Unit), a vehicle speed sensor, or a combination thereof. It suffices that the sensor group 115 includes sensors from whose output the control unit 114 can derive the position of the vehicle Ve directly or indirectly (that is, by performing estimation processing).
 The exterior camera 119 is a camera that captures the outside of the vehicle Ve. The exterior camera 119 may be only a front camera that captures the area ahead of the vehicle, may include a rear camera that captures the area behind the vehicle in addition to the front camera, or may be an omnidirectional camera capable of capturing the entire surroundings of the vehicle Ve. The in-vehicle camera 120, on the other hand, is a camera that captures the interior of the vehicle Ve, and is provided at a position from which at least the vicinity of the driver's seat can be captured.
 The control unit 114 includes a CPU (Central Processing Unit), a GPU (Graphics Processing Unit), and the like, and controls the audio output device 100 as a whole. For example, the control unit 114 estimates the position (including the heading) of the vehicle Ve based on the outputs of one or more sensors in the sensor group 115. When a destination is specified through the input unit 113 or the microphone 117, the control unit 114 generates route information indicating a guidance route to that destination, and provides route guidance based on the route information, the estimated position information of the vehicle Ve, and the map DB 4. In this case, the control unit 114 causes the speaker 118 to output route voice guidance. The control unit 114 also controls the display unit 116 to display information about the music being played, video content, a map of the area around the current position, and the like.
 The processing executed by the control unit 114 is not limited to implementation in software by programs, and may be implemented by any combination of hardware, firmware, and software. The processing executed by the control unit 114 may also be implemented using a user-programmable integrated circuit such as an FPGA (field-programmable gate array) or a microcontroller. In that case, the integrated circuit may be used to realize the program that the control unit 114 executes in the present embodiment. In this way, the control unit 114 may be realized by hardware other than a processor.
 The configuration of the audio output device 100 shown in FIG. 2 is an example, and various changes may be made to it. For example, instead of the storage unit 112 storing the map DB 4, the control unit 114 may receive information necessary for route guidance from the server device 200 via the communication unit 111. In another example, instead of including the speaker 118, the audio output device 100 may be connected, electrically or by known communication means, to an audio output unit configured separately from the audio output device 100, and cause that audio output unit to output voice. In this case, the audio output unit may be a speaker provided in the vehicle Ve. In still another example, the audio output device 100 need not include the display unit 116. In this case, the audio output device 100 need not perform any display-related control, and may be electrically connected, by wire or wirelessly, to a display unit provided in the vehicle Ve or elsewhere and cause that display unit to perform predetermined display. Similarly, instead of including the sensor group 115, the audio output device 100 may acquire information output by sensors installed in the vehicle Ve from the vehicle Ve based on a communication protocol such as CAN (Controller Area Network).
 (Server device)
 The server device 200 generates route information indicating a guidance route that the vehicle Ve should travel, based on the upload signal S1, including the destination and the like, received from the audio output device 100. The server device 200 then generates a control signal S2 relating to information output in response to a user's information request, based on the user's information request indicated by a subsequent upload signal S1 transmitted by the audio output device 100 and on the running state of the vehicle Ve, and transmits the generated control signal S2 to the audio output device 100.
 FIG. 3 is a diagram showing an example of a schematic configuration of the server device 200. The server device 200 mainly includes a communication unit 211, a storage unit 212, and a control unit 214. The elements in the server device 200 are interconnected via a bus line 210.
 The communication unit 211 performs data communication with an external device such as the audio output device 100 under the control of the control unit 214. The storage unit 212 is composed of various memories such as a RAM, a ROM, and a non-volatile memory (including a hard disk drive, a flash memory, and the like). The storage unit 212 stores programs for the server device 200 to execute predetermined processing, and also contains the map DB 4. The storage unit 212 is further provided with a voice cache data DB 212b capable of storing, as cache data, voice data corresponding to scripts (character strings) generated so as to contain a proper noun and at least one sentence.
 The control unit 214 includes a CPU, a GPU, and the like, and controls the server device 200 as a whole. By executing the programs stored in the storage unit 212, the control unit 214 operates together with the audio output device 100 to execute route guidance processing, information provision processing, and the like for the user. For example, based on the upload signal S1 received from the audio output device 100 via the communication unit 211, the control unit 214 generates route information indicating a guidance route, or a control signal S2 relating to information output in response to a user's information request. The control unit 214 then transmits the generated control signal S2 to the audio output device 100 through the communication unit 211.
 [Processing flow]
 Next, processing performed in the audio output system 1 will be described. FIG. 4 is a flowchart for explaining the processing performed in the audio output system according to the first embodiment.
 First, the control unit 114 of the audio output device 100 acquires driving situation information, including information indicating the driving situation of the vehicle Ve, at some timing during route guidance, for example (step S11).
 The driving situation information only needs to include at least one piece of information that can be acquired based on the functions of the units of the audio output device 100, such as the heading of the vehicle Ve, the speed of the vehicle Ve, traffic information around the position of the vehicle Ve (including speed regulations, congestion information, and the like), and the current time. The driving situation information may also include any of the sound picked up by the microphone 117, images captured by the exterior camera 119, and images captured by the in-vehicle camera 120, as well as information received from the server device 200 through the communication unit 111.
 Next, based on the driving situation information acquired in step S11, the control unit 114 generates a script SC1 for providing voice guidance to the occupants of the vehicle Ve (step S12).
 When the script SC1 generated in step S12 contains no proper noun, the control unit 114 acquires the voice data SD1 corresponding to the script SC1 from the storage unit 112 (step S20), and then performs control to cause the speaker 118 to output voice corresponding to the voice data SD1 (step S21).
 On the other hand, when a script SC2 containing a proper noun is generated in step S12, the control unit 114 checks whether voice data corresponding to the script SC2 is stored in the storage unit 112 as cache data (step S13).
 When the control unit 114 detects that the voice data SD2 corresponding to the script SC2 is stored in the storage unit 112 as cache data CD2, it acquires the cache data CD2 from the storage unit 112 (step S20), and then performs control to cause the speaker 118 to output voice corresponding to the cache data CD2 (step S21). That is, when the cache data CD2 corresponding to the character string containing the proper noun generated in step S12 is stored in the storage unit 112, the control unit 114 performs control to output the voice corresponding to the cache data CD2 without communicating with the server device 200.
 When the control unit 114 detects that the voice data SD2 corresponding to the script SC2 is not stored in the storage unit 112 as cache data, it performs control to cause the communication unit 111 to transmit the script SC2 to the server device 200. In response to this control, the communication unit 111 transmits the script SC2 to the server device 200 (step S14).
 The communication unit 211 of the server device 200 receives the script SC2 transmitted from the audio output device 100 (step S15).
 The control unit 214 of the server device 200 performs processing for generating voice data SD2 corresponding to the script SC2 received by the communication unit 211 (step S16), and then performs control to cause the communication unit 211 to transmit the voice data SD2 to the audio output device 100. In response to this control, the communication unit 211 transmits the voice data SD2 to the audio output device 100 (step S17).
 In the present embodiment, the processing for generating the voice data SD2 corresponding to the script SC2 is not limited to being performed in the control unit 214; it may be performed, for example, in an external server having a speech synthesis function such as TTS (Text To Speech). In such a case, processing for requesting the external server to generate the voice data SD2 corresponding to the script SC2, and processing for acquiring the voice data SD2 from the external server, may be performed in step S16.
 The communication unit 111 receives the voice data SD2 transmitted from the server device 200 (step S18).
 The control unit 114 stores the voice data SD2 received by the communication unit 111 in the storage unit 112 as cache data CD2 (step S19), and then performs control to cause the speaker 118 to output voice corresponding to the voice data SD2 (step S21).
 According to the present embodiment, the control unit 114 functions as a character string generation unit.
 As described above, the audio output device 100 according to the present embodiment can provide voice guidance without communicating with the server device 200 both when voice data corresponding to a script (character string) containing no proper noun is stored in the storage unit 112 and when cache data corresponding to a script (character string) containing a proper noun is stored in the storage unit 112. Therefore, according to the present embodiment, in voice guidance involving communication, an increase in the amount of communication corresponding to the utterance frequency can be suppressed. Furthermore, since the present embodiment outputs voice corresponding to a whole script, intelligibility can be improved compared with, for example, outputting voice generated by splicing situation-dependent proper nouns into the middle of a fixed sentence.
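The client-side branch of the flow of FIG. 4 (steps S12 to S21) can be summarized in code. The sketch below is an illustrative reconstruction, not the embodiment's implementation: the helper names (`generic_db`, `request_tts`, `speak`, and so on) are hypothetical, and synthesized audio is represented by plain strings.

```python
# Sketch of the client-side flow of Fig. 4 (steps S12-S21).
# All names are hypothetical; audio data is modeled as strings.

class VoiceOutputDevice:
    def __init__(self, generic_db, request_tts):
        self.generic_db = generic_db    # pre-stored audio for proper-noun-free scripts (DB 112a)
        self.cache_db = {}              # audio cache for proper-noun scripts (DB 112b)
        self.request_tts = request_tts  # round trip to the server device 200

    def speak(self, script, has_proper_noun):
        if not has_proper_noun:
            # Script SC1: audio SD1 is already on the device (step S20).
            return self.generic_db[script]
        if script in self.cache_db:
            # Cache hit: no communication with the server (steps S13, S20).
            return self.cache_db[script]
        # Cache miss: fetch SD2 from the server and keep it as CD2
        # (steps S14-S19).
        audio = self.request_tts(script)
        self.cache_db[script] = audio
        return audio
```

A second request for the same proper-noun script is then served locally, which is exactly the communication saving the embodiment describes.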
 [Modifications]
 Modifications suitable for the above embodiment will be described below.
 (Modification 1)
 The control unit 114 may set an upper limit value CMV for the amount of cache data that can be stored in the storage unit 112. When storing new cache data would cause the amount of cache data in the storage unit 112 to exceed the upper limit value CMV, the control unit 114 may delete from the storage unit 112 the cache data whose voice was output least recently. The control unit 114 may also associate a use count with the cache data of each script containing a proper noun, so that frequently used scripts are not deleted from the storage unit 112.
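The eviction policy of Modification 1 can be sketched with Python's `collections.OrderedDict`, whose insertion order is used here to track which cached script was output least recently. The entry-count limit and the use-count threshold below are illustrative assumptions; the embodiment itself specifies only an upper limit CMV on the amount of cache data and protection for frequently used scripts.

```python
from collections import OrderedDict

class VoiceCache:
    """Least-recently-output eviction with a use-count exemption (Modification 1).

    The limit is counted in entries for simplicity; the embodiment
    describes an upper limit CMV on the amount of cache data.
    """
    def __init__(self, limit, protect_after=3):
        self.limit = limit
        self.protect_after = protect_after  # hypothetical "frequently used" threshold
        self.entries = OrderedDict()        # script -> (audio, use_count)

    def output(self, script):
        audio, count = self.entries.pop(script)
        self.entries[script] = (audio, count + 1)  # move to most-recent end
        return audio

    def store(self, script, audio):
        self.entries[script] = (audio, 0)
        while len(self.entries) > self.limit:
            # Scan from the least recently output entry, skipping
            # entries whose use count marks them as frequently used.
            for key, (_, count) in self.entries.items():
                if count < self.protect_after:
                    del self.entries[key]
                    break
            else:
                break  # every entry is protected; stop evicting
```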
 (Modification 2)
 When the storage capacity of the storage unit 112 is limited, for example, the audio output device 100 (control unit 114) may acquire from the storage unit 112 the voice data corresponding to those proper-noun-free scripts that are output with medium to high frequency, while acquiring (receiving) from the server device 200 the voice data corresponding to scripts that are output only infrequently.
 (Modification 3)
 For scripts that contain no proper noun but contain variable values, such as "x kilometers to the destination; the estimated travel time is y minutes.", voice data may be stored in the storage unit 112 only for the combinations of x and y that can actually be expected to occur.
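One way Modification 3 might bound the set of pre-stored variants is to generate only the (x, y) combinations whose implied average speed is realistic. The sketch below is illustrative: the distance and duration grids and the speed bounds are assumptions, not values from the embodiment.

```python
def plausible_variants(distances_km, durations_min, min_kmh=10, max_kmh=120):
    """Enumerate only (x, y) pairs whose implied average speed is realistic."""
    scripts = []
    for x in distances_km:
        for y in durations_min:
            speed = x / (y / 60)  # km/h implied by "x km in y minutes"
            if min_kmh <= speed <= max_kmh:
                scripts.append(
                    f"{x} kilometers to the destination; "
                    f"the estimated travel time is {y} minutes."
                )
    return scripts
```

Only the scripts returned by such a filter would then be synthesized in advance and stored in the general-purpose voice data DB 112a.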
 <Second embodiment>
 Next, a second embodiment will be described. In the present embodiment, description of parts to which the same configuration and the like as in the first embodiment can be applied is omitted as appropriate, and the description focuses on the parts that differ from the first embodiment. Specifically, the present embodiment has the same system configuration as the first embodiment, but processing is performed according to a different processing flow. The processing flow according to the present embodiment is therefore mainly described below.
 [Processing flow]
 FIG. 5 is a flowchart for explaining the processing performed in the audio output system according to the second embodiment.
 First, the control unit 114 acquires driving situation information, including information indicating the driving situation of the vehicle Ve, at some timing during route guidance, for example (step S31).
 Next, based on the driving situation information acquired in step S31, the control unit 114 generates a script for providing voice guidance to the occupants of the vehicle Ve (step S32), and performs control to cause the communication unit 111 to transmit the generated script to the server device 200. In response to this control, the communication unit 111 transmits the script generated in step S32 to the server device 200 (step S33).
 The communication unit 211 receives the script transmitted from the audio output device 100 (step S34).
 When a script SC3 containing no proper noun is received in step S34, the control unit 214 performs control to cause the communication unit 211 to transmit to the audio output device 100 a control signal CS for outputting the voice corresponding to the script SC3. In response to this control, the communication unit 211 transmits the control signal CS to the audio output device 100 (step S39). That is, when the character string received by the communication unit 211 contains no proper noun, the control unit 214 performs control to cause the audio output device 100 to output the voice corresponding to the voice data stored in the audio output device 100 (storage unit 112) as the voice data corresponding to that character string.
 When a script SC4 containing a proper noun is received in step S34, the control unit 214 checks whether voice data corresponding to the script SC4 is stored in the storage unit 212 as cache data (step S35).
 When the control unit 214 detects that the voice data SD4 corresponding to the script SC4 is stored in the storage unit 212 as cache data CD4, it acquires the cache data CD4 from the storage unit 212 (step S38), and then performs control to cause the communication unit 211 to transmit the cache data CD4 to the audio output device 100. In response to this control, the communication unit 211 transmits the cache data CD4 to the audio output device 100 (step S39).
 When the control unit 214 detects that the voice data SD4 corresponding to the script SC4 is not stored in the storage unit 212 as cache data, it performs processing for generating the voice data SD4 (step S36), and then stores the voice data SD4 in the storage unit 212 as cache data CD4 (step S37). The control unit 214 then performs control to cause the communication unit 211 to transmit the voice data SD4 to the audio output device 100. In response to this control, the communication unit 211 transmits the voice data SD4 to the audio output device 100 (step S39).
 The communication unit 111 receives the voice data SD4, the cache data CD4, or the control signal CS transmitted from the server device 200 (step S40).
 When the communication unit 111 receives the voice data SD4, the control unit 114 performs control to cause the speaker 118 to output voice corresponding to the voice data SD4 (step S41).
 Likewise, when the communication unit 111 receives the cache data CD4, the control unit 114 performs control to cause the speaker 118 to output voice corresponding to the cache data CD4 (step S41).
 When the communication unit 111 receives the control signal CS, on the other hand, the control unit 114 acquires from the storage unit 112 the voice data SD3 corresponding to the script SC3 generated in step S32, and then performs control to cause the speaker 118 to output voice corresponding to the voice data SD3 (step S41).
 According to the present embodiment, the communication unit 211 functions as a receiving unit.
 As described above, when the server device 200 according to the present embodiment receives from the audio output device 100 a voice-guidance script (character string) containing no proper noun, it transmits, instead of voice data, a control signal for outputting the voice corresponding to the voice data stored in the storage unit 112. Therefore, according to the present embodiment, in voice guidance involving communication, an increase in the amount of communication corresponding to the utterance frequency can be suppressed. Furthermore, according to the present embodiment, the use of cache data can reduce the server load associated with voice generation.
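The server-side branch of the flow of FIG. 5 (steps S34 to S39) can be summarized in the same style. The sketch below is an illustrative reconstruction: the helper names are hypothetical, and the control signal and synthesized audio are represented by strings.

```python
# Sketch of the server-side flow of Fig. 5 (steps S34-S39).
# All names are hypothetical; audio data is modeled as strings.

CONTROL_SIGNAL = "CS"  # tells the device to use its own stored audio

class ServerDevice:
    def __init__(self, synthesize):
        self.synthesize = synthesize  # TTS backend (may itself be an external server)
        self.cache_db = {}            # voice cache data DB 212b

    def handle_script(self, script, has_proper_noun):
        if not has_proper_noun:
            # Script SC3: reply with only a control signal (step S39);
            # the audio SD3 is already stored on the device side.
            return ("control", CONTROL_SIGNAL)
        if script in self.cache_db:
            # Cache hit: return CD4 without re-synthesizing (steps S35, S38).
            return ("audio", self.cache_db[script])
        # Cache miss: synthesize SD4 and cache it as CD4 (steps S36-S37).
        audio = self.synthesize(script)
        self.cache_db[script] = audio
        return ("audio", audio)
```

The proper-noun-free reply carries only a small control signal, and repeated proper-noun scripts are served from the server cache, which is where the communication and synthesis savings of this embodiment come from.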
 In each of the embodiments described above, the programs can be stored using various types of non-transitory computer-readable media and supplied to a computer such as the control unit. Non-transitory computer-readable media include various types of tangible storage media. Examples of non-transitory computer-readable media include magnetic storage media (e.g., flexible disks, magnetic tapes, hard disk drives), magneto-optical storage media (e.g., magneto-optical disks), CD-ROMs (Read Only Memory), CD-Rs, CD-R/Ws, and semiconductor memories (e.g., mask ROMs, PROMs (Programmable ROMs), EPROMs (Erasable PROMs), flash ROMs, and RAMs (Random Access Memory)).
 Although the present invention has been described above with reference to the embodiments, the present invention is not limited to those embodiments. Various changes that those skilled in the art can understand may be made to the configuration and details of the present invention within the scope of the present invention. That is, the present invention naturally includes various variations and modifications that those skilled in the art could make in accordance with the entire disclosure, including the claims, and the technical ideas thereof. The disclosures of the patent documents and other references cited above are incorporated herein by reference.
 100 audio output device
 200 server device
 111, 211 communication unit
 112, 212 storage unit
 113 input unit
 114, 214 control unit
 115 sensor group
 116 display unit
 117 microphone
 118 speaker
 119 exterior camera
 120 in-vehicle camera

Claims (13)

  1.  音声案内を行うための文字列を生成する文字列生成部と、
     前記文字列が固有名詞を含む第1の文字列である場合に、前記第1の文字列を外部装置へ送信し、前記第1の文字列に対応する第1の音声データを前記外部装置から受信する通信部と、
     前記文字列が前記第1の文字列である場合に、前記第1の音声データに対応する音声を出力させるための制御を行う一方で、前記文字列が固有名詞を含まない第2の文字列である場合に、当該第2の文字列に対応する音声データとして記憶部に格納されている第2の音声データに対応する音声を出力させるための制御を行う制御部と、
     を有する音声出力装置。
    a character string generation unit that generates a character string for performing voice guidance;
    when the character string is a first character string including a proper noun, the first character string is transmitted to an external device, and first voice data corresponding to the first character string is transmitted from the external device a receiving communication unit;
    When the character string is the first character string, control is performed to output a voice corresponding to the first voice data, while the second character string does not include a proper noun. a control unit for controlling to output a sound corresponding to the second sound data stored in the storage unit as the sound data corresponding to the second character string when
    an audio output device having
  2.  前記制御部は、前記文字列が前記第1の文字列である場合に、前記外部装置から受信した前記第1の音声データをキャッシュデータとして前記記憶部に格納する請求項1に記載の音声出力装置。 2. The audio output according to claim 1, wherein the control unit stores the first audio data received from the external device as cache data in the storage unit when the character string is the first character string. Device.
  3.  The voice output device according to claim 2, wherein, when the cache data corresponding to the first character string is stored in the storage unit, the control unit performs control to output voice corresponding to the cache data without communicating with the external device.
  4.  The voice output device according to any one of claims 1 to 3, wherein the character string is a script containing at least one sentence.
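The device-side behaviour recited in claims 1 to 3 can be illustrated with a minimal sketch: strings containing a proper noun are sent to the external server for synthesis (with the response cached, so a repeat request needs no communication), while fixed phrases are played back from pre-stored local voice data. All names here (`contains_proper_noun`, `request_tts_from_server`, the byte payloads) are illustrative assumptions, not part of the disclosure.

```python
# Hypothetical sketch of the voice output device of claims 1-3.
# Pre-stored voice data for fixed guidance phrases (the "second voice data").
LOCAL_VOICE_DATA = {
    "Turn left ahead.": b"<pcm: turn-left>",
    "You have arrived.": b"<pcm: arrived>",
}

def contains_proper_noun(text, proper_nouns=("Shibuya", "Tokyo Tower")):
    # Stand-in for real morphological analysis / named-entity detection.
    return any(name in text for name in proper_nouns)

def request_tts_from_server(text):
    # Stand-in for the network round trip to the external device (server).
    return b"<pcm synthesized for: " + text.encode() + b">"

class VoiceOutputDevice:
    def __init__(self):
        self.cache = {}  # claim 2: server responses kept as cache data

    def speak(self, text):
        if contains_proper_noun(text):       # first character string
            if text in self.cache:           # claim 3: no server round trip
                return self.cache[text]
            audio = request_tts_from_server(text)
            self.cache[text] = audio
            return audio
        # second character string: play pre-stored local voice data
        return LOCAL_VOICE_DATA[text]
```

The split keeps the large, open-ended vocabulary (place names, facility names) on the server while the small, fixed guidance vocabulary stays on the device.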
  5.  A server device comprising:
     a receiving unit that receives, from an external device, a character string generated for voice guidance; and
     a control unit that performs control to transmit first voice data corresponding to a first character string to the external device when the character string is the first character string, which contains a proper noun, and performs control to cause the external device to output voice corresponding to second voice data stored in the external device as voice data corresponding to a second character string when the character string is the second character string, which contains no proper noun.
  6.  The server device according to claim 5, wherein, when the character string is the first character string, the control unit stores the first voice data in a storage unit as cache data.
  7.  The server device according to claim 6, wherein, when the cache data corresponding to the first character string is stored in the storage unit, the control unit performs control to transmit the cache data to the external device.
  8.  The server device according to any one of claims 5 to 7, wherein the character string is a script containing at least one sentence.
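The server side of claims 5 to 7 admits a similarly small sketch: the server synthesizes (and caches) audio only for proper-noun strings, and otherwise instructs the requesting device to play its own stored data. The `("audio", ...)` / `("use_local", None)` response protocol and the `synthesize` stub are assumptions for illustration only.

```python
# Hypothetical sketch of the server device of claims 5-7.
class GuidanceTtsServer:
    def __init__(self, proper_nouns):
        self.proper_nouns = set(proper_nouns)
        self.cache = {}  # claim 6: synthesized audio cached per string

    def synthesize(self, text):
        # Stand-in for an actual TTS engine.
        return b"<pcm: " + text.encode() + b">"

    def handle(self, text):
        """Return ('audio', data) or ('use_local', None)."""
        if any(name in text for name in self.proper_nouns):
            if text not in self.cache:      # claim 7: reuse cached audio
                self.cache[text] = self.synthesize(text)
            return ("audio", self.cache[text])
        # second character string: device plays its own stored voice data
        return ("use_local", None)
```

Responding with `use_local` for fixed phrases avoids transmitting audio the device already holds, which is the bandwidth-saving point of the claimed split.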
  9.  A voice output method comprising:
     generating a character string for voice guidance;
     when the character string is a first character string containing a proper noun, transmitting the first character string to an external device and receiving first voice data corresponding to the first character string from the external device; and
     performing control to output voice corresponding to the first voice data when the character string is the first character string, and performing control to output voice corresponding to second voice data stored in a storage unit as voice data corresponding to a second character string when the character string is the second character string, which contains no proper noun.
  10.  A control method comprising:
     receiving, from an external device, a character string generated for voice guidance; and
     performing control to transmit first voice data corresponding to a first character string to the external device when the character string is the first character string, which contains a proper noun, and performing control to cause the external device to output voice corresponding to second voice data stored in the external device as voice data corresponding to a second character string when the character string is the second character string, which contains no proper noun.
  11.  A program executed by a voice output device comprising a computer, the program causing the computer to function as:
     a character string generation unit that generates a character string for voice guidance;
     a communication unit that, when the character string is a first character string containing a proper noun, transmits the first character string to an external device and receives first voice data corresponding to the first character string from the external device; and
     a control unit that performs control to output voice corresponding to the first voice data when the character string is the first character string, and performs control to output voice corresponding to second voice data stored in a storage unit as voice data corresponding to a second character string when the character string is the second character string, which contains no proper noun.
  12.  A program executed by a server device comprising a computer, the program causing the computer to function as:
     a receiving unit that receives, from an external device, a character string generated for voice guidance; and
     a control unit that performs control to transmit first voice data corresponding to a first character string to the external device when the character string is the first character string, which contains a proper noun, and performs control to cause the external device to output voice corresponding to second voice data stored in the external device as voice data corresponding to a second character string when the character string is the second character string, which contains no proper noun.
  13.  A storage medium storing the program according to claim 11 or 12.
PCT/JP2021/040103 2021-10-29 2021-10-29 Voice output device, server device, voice output method, control method, program, and storage medium WO2023073949A1 (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
PCT/JP2021/040103 WO2023073949A1 (en) 2021-10-29 2021-10-29 Voice output device, server device, voice output method, control method, program, and storage medium
JP2023556054A JPWO2023073949A1 (en) 2021-10-29 2021-10-29

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/JP2021/040103 WO2023073949A1 (en) 2021-10-29 2021-10-29 Voice output device, server device, voice output method, control method, program, and storage medium

Publications (1)

Publication Number Publication Date
WO2023073949A1 (en) 2023-05-04

Family

ID=86157627

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/JP2021/040103 WO2023073949A1 (en) 2021-10-29 2021-10-29 Voice output device, server device, voice output method, control method, program, and storage medium

Country Status (2)

Country Link
JP (1) JPWO2023073949A1 (en)
WO (1) WO2023073949A1 (en)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPH116743A (en) * 1997-04-22 1999-01-12 Toyota Motor Corp Mobile terminal device and voice output system for it
JP2004170887A (en) * 2002-11-22 2004-06-17 Canon Inc Data processing system and data storing method
JP2011033764A (en) * 2009-07-31 2011-02-17 Hitachi Ltd Voice read system and voice read terminal
JP2012173702A (en) * 2011-02-24 2012-09-10 Denso Corp Voice guidance system
JP2012194284A (en) * 2011-03-15 2012-10-11 Toshiba Corp Voice conversion supporting device, program and voice conversion supporting method
US20150073770A1 (en) * 2013-09-10 2015-03-12 At&T Intellectual Property I, L.P. System and method for intelligent language switching in automated text-to-speech systems


Also Published As

Publication number Publication date
JPWO2023073949A1 (en) 2023-05-04

Similar Documents

Publication Publication Date Title
JP6604151B2 (en) Speech recognition control system
JPH11201770A (en) Navigation device
US10884700B2 (en) Sound outputting device, sound outputting method, and sound outputting program storage medium
JP2023105143A (en) Information processor, method for outputting information, program, and recording medium
WO2023073949A1 (en) Voice output device, server device, voice output method, control method, program, and storage medium
WO2021192511A1 (en) Information processing device, information output method, program and storage medium
WO2023063405A1 (en) Content generation device, content generation method, program, and storage medium
WO2023286827A1 (en) Content output device, content output method, program, and storage medium
WO2023286826A1 (en) Content output device, content output method, program, and storage medium
WO2023163047A1 (en) Terminal device, information providing system, information processing method, program, and storage medium
US20240134596A1 (en) Content output device, content output method, program and storage medium
WO2023062816A1 (en) Content output device, content output method, program, and storage medium
WO2023162189A1 (en) Content output device, content output method, program, and storage medium
WO2023163197A1 (en) Content evaluation device, content evaluation method, program, and storage medium
WO2023112147A1 (en) Voice output device, voice output method, program, and storage medium
WO2023163045A1 (en) Content output device, content output method, program, and storage medium
JP7153191B2 (en) Information provision device and in-vehicle device
WO2023073856A1 (en) Audio output device, audio output method, program, and storage medium
WO2023163196A1 (en) Content output device, content output method, program, and recording medium
WO2023162192A1 (en) Content output device, content output method, program, and recording medium
JP2023011136A (en) Content output device, method for outputting content, program, and recording medium
WO2023112148A1 (en) Audio output device, audio output method, program, and storage medium
WO2023062817A1 (en) Voice recognition device, control method, program, and storage medium
JP2023012733A (en) Content generator, method for generating content, program, and recording medium
WO2023073912A1 (en) Voice output device, voice output method, program, and storage medium

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 21962490

Country of ref document: EP

Kind code of ref document: A1

ENP Entry into the national phase

Ref document number: 2023556054

Country of ref document: JP

Kind code of ref document: A

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 21962490

Country of ref document: EP

Kind code of ref document: A1