WO2016129276A1 - Information dissemination method, server, information terminal device, system, and voice interaction system - Google Patents

Information dissemination method, server, information terminal device, system, and voice interaction system

Info

Publication number
WO2016129276A1
Authority
WO
WIPO (PCT)
Prior art keywords
information
voice
condition
server
terminal device
Prior art date
Application number
PCT/JP2016/000687
Other languages
French (fr)
Japanese (ja)
Inventor
勝長 辻
Original Assignee
Panasonic Intellectual Property Management Co., Ltd.
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Panasonic Intellectual Property Management Co., Ltd.
Priority to JP2016574669A (JP6846617B2)
Publication of WO2016129276A1

Classifications

    • G: PHYSICS
    • G01: MEASURING; TESTING
    • G01C: MEASURING DISTANCES, LEVELS OR BEARINGS; SURVEYING; NAVIGATION; GYROSCOPIC INSTRUMENTS; PHOTOGRAMMETRY OR VIDEOGRAMMETRY
    • G01C21/00: Navigation; Navigational instruments not provided for in groups G01C1/00-G01C19/00
    • G01C21/26: Navigation specially adapted for navigation in a road network
    • G01C21/34: Route searching; Route guidance
    • G01C21/36: Input/output arrangements for on-board computers
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00: Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/16: Sound input; Sound output
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L13/00: Speech synthesis; Text to speech systems
    • G10L13/02: Methods for producing synthetic speech; Speech synthesisers
    • G10L15/00: Speech recognition
    • G10L15/22: Procedures used during a speech recognition process, e.g. man-machine dialogue
    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04M: TELEPHONIC COMMUNICATION
    • H04M1/00: Substation equipment, e.g. for use by subscribers
    • H04M3/00: Automatic or semi-automatic exchanges
    • H04M3/42: Systems providing special services or facilities to subscribers

Definitions

  • The present disclosure relates to an information providing method in a voice dialogue system that interacts with a user based on voice from a user riding on a moving body.
  • Speech dialogue apparatuses that allow a user to command a device in natural language by combining speech recognition, dialogue processing, and speech synthesis are widely known.
  • As a dialogue technology, an interactive system has been proposed that is not limited to a single request-response exchange: by retaining and reusing information obtained over repeated exchanges on a single topic, and by supporting multiple topics, it realizes more natural conversations (see, for example, Patent Document 1). This makes it possible to realize complex conversations in which a conversation on one topic is interrupted by a conversation on another topic.
  • With such a system, the dialogue scenario (the part that describes the response contents of the dialogue), which is important when constructing a complicated dialogue as described above, is expected to be easy to maintain, and extensibility is also expected to improve.
  • A system that performs push-type utterances, as in Patent Document 2, has also been proposed.
  • This disclosure is intended to provide an information providing method capable of reducing the amount of database held by the client device and reducing the amount of data transmitted between the server and the client device.
  • An information providing method according to one aspect of the present disclosure is an information providing method in a server included in a voice dialogue system that interacts with a user based on voice from a user riding on a moving body.
  • The information providing method includes: a trigger information transmission step of transmitting, to an information terminal device that acquires voice and provides voice information to the user, trigger information that indicates a condition of the moving body or the user and requests that a notification signal be transmitted when the condition is satisfied; a generation step of, when the notification signal is received, generating voice information by performing the voice interaction processing for the case where the condition is satisfied; and a voice information transmission step of transmitting the voice information to the information terminal device.
  • An information providing method according to another aspect of the present disclosure is an information providing method in an information terminal device that is included in a voice dialogue system that interacts with a user based on voice from a user riding on a moving body and that acquires voice.
  • The information providing method includes: a trigger information receiving step of receiving, from a server included in the voice dialogue system, trigger information indicating a condition of the moving body or the user; a determination step of determining whether the condition is satisfied; a transmission step of transmitting a notification signal to the server when the condition is satisfied; a voice information receiving step of receiving, from the server, the voice information generated by the voice interaction processing for the case where the condition is satisfied; and a providing step of providing the voice information to the user.
  • An information providing method according to yet another aspect of the present disclosure is an information providing method in a voice dialogue system that interacts with a user based on voice from a user riding on a moving body.
  • The voice dialogue system includes an information terminal device that acquires voice, and a server. The information providing method includes: a trigger information transmission step in which the server transmits trigger information indicating a condition of the moving body or the user to the information terminal device; a determination step in which the information terminal device, having received the trigger information, determines whether the condition is satisfied; a transmission step in which the information terminal device transmits a notification signal to the server when the condition is satisfied; a generation step in which the server, having received the notification signal, generates voice information by performing the voice interaction processing for the case where the condition is satisfied; a voice information transmission step in which the server transmits the voice information to the information terminal device; and a providing step in which the information terminal device receives the voice information and provides it to the user.
  • This disclosure provides an information providing method capable of reducing the amount of database held by the client device and reducing the amount of data transmitted between the server and the client device.
  • Diagram showing the overall configuration of the voice dialogue system in the embodiment
  • Block diagram showing the configuration of the client device in the embodiment
  • Flowchart showing the operation of the voice dialogue system in the embodiment
  • Flowchart showing the operation of the client system in the embodiment
  • Flowchart showing the operation of the server in the embodiment
  • Diagram showing an example of the operation of the voice dialogue system in the embodiment
  • Diagram showing an example of the composite character string template in the embodiment
  • Diagram showing an example of the trigger template in the embodiment
  • In push-type dialogue, the system initiates an utterance based on information that changes continuously, such as the state of the vehicle (for example, its position or moving speed). Therefore, when all dialogue processing is performed by the server, the client device must continually transmit such information to the server, which increases the amount of data transmitted between the server and the client device.
  • Conversely, when the processing for realizing such push-type dialogue is performed in the client device, the predetermined behavior must be built into the client device as a database. Since there are effectively unlimited kinds of triggers, an enormous database is required to determine the combinations of conditions, and the processing load on the client device also increases. In particular, in a server/client type voice dialogue system, the client device would have to hold a huge database, and the advantages of the server/client architecture (maintainability and extensibility) could not be exploited.
  • In contrast, this embodiment provides a server/client type voice dialogue system that performs push-type dialogue with only the minimum necessary information supplied from the server, without the client holding a huge database, and that behaves flexibly according to utterances from the user.
  • An information providing method according to one aspect of the present disclosure is an information providing method in a server included in a voice dialogue system that interacts with a user based on voice from a user riding on a moving body.
  • The information providing method includes: a trigger information transmission step of transmitting, to an information terminal device that acquires voice and provides voice information to the user, trigger information that indicates a condition of the moving body or the user and requests that a notification signal be transmitted when the condition is satisfied; a generation step of, when the notification signal is received, generating voice information by performing the voice interaction processing for the case where the condition is satisfied; and a voice information transmission step of transmitting the voice information to the information terminal device.
  • According to this, vehicle information need not be transmitted continuously from the information terminal device to the server, so the amount of data transmitted between them can be reduced. Further, since the client device only determines the condition, it does not need to hold a huge database for dialogue processing, and an increase in its processing load can be suppressed.
  • For example, the condition may include a condition on the state of the moving body or the user acquired by a sensor included in the moving body or the information terminal device.
  • For example, the condition may include a condition on the position of the moving body.
  • For example, the condition may include a condition on the moving distance of the moving body.
  • For example, the notification signal may include information indicating the priority of the voice interaction processing based on the notification signal.
  • For example, in the generation step, the currently executing voice interaction processing and the voice interaction processing based on the notification signal may be compared based on their priorities, and the voice interaction processing with the higher priority may be executed first.
  • According to this, the server can immediately grasp the processing priority using the priority included in the notification signal.
  • For example, the notification signal may include information indicating the transition destination of the state transition in the voice interaction processing based on the notification signal.
  • For example, in the generation step, the state may be transitioned to that transition destination, and the voice information may be generated based on the state after the transition.
  • According to this, the server can immediately execute the dialogue processing using the transition destination information included in the notification signal.
  • An information providing method according to another aspect of the present disclosure is an information providing method in an information terminal device that is included in a voice dialogue system that interacts with a user based on voice from a user riding on a moving body and that acquires voice.
  • The information providing method includes: a trigger information receiving step of receiving, from a server included in the voice dialogue system, trigger information indicating a condition of the moving body or the user; a determination step of determining whether the condition is satisfied; a transmission step of transmitting a notification signal to the server when the condition is satisfied; a voice information receiving step of receiving, from the server, the voice information generated by the voice interaction processing for the case where the condition is satisfied; and a providing step of providing the voice information to the user.
  • According to this, since the client device only determines the condition, it does not need to hold a huge database for dialogue processing, and an increase in its processing load can be suppressed.
  • An information providing method according to yet another aspect of the present disclosure is an information providing method in a voice dialogue system that interacts with a user based on voice from a user riding on a moving body.
  • The voice dialogue system includes an information terminal device that acquires voice, and a server. The information providing method includes: a trigger information transmission step in which the server transmits trigger information indicating a condition of the moving body or the user to the information terminal device; a determination step in which the information terminal device, having received the trigger information, determines whether the condition is satisfied; a transmission step in which the information terminal device transmits a notification signal to the server when the condition is satisfied; a generation step in which the server, having received the notification signal, generates voice information by performing the voice interaction processing for the case where the condition is satisfied; a voice information transmission step in which the server transmits the voice information to the information terminal device; and a providing step in which the information terminal device receives the voice information and provides it to the user.
  • According to this, since the client device only determines the condition, it does not need to hold a huge database for dialogue processing, and an increase in its processing load can be suppressed.
  • A server according to one aspect of the present disclosure is a server included in a voice dialogue system that interacts with a user based on voice from a user riding on a moving body, and includes a trigger information transmission unit, a generation unit, and a voice information transmission unit.
  • The information terminal device is included in the voice dialogue system, acquires voice, and provides voice information to the user.
  • The trigger information transmission unit transmits, to the information terminal device, trigger information indicating a condition of the moving body or the user, and requests that a notification signal be transmitted when the condition is satisfied.
  • When the notification signal is received, the generation unit generates voice information by performing the voice interaction processing for the case where the condition is satisfied.
  • The voice information transmission unit transmits the voice information to the information terminal device.
  • An information terminal device according to one aspect of the present disclosure is included in a voice dialogue system that interacts with a user based on voice from a user riding on a moving body, and includes a trigger information receiving unit, a determination unit, a transmission unit, a voice information receiving unit, and a providing unit.
  • The information terminal device acquires voice.
  • The trigger information receiving unit receives, from a server included in the voice dialogue system, trigger information indicating a condition of the moving body or the user.
  • The determination unit determines whether the condition is satisfied.
  • The transmission unit transmits a notification signal to the server when the condition is satisfied.
  • The voice information receiving unit receives, from the server, the voice information generated by the voice interaction processing for the case where the condition is satisfied.
  • The providing unit provides the voice information to the user.
  • A voice dialogue system according to one aspect of the present disclosure is a voice dialogue system that interacts with a user based on voice from a user riding on a moving body, and includes an information terminal device that acquires the voice and provides voice information to the user, and a server.
  • The server includes a trigger information transmission unit, a generation unit, and a voice information transmission unit.
  • The information terminal device includes a trigger information receiving unit, a determination unit, a transmission unit, a voice information receiving unit, and a providing unit.
  • The trigger information transmission unit transmits, to the information terminal device, trigger information indicating a condition of the moving body or the user, and requests that a notification signal be transmitted when the condition is satisfied.
  • When the notification signal is received, the generation unit generates voice information by performing the voice interaction processing for the case where the condition is satisfied.
  • The voice information transmission unit transmits the voice information to the information terminal device.
  • The trigger information receiving unit receives the trigger information from the server.
  • The determination unit determines whether the condition is satisfied.
  • The transmission unit transmits the notification signal to the server when the condition is satisfied.
  • The voice information receiving unit receives the voice information from the server.
  • The providing unit provides the voice information to the user.
  • In the voice dialogue system according to the present embodiment, dialogue processing for the voice dialogue is basically performed in the server. When a step that requires determining a condition on vehicle information occurs during the dialogue processing, the condition is sent to the client device. The client device determines whether the vehicle information satisfies the condition and notifies the server of the result. Since only the condition determination is performed by the client device, vehicle information does not have to be transmitted continuously from the client device to the server, which reduces the amount of data transmitted between them. Further, since the client device only determines the condition, it does not need to hold a huge database for dialogue processing, and an increase in its processing load is suppressed.
  • FIG. 1 is a diagram showing the overall configuration of the voice dialogue system 100 according to the present embodiment.
  • The voice dialogue system 100 shown in FIG. 1 includes a client system 110 and a server 120.
  • The client system 110 and the server 120 can communicate with each other via a network.
  • The client system 110 is installed, for example, in a vehicle (for example, an automobile) in which the user rides.
  • The client system 110 includes a client device 101 that communicates with the server 120, a microphone 102 that is an example of a device that acquires the user's voice, a speaker 103 that is an example of a device that reproduces synthesized voice, and a vehicle information acquisition unit 104 that acquires vehicle information indicating the state of the vehicle.
  • The server 120 includes a voice recognition unit 107 that converts an utterance voice signal representing the user's utterance into an utterance character string, a dialogue processing unit 108 that generates a composite character string by performing voice dialogue processing based on the utterance character string, and a voice synthesis unit 109 that converts the composite character string into a synthesized voice signal.
  • FIG. 2 is a block diagram of the client device 101.
  • The client device 101 includes a voice acquisition unit 201 that generates an utterance voice signal from the microphone input signal from the microphone 102, a voice output unit 202 that outputs a speaker output signal based on the synthesized voice signal to the speaker 103, and a communication unit 203 that transmits and receives data to and from the server 120.
  • The client device 101 further includes a trigger information interpretation unit 204 that interprets trigger information received from the server 120, a client trigger condition holding unit 205 that holds the trigger condition indicated by the trigger information, a vehicle state management unit 206 that holds the vehicle information acquired by the vehicle information acquisition unit 104, a determination unit 207 that determines whether the trigger condition is satisfied, and a notification unit 208 that generates a notification signal for notifying the server 120 that the trigger condition is satisfied.
  • FIG. 3 is a block diagram of the dialogue processing unit 108.
  • The dialogue processing unit 108 includes an input character string management unit 301 that receives the utterance character string and the notification signal, a state management unit 302 that performs state transition processing, a keyword DB (database) 303 that holds keywords, a matching processing unit 304 that performs matching processing on the utterance character string, a keyword holding unit 305 that holds matched keyword character strings, a dialogue processing execution unit 306 that executes dialogue processing, a composite character string template DB (database) 307 that holds composite character string templates, a trigger template DB (database) 308 that holds trigger templates, an output character string management unit 309 that outputs the composite character string and trigger information, and a server trigger condition holding unit 310 that holds the trigger condition.
  • FIG. 4 is a diagram illustrating the operation of the voice interaction system 100 according to the present embodiment.
  • When the user speaks, the microphone 102 of the client device 101 generates a microphone input signal.
  • The voice acquisition unit 201 acquires the microphone input signal and encodes it to generate an utterance voice signal, which is a digital signal.
  • The communication unit 203 transmits the generated utterance voice signal to the voice recognition unit 107 (S101).
  • The voice recognition unit 107 converts the received utterance voice signal into an utterance character string by voice recognition, and transmits the utterance character string to the dialogue processing unit 108 via the client device 101 (S102).
  • The input character string management unit 301 of the dialogue processing unit 108 receives the utterance character string and holds it.
  • The matching processing unit 304 acquires, from the keyword DB 303, the keywords to be matched in the current state managed by the state management unit 302.
  • The matching processing unit 304 performs matching between the acquired utterance character string and the keywords, and holds the matched keyword in the keyword holding unit 305 (S103).
  • The state management unit 302 performs a state transition based on the matched keyword.
  • The dialogue processing execution unit 306 generates a composite character string and trigger information by performing the dialogue processing associated with the state transition.
  • The dialogue processing execution unit 306 performs processing such as (1) database acquisition, modification, or setting, (2) condition determination, (3) information search, acquisition, or processing, and (4) composite character string generation. More specifically, the dialogue processing execution unit 306 acquires a composite character string template from the composite character string template DB 307, fills the acquired template with the character string read from the keyword holding unit 305, and generates a composite character string by enclosing it in the <VOICE> tag described later.
  • The dialogue processing execution unit 306 also acquires a trigger template from the trigger template DB 308, and generates trigger information by filling the acquired template with the character string read from the keyword holding unit 305.
  • The trigger condition indicated by the trigger information is held in the server trigger condition holding unit 310.
  • The dialogue processing execution unit 306 encloses the generated trigger information in the <RULE> tag described later and appends it to the composite character string (S104).
  • The output character string management unit 309 transmits the generated composite character string to the voice synthesis unit 109 via the client device 101. At this time, the communication unit 203 of the client device 101 extracts the trigger information attached to the composite character string.
  • The voice synthesis unit 109 generates a synthesized voice signal from the received composite character string and transmits the generated synthesized voice signal to the client device 101 (S105).
  • The communication unit 203 of the client device 101 receives the synthesized voice signal, and the voice output unit 202 generates a speaker output signal by decoding the synthesized voice signal.
  • The speaker 103 reproduces the synthesized voice based on the generated speaker output signal (S106).
  • The trigger information interpretation unit 204 interprets the trigger information acquired by the communication unit 203 and holds the trigger condition indicated by the trigger information in the client trigger condition holding unit 205.
  • The trigger condition held in the client trigger condition holding unit 205 is the same as the trigger condition held in the server trigger condition holding unit 310 of the server 120.
  • The vehicle state management unit 206 of the client device 101 holds the vehicle information acquired by the vehicle information acquisition unit 104.
  • The determination unit 207 periodically determines whether the vehicle state indicated by the vehicle information satisfies the trigger condition held in the client trigger condition holding unit 205 (S107).
  • When the trigger condition is satisfied, the notification unit 208 generates a notification signal.
  • The communication unit 203 transmits the generated notification signal to the dialogue processing unit 108.
  • The client device 101 also deletes, from the client trigger condition holding unit 205, the trigger condition that was satisfied by the vehicle state.
  • The input character string management unit 301 of the dialogue processing unit 108 receives the notification signal.
  • The matching processing unit 304 interprets the notification signal as a message for performing a state transition.
  • The state management unit 302 performs a state transition based on the notification signal. At this time, the state management unit 302 deletes the trigger condition corresponding to the notification signal from the server trigger condition holding unit 310.
  • The dialogue processing execution unit 306 generates a composite character string based on the state after the transition (S108). The output character string management unit 309 then transmits the generated composite character string to the voice synthesis unit 109 via the client device 101.
  • The voice synthesis unit 109 generates a synthesized voice signal from the composite character string and transmits the generated synthesized voice signal to the client device 101 (S109).
  • The communication unit 203 of the client device 101 receives the synthesized voice signal, and the voice output unit 202 generates a speaker output signal by decoding the synthesized voice signal.
  • The speaker 103 reproduces the synthesized voice by outputting the generated speaker output signal (S110). In this way, a push-type dialogue is realized.
  • In push-type dialogue, the utterance from the system side occurs at an arbitrary timing, so another conversation may already be in progress. For this reason, a parameter called priority is set for each conversation. The priorities of the current dialogue and the push dialogue are compared, and the dialogue with the lower priority is performed after the dialogue with the higher priority is completed.
  • Using the priority, the state management unit 302 can determine whether the state transition should be performed immediately.
  • If the priority of the push dialogue is lower, the state transition processing is stored in a dialogue stack and loaded after the higher-priority dialogue ends.
  • The priority is described in advance in the dialogue scenario according to the content of the dialogue. The priority of the notification signal sent from the client side is compared with the priority of the dialogue currently in progress to control whether the state transition is performed.
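  • As a rough illustration of this priority handling, here is a minimal, self-contained sketch (the patent does not prescribe an implementation; the class and method names are hypothetical):

```python
# Minimal sketch of the priority comparison and dialogue stack described
# above. Hypothetical names; assumes priorities H (high) > M (medium) > L (low).
PRIORITY_ORDER = {"L": 0, "M": 1, "H": 2}

class StateManager:
    def __init__(self):
        self.current_state = "0100"   # normal state in FIG. 9
        self.current_priority = None  # None: no dialogue in progress
        self.dialog_stack = []        # deferred push dialogues

    def on_notification(self, transition_dest, priority):
        """Transition now, or defer the push dialogue if it has lower priority."""
        if (self.current_priority is not None
                and PRIORITY_ORDER[priority] < PRIORITY_ORDER[self.current_priority]):
            self.dialog_stack.append((transition_dest, priority))
        else:
            self.current_state = transition_dest
            self.current_priority = priority

    def on_dialog_finished(self):
        """When the current dialogue ends, load a deferred push dialogue."""
        self.current_priority = None
        if self.dialog_stack:
            dest, prio = self.dialog_stack.pop()
            self.on_notification(dest, prio)

sm = StateManager()
sm.current_priority = "H"        # a high-priority dialogue is in progress
sm.on_notification("0301", "M")  # medium-priority push arrives and is deferred
sm.on_dialog_finished()          # current dialogue ends, push is loaded
assert sm.current_state == "0301"
```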
  • FIG. 5 is a flowchart showing the flow of processing by the client system 110. Note that the process of FIG. 5 is repeatedly performed at predetermined intervals.
  • The client device 101 determines whether the voice acquisition unit 201 has acquired an utterance voice (microphone input signal) (S201).
  • When the voice acquisition unit 201 acquires a microphone input signal (Yes in S201), it generates an utterance voice signal from the microphone input signal, and the communication unit 203 transmits the utterance voice signal to the server 120 (S202).
  • Next, the client device 101 determines whether the communication unit 203 has received a synthesized voice signal (S203).
  • When the communication unit 203 receives the synthesized voice signal (Yes in S203), the voice output unit 202 generates a speaker output signal by decoding the synthesized voice signal.
  • The speaker 103 outputs the generated speaker output signal (S204).
  • Next, the client device 101 determines whether the communication unit 203 has received trigger information (S205).
  • When trigger information is received (Yes in S205), the trigger information interpretation unit 204 interprets the trigger information acquired by the communication unit 203 and holds the trigger condition indicated by the trigger information in the client trigger condition holding unit 205.
  • The determination unit 207 then starts the trigger determination of whether the vehicle state managed by the vehicle state management unit 206 satisfies the trigger condition held in the client trigger condition holding unit 205 (S206).
  • If trigger determination has been started (S207), the determination unit 207 performs the trigger determination (S208).
  • When the trigger condition is satisfied (Yes in S208), the notification unit 208 generates a notification signal, and the communication unit 203 transmits the generated notification signal to the dialogue processing unit 108 (S209).
  • The order of the processes in steps S201 and S202, steps S203 and S204, steps S205 and S206, and steps S207 to S209 is merely an example and may differ from the order shown in FIG. 5; some processes may be performed simultaneously (in parallel) or with overlapping processing times.
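  • The trigger-determination portion of this loop (steps S207 to S209) can be sketched as follows; this is a minimal, self-contained illustration with hypothetical names, and the transport to the server is stubbed with a callback:

```python
# Self-contained sketch of the periodic trigger judgment (S207-S209).
# Hypothetical names; not the patent's implementation.
class HeldCondition:
    def __init__(self, cond_id, predicate, dest, priority):
        self.cond_id = cond_id      # e.g. "AAA"
        self.predicate = predicate  # vehicle_state -> bool
        self.dest = dest            # transition destination state, e.g. "0301"
        self.priority = priority    # "H", "M", or "L"

def judge_triggers(conditions, vehicle_state, send_notification):
    """One judgment pass: notify the server for each satisfied condition."""
    for cond in list(conditions):
        if cond.predicate(vehicle_state):  # S208
            # Notification format used in the embodiment: <STATE>ID,dest,prio</STATE>
            send_notification(f"<STATE>{cond.cond_id},{cond.dest},{cond.priority}</STATE>")
            conditions.remove(cond)        # fired conditions are deleted

# Example: notify when the vehicle comes within 5 km of the destination.
conds = [HeldCondition("AAA", lambda s: s["dist_to_dest_km"] <= 5.0, "0301", "M")]
judge_triggers(conds, {"dist_to_dest_km": 4.2}, print)
# prints: <STATE>AAA,0301,M</STATE>
```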
  • FIG. 6 is a flowchart showing the flow of processing by the server 120. Note that the process of FIG. 6 is repeatedly performed at predetermined intervals.
  • The server 120 determines whether an utterance voice signal has been received (S301). When an utterance voice signal is received (Yes in S301), the voice recognition unit 107 converts the utterance voice signal into an utterance character string by voice recognition (S302).
  • The input character string management unit 301 of the dialogue processing unit 108 receives the utterance character string and holds it.
  • The matching processing unit 304 acquires, from the keyword DB 303, the keywords to be matched in the current state managed by the state management unit 302.
  • The matching processing unit 304 performs matching between the acquired utterance character string and the keywords, and holds the matched keyword in the keyword holding unit 305 (S303).
  • The state management unit 302 performs a state transition based on the matched keyword.
  • The dialogue processing execution unit 306 generates a composite character string by performing the dialogue processing associated with the state transition (S304).
  • The voice synthesis unit 109 generates a synthesized voice signal from the composite character string and transmits the generated synthesized voice signal to the client device 101 (S305).
  • When a determination against a trigger condition arises in the course of the state transition in step S304 (Yes in S306), the dialogue processing execution unit 306 generates trigger information indicating the trigger condition, and the output character string management unit 309 transmits the generated trigger information to the client device 101 (S307).
  • Next, the server 120 determines whether a notification signal has been received (S308).
  • When a notification signal is received (Yes in S308), the matching processing unit 304 interprets the notification signal as a message for performing a state transition.
  • The state management unit 302 performs a state transition based on the notification signal.
  • The dialogue processing execution unit 306 generates a composite character string by performing dialogue processing based on the state after the transition (S309).
  • The voice synthesis unit 109 generates a synthesized voice signal from the composite character string and transmits the generated synthesized voice signal to the client device 101 (S310).
  • The order of the processes in steps S301 to S307 and steps S308 to S310 is merely an example and may differ from the order shown in FIG. 6; some processes may be performed simultaneously (in parallel) or with overlapping processing times.
  • The voice recognition unit 107, the dialogue processing unit 108, and the voice synthesis unit 109 included in the server 120 may be implemented as individual server devices.
  • In that case, the utterance character string may be sent from the voice recognition unit 107 to the dialogue processing unit 108 without going through the client device 101, and the composite character string may be sent from the dialogue processing unit 108 to the voice synthesis unit 109 without going through the client device 101.
  • At least one of the voice recognition unit 107 and the voice synthesis unit 109 may be included in the client system 110.
  • FIGS. 7A to 7C are diagrams for explaining a specific operation example.
  • FIG. 8 is a diagram illustrating an example of keywords held in the keyword DB 303.
  • FIG. 9 is a diagram illustrating an example of state transition in the dialogue processing.
  • FIG. 10 is a diagram illustrating an example of a composite character string template stored in the composite character string template DB 307.
  • FIG. 11 is a diagram illustrating an example of a trigger template stored in the trigger template DB 308.
  • FIG. 12 is a diagram illustrating an example of trigger conditions held in the client trigger condition holding unit 205 and the server trigger condition holding unit 310.
  • First, the voice recognition unit 107 recognizes the voice spoken by the user as "I want to go to the AAA tower" (S401).
  • The matching processing unit 304 of the dialogue processing unit 108 performs keyword matching on the utterance character string of the voice recognition result (S402).
  • Here, the keyword DB 303 stores keywords 410 and a reference keyword group 420.
  • The keywords 410 include a plurality of tables 401 to 405. Each of the tables 401 to 405 is a keyword list including one or more keywords (for example, "I want to go to {location}" in the table 401). In each of the tables 401 to 405, a transition source state and a transition destination state are set.
  • Each keyword may include a keyword group (indicated by { } in FIG. 8).
  • The reference keyword group 420 includes lists 406 to 409, each representing one keyword group.
  • Each of the lists 406 to 409 is a list of the words included in that keyword group.
  • Here, the current dialogue state is the "normal state 0100" shown in FIG. 9.
  • In this case, the matching processing unit 304 acquires, from the keywords 410 shown in FIG. 8, the tables 401, 404, and 405 whose transition source is the current state 0100, and searches the keywords included in these tables for a keyword matching the utterance character string. Here, "I want to go to {location}" included in the table 401 is matched.
  • The matching processing unit 304 also stores all the words of the referenced keyword group (in this example, the {location} list 406) from the reference keyword group 420 in the keyword holding unit 305.
  • The state management unit 302 performs a state transition to the transition destination state 0101 set in the table 401.
  • That is, based on the result of matching the utterance character string "I want to go to the AAA tower", the state management unit 302 determines the transition destination to be the state 0101, and transitions from the normal state 0100 shown in FIG. 9 to the destination setting state 0101 for carrying out the destination setting dialogue.
  • At this time, the keyword holding unit 305 holds "AAA tower" as the destination.
  • The dialogue processing unit 108 then performs processing such as acquiring the position of the AAA tower.
  • The dialogue processing execution unit 306 also generates a composite character string by performing dialogue processing based on the state after the transition.
  • As shown in FIG. 10, each composite character string template stored in the composite character string template DB 307 includes a state 501 indicating a dialogue state, a condition 502 for generating the composite character string, and a composite character string template 503.
  • Specifically, the dialogue processing execution unit 306 obtains, from the composite character string templates shown in FIG. 10, the template 503 "<VOICE>Set [destination] as the destination?</VOICE>" whose state 501 is 0101. Next, the dialogue processing execution unit 306 acquires the keyword "AAA tower" previously set as the destination from the keyword holding unit 305 and substitutes the acquired character string for [destination], thereby generating the composite character string "<VOICE>Set AAA tower as the destination?</VOICE>" (S403).
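  • This template-filling step can be pictured with the following minimal sketch (hypothetical names; the patent does not prescribe an implementation):

```python
# Minimal sketch of composite character string generation (S403).
def fill_template(template: str, slots: dict) -> str:
    """Substitute keywords read from the keyword holding unit into a template."""
    for name, value in slots.items():
        template = template.replace(f"[{name}]", value)
    return template

template = "<VOICE>Set [destination] as the destination?</VOICE>"
keywords = {"destination": "AAA tower"}  # held in the keyword holding unit 305
print(fill_template(template, keywords))
# -> <VOICE>Set AAA tower as the destination?</VOICE>
```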
  • The dialogue processing execution unit 306 also attempts to acquire a trigger template from the trigger template DB 308.
  • As shown in FIG. 11, each trigger template includes a state 511 indicating a dialogue state, a condition 512 for generating trigger information, and a trigger condition template 513.
  • The dialogue processing execution unit 306 looks for a template 513 whose state 511 is 0101 and whose condition 512 is satisfied among the trigger templates shown in FIG. 11. In this example, there is no matching template 513, so the composite character string is transmitted to the client device 101 as it is.
  • The client device 101 transmits the contents of the <VOICE> tag to the voice synthesis unit 109 and receives the synthesized voice signal that is the voice synthesis result. The client device 101 then outputs the voice "Set AAA tower as the destination?" indicated by the synthesized voice signal (S404). Since the client device 101 has not received any trigger information, the trigger conditions are not updated.
  • When the user replies "Yes", the matching processing unit 304 of the dialogue processing unit 108 performs keyword matching on the utterance character string "Yes" acquired from the voice recognition unit 107. Specifically, the table 402 is matched, and the state management unit 302 transitions from the destination setting state 0101 shown in FIG. 9 to the destination determination state 0102 (S406).
  • The dialogue processing execution unit 306 acquires the template "<VOICE>[destination] has been set as the destination</VOICE>" shown in FIG. 10, substitutes "AAA tower" for [destination], and generates the composite character string "<VOICE>AAA tower has been set as the destination</VOICE>" (S407).
  • The dialogue processing execution unit 306 then acquires a trigger template from the trigger template DB 308 and generates trigger information (S408).
  • Specifically, the dialogue processing execution unit 306 acquires, from the trigger templates shown in FIG. 11, the template 513 "<RULE>[ID],GPS,[latitude],[longitude],5,0301,M</RULE>" whose condition 512 is satisfied.
  • In this example, the condition 512 is that the destination is a facility that is not within 5 km of the current location and that has no parking lot.
  • This template specifies that, when the vehicle comes within 5 km of the destination, a push dialogue for guiding to a parking lot (the destination approach state 0301 shown in FIG. 9) is performed with priority "medium".
  • The dialogue processing execution unit 306 generates the trigger information by setting a unique value (here, "AAA") for [ID], the latitude of the destination for [latitude], and the longitude of the destination for [longitude]. Based on the contents of the trigger information, the server trigger condition holding unit 310 holds the trigger condition shown in FIG. 12. As shown in FIG. 12, a trigger condition consists of an ID 521, a unique identifier for the trigger condition; a condition 522 indicating what is to be determined; a content 523 indicating the determination details; a transition destination 524 indicating the state to transition to when the trigger condition is satisfied; and a priority 525 indicating the priority of the processing performed after the trigger condition is satisfied.
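  • A <RULE> string of this form maps directly onto the fields of FIG. 12. As a minimal sketch (hypothetical parser, illustrative coordinate values; not the patent's implementation):

```python
import re
from dataclasses import dataclass

# Minimal sketch of parsing a <RULE> string into the trigger-condition
# fields of FIG. 12.
@dataclass
class TriggerCondition:
    cond_id: str    # ID 521, e.g. "AAA"
    condition: str  # condition 522, e.g. "GPS" or "CAN"
    content: list   # content 523, e.g. latitude, longitude, and radius in km
    dest: str       # transition destination 524, e.g. "0301"
    priority: str   # priority 525: "H", "M", or "L"

def parse_rule(rule: str) -> TriggerCondition:
    fields = re.fullmatch(r"<RULE>(.*)</RULE>", rule).group(1).split(",")
    return TriggerCondition(cond_id=fields[0], condition=fields[1],
                            content=fields[2:-2], dest=fields[-2],
                            priority=fields[-1])

print(parse_rule("<RULE>AAA,GPS,35.0,135.0,5,0301,M</RULE>"))
# TriggerCondition(cond_id='AAA', condition='GPS',
#                  content=['35.0', '135.0', '5'], dest='0301', priority='M')
```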
  • The dialogue processing unit 108 encloses the trigger information in the <RULELIST> tag and appends it to the composite character string, producing a character string such as "<VOICE>AAA tower has been set as the destination</VOICE><RULELIST><RULE>AAA,GPS,[latitude],[longitude],5,0301,M</RULE></RULELIST>", which is transmitted to the client device 101.
  • The client device 101 transmits the contents of the <VOICE> tag to the voice synthesis unit 109 and receives the synthesized voice signal that is the voice synthesis result.
  • The client device 101 then outputs the voice "AAA tower has been set as the destination" indicated by the synthesized voice signal (S409).
  • Further, the client device 101 acquires the contents of the <RULE> tag as trigger information and holds the trigger condition indicated by the trigger information in the client trigger condition holding unit 205, as shown in FIG. 12. The client device 101 then performs determination according to the trigger condition. That is, the determination unit 207 acquires GPS information as vehicle information, computes the distance between the vehicle position indicated by the GPS information and the latitude and longitude of the destination, and determines whether the obtained distance has become 5 km or less. This determination is performed, for example, at a cycle of about 10 s.
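  • This distance check can be sketched as follows (a minimal, self-contained illustration using the great-circle distance; hypothetical names, not the patent's implementation):

```python
import math

# Minimal sketch of the determination unit's GPS check: is the vehicle
# within 5 km of the destination? Run periodically (e.g. every ~10 s).
def distance_km(lat1, lon1, lat2, lon2):
    """Haversine (great-circle) distance between two lat/lon points, in km."""
    r = 6371.0  # mean Earth radius in km
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = math.radians(lat2 - lat1)
    dl = math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))

def gps_condition_satisfied(vehicle_pos, dest_pos, radius_km=5.0):
    return distance_km(*vehicle_pos, *dest_pos) <= radius_km

print(gps_condition_satisfied((35.02, 135.75), (35.00, 135.77)))  # True (~2.9 km)
```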
  • When the trigger condition is satisfied, the notification unit 208 generates the notification signal "<STATE>AAA,0301,M</STATE>", which contains the trigger condition ID "AAA", the transition destination "0301", and the priority "M".
  • That is, the notification signal includes information that uniquely identifies the trigger condition, information indicating the transition destination of the state transition when the trigger condition is satisfied, and information indicating the priority of the dialogue processing performed when the trigger condition is satisfied. This notification signal is transmitted to the dialogue processing unit 108, and the trigger condition with ID "AAA" stored in the client trigger condition holding unit 205 is deleted (S410).
  • Next, the dialogue processing unit 108 compares the priority of the currently executing dialogue processing with the priority of the dialogue processing based on the notification signal, using the priority included in the received notification signal.
  • A specific example of this process is described with reference to FIG. 13, which shows the case where a parking lot search process with normal (medium) priority is performed based on the notification signal.
  • In this case, the state management unit 302 transitions from the normal state 0100 shown in FIG. 9 to the destination approach state 0301 based on the notification signal.
  • The dialogue processing execution unit 306 deletes the trigger condition whose ID 521 is "AAA" from the server trigger condition holding unit 310 and, through dialogue processing, generates the composite character string "You have arrived near the destination. Shall I guide you to a parking lot?", which is transmitted to the client device 101 (S411).
  • The client device 101 outputs the voice "You have arrived near the destination. Shall I guide you to a parking lot?" (S412). At that time, the client device 101 may also play a sound indicating that the vehicle is near the destination, or may display characters or a picture on an in-vehicle display.
  • On the other hand, if the priority of the parking lot search process based on the notification signal is lower than that of the dialogue processing currently in progress (for example, during a conversation of a higher-priority convenience store search process), the parking lot search process is stored in the dialogue stack and the current dialogue processing is given priority.
  • FIGS. 14A and 14B are diagrams for explaining this operation example.
  • First, the user utters "I want to put in gasoline after running another 10 km" (S501).
  • The matching processing unit 304 of the dialogue processing unit 108 performs keyword matching on the utterance character string of the voice recognition result (S502).
  • The state management unit 302 transitions from the normal state 0100 shown in FIG. 9 to the conditional instruction state 0200.
  • The dialogue processing execution unit 306 acquires the gas station search state 0400, which is the dialogue state corresponding to "I want to put in gasoline". After confirming from the amount of gasoline in the car that the vehicle can travel another 10 km, the dialogue processing execution unit 306 acquires the trigger character string template "<RULE>[ID],CAN,dist,[travel distance],[transition dialogue state],H</RULE>" and sets the unique value "BBB" for [ID].
  • The dialogue processing execution unit 306 also sets "10" for [travel distance] and "0400", which indicates the gas station search state, for [transition dialogue state].
  • As a result, the trigger information "<RULE>BBB,CAN,dist,10,0400,H</RULE>" is generated.
  • The dialogue processing execution unit 306 overwrites the held trigger condition with this content.
  • That is, the server trigger condition holding unit 310 is updated with the new trigger condition.
  • The dialogue processing execution unit 306 also generates the composite character string "<VOICE>Understood. I will notify you when you have driven another 10 km.</VOICE>", and the generated trigger information and composite character string are transmitted to the client device 101 (S503).
  • The client device 101 receives the composite character string and outputs "Understood. I will notify you when you have driven another 10 km" (S504).
  • The client device 101 also receives the trigger information and overwrites the trigger condition held in the client trigger condition holding unit 205 with the trigger condition indicated by the received trigger information.
  • Specifically, the trigger condition shown in FIG. 15 is held in the client trigger condition holding unit 205, and the held trigger condition whose ID "BBB" matches the ID of the received trigger condition is overwritten with the trigger condition indicated by the received trigger information.
  • The client trigger condition holding unit 205 is thereby updated.
  • Next, in accordance with the updated trigger condition, the client device 101 compares the target travel distance with the current travel distance and periodically determines whether the travel distance has reached the target travel distance.
  • When the travel distance reaches the target travel distance, the notification unit 208 generates the notification signal "<STATE>BBB,0400,H</STATE>" and transmits it to the dialogue processing unit 108 (S505).
  • The dialogue processing unit 108 receives the notification signal, and the state management unit 302 transitions from the normal state 0100 shown in FIG. 9 to the gas station search state 0400. Based on the state after the transition, the dialogue processing execution unit 306 generates a composite character string indicating "You have driven 10 km. Shall I search for a gas station?" (S506). The generated composite character string is transmitted to the client device 101, and the client device 101 outputs the voice "You have driven 10 km. Shall I search for a gas station?" (S507).
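  • The overwrite-by-ID behavior in this example can be sketched as follows (a minimal illustration; the previously held "BBB" condition shown here is a hypothetical placeholder):

```python
# Minimal sketch of overwriting a held trigger condition by ID (S503).
# Hypothetical names and placeholder contents; not the patent's implementation.
held = {"BBB": "<RULE>BBB,CAN,dist,5,0400,H</RULE>"}  # previously held condition

def hold_trigger(held: dict, rule: str) -> None:
    cond_id = rule[len("<RULE>"):].split(",")[0]
    held[cond_id] = rule  # a condition with the same ID is overwritten

hold_trigger(held, "<RULE>BBB,CAN,dist,10,0400,H</RULE>")
print(held["BBB"])  # -> <RULE>BBB,CAN,dist,10,0400,H</RULE>
```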
  • In the above description, both the server trigger condition holding unit 310 of the dialogue processing unit 108 and the client trigger condition holding unit 205 of the client device 101 hold the ID 521, condition 522, content 523, transition destination 524, and priority 525; however, part or all of this information may be held in only one of the two holding units.
  • Similarly, in the above description, the trigger information and the notification signal include the ID, information indicating the transition destination, and information indicating the priority, but at least one of these may be omitted.
  • For example, the transition destination 524 and the priority 525 may be managed only by the dialogue processing unit 108. In this case, this information is not included in the trigger information or the notification signal, and the dialogue processing unit 108 determines the transition destination and the priority based on the ID included in the notification signal and the information it manages.
  • Conversely, the transition destination 524 and the priority 525 may be managed only by the client device 101.
  • In this case, this information is included in the trigger information and the notification signal, and the dialogue processing unit 108 determines the transition destination and the priority from the values included in the notification signal; the dialogue processing unit 108 then does not need to hold the trigger condition. Further, when the transition destination is included in the notification signal, the dialogue processing unit 108 can operate based on the transition destination alone, so the notification signal need not include the ID.
  • At a minimum, the notification signal need only indicate that the trigger condition indicated by the trigger information has been satisfied.
  • In the above description, the content of the current dialogue is immediately used as a trigger condition.
  • Alternatively, keywords from past dialogues may be accumulated in the keyword holding unit 305, and the dialogue processing unit 108 may generate trigger conditions based on the accumulated information.
  • For example, when a dialogue indicates that the user is visiting an area for the first time, the dialogue processing unit 108 generates a first trigger condition for determining that the user has arrived in that area. When the first trigger condition is satisfied, it generates a second trigger condition for determining that the user has approached a sightseeing spot or a store with spot advertising. When the second trigger condition is satisfied, the dialogue processing unit 108 carries out a dialogue providing sightseeing guidance or advertising. In this way, the dialogue processing unit 108 may generate two-stage trigger conditions.
  • As another example, the dialogue processing unit 108 generates a trigger condition for determining that an abnormality has occurred in the car. When this condition is satisfied, the dialogue processing unit 108 explains, through dialogue, what kind of abnormality has occurred and how to deal with it. In this way, a help function for beginners can be realized using the results of multiple dialogues.
  • Further, when the user frequently searches for Italian restaurants around noon, the dialogue processing unit 108 generates a trigger condition for determining whether it is lunchtime and, when the condition is satisfied, guides the user to Italian restaurants through dialogue. A recommendation function can thus also be realized.
  • In the above description, the trigger condition is a condition such as the travel distance of the vehicle, but the trigger condition may be any condition on the vehicle state; that is, it may be a condition on any parameter that can be acquired by a client device installed in the vehicle.
  • For example, the trigger condition may be a condition on vehicle information acquired from the CAN (Controller Area Network), such as the accelerator depression amount, brake depression amount, steering angle, shift position, blinker state, wiper state, light state, remaining gasoline amount, travel distance, vehicle speed, vehicle body acceleration, vehicle body angular velocity, inter-vehicle distance, proximity sensor output, water temperature, oil amount, various warnings, window opening degree, or air conditioner setting.
  • The trigger condition may also be a condition on the state of a user in the car.
  • For example, the trigger condition may be a condition on information indicating the user's state obtained by sensors arranged around the driver (line of sight, face orientation, voice, number of passengers, personal identification information, body weight, body temperature, pulse, blood pressure, sweating, brain waves, arousal level, or concentration level).
  • The trigger condition may also be a condition on information obtained by other in-vehicle sensors (GPS information, in-vehicle temperature, outside air temperature, humidity, time, and so on).
  • In other words, the trigger condition is a condition on the state of the vehicle or the user acquired by a sensor included in the vehicle or the client device, and is typically a condition on a state that changes continuously.
  • The client device 101 may also be mounted on a moving body other than an automobile (for example, a train, an airplane, or a bicycle), or may be carried by a user riding on such a moving body. That is, the trigger condition may be a condition on the state of the moving body or on the state of the user riding on the moving body.
  • The trigger condition may be an AND condition, an OR condition, or a combination thereof. A sketch of evaluating such composite conditions follows the examples below.
  • For example, "<RULELIST><AND><RULE>[ID],GPS,[latitude],[longitude],5,0301,H</RULE><RULE>[ID],TIME,[time],0301,H</RULE></AND></RULELIST>" may define the AND condition of two rules. In this case, a push utterance is performed when both the position condition and the time condition are satisfied.
  • Similarly, "<RULELIST><OR><RULE>[ID],GPS,[latitude 1],[longitude 1],5,0301,H</RULE><RULE>[ID],GPS,[latitude 2],[longitude 2],5,0301,H</RULE></OR></RULELIST>" may define the OR condition of two rules.
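  • As a minimal sketch (hypothetical names; rules are represented as predicates over the vehicle state, not the patent's implementation), composite conditions can be evaluated as follows:

```python
# Minimal sketch of evaluating AND/OR composite trigger conditions on the client.
def make_and(*preds):
    return lambda state: all(p(state) for p in preds)

def make_or(*preds):
    return lambda state: any(p(state) for p in preds)

# Push utterance only when both the position and time conditions hold:
near_dest = lambda s: s["dist_to_dest_km"] <= 5.0
after_6pm = lambda s: s["hour"] >= 18
combined = make_and(near_dest, after_6pm)

print(combined({"dist_to_dest_km": 3.0, "hour": 19}))  # True
print(combined({"dist_to_dest_km": 3.0, "hour": 12}))  # False
```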
  • the dialogue processing unit 108 detects the case where the vehicle is traveling near the highway and running at a high speed steady state. It is also possible to guide the auto cruise control function by voice.
  • the dialogue processing unit 108 may also detect a decrease in the user's arousal level or concentration level, and recommend that the user take a break.
  • the dialogue processing unit 108 may also perform push-type speech by (1) monitoring conversation between the people in the vehicle and speaking when the conversation is interrupted, (2) monitoring the driving load and speaking when the driving load decreases, or (3) detecting the user's mental state from brain waves, heartbeat, or the like and speaking when the user is at rest.
  • in the above, each function has been described as belonging to either the server 120 or the client system 110, but the client system 110 may have a part of the functions of the server 120.
  • in the above embodiment, the speech recognition unit 107, the dialogue processing unit 108, and the speech synthesis unit 109 are not directly connected; alternatively, the output signal of the speech recognition unit 107 may be directly input to the dialogue processing unit 108, the synthesized character string may be directly input to the speech synthesis unit 109, and only the trigger information may be transmitted to the client device 101.
  • the trigger condition may be erased when the condition is satisfied. That is, when a trigger condition is satisfied, it may be deleted from the client trigger condition holding unit 205 of the client device 101. In that case, the client device 101 transmits an erasure command to the server 120, and the server 120 that has received the erasure command deletes the trigger condition from the server trigger condition holding unit 310.
  • as described above, the voice dialogue system 100 is a voice dialogue system that interacts with a user who is on a moving body based on the user's voice, and includes a client device 101, which is an information terminal device that acquires voice from the user and provides voice information to the user, and a server 120.
  • the server 120 transmits to the client apparatus 101 trigger information indicating the condition of the mobile object or the user, and requests that the notification signal be transmitted to the server 120 when the condition is satisfied (S307 in FIG. 6).
  • when the server 120 receives the notification signal (Yes in S308), it generates voice information (a synthesized character string or a speech synthesis signal) by performing the voice dialogue processing for the case where the above condition is satisfied (S309).
  • the server 120 then transmits the voice information to the client device 101 (S310).
  • the client apparatus 101 receives the trigger information from the server 120 (S205 in FIG. 5), and determines whether or not the condition indicated by the trigger information is satisfied (S206 and S208). When the condition indicated by the trigger information is satisfied (Yes in S208), the client apparatus 101 transmits a notification signal to the server 120 (S209).
  • the client apparatus 101 receives voice information (synthetic character string or voice synthesis signal) from the server 120 (S203), and provides the received voice information to the user (S204).
  • the condition indicated by the trigger information includes a condition of the state of the moving body or the user acquired by a sensor included in the moving body or the client device 101.
  • for example, this condition includes a condition on the position of the moving body or a condition on the moving distance of the moving body. A sketch of the corresponding message exchange is also given after this list.
  • since the client device 101 only evaluates the trigger conditions generated by the server 120, the client device 101 can be realized with a very simple and inexpensive structure.
  • since the server 120 maintains the conditions for the push-type dialogue, maintainability and expandability can be sufficiently secured.
  • even when a conditional instruction such as "when it comes to ..." is received from the user, which is difficult to realize with a conventional system, it can be handled with the same mechanism.
  • the notification signal may include information indicating the priority of the voice dialogue processing based on the notification signal. Based on the priority, the server 120 executes whichever has the higher priority: the voice dialogue processing currently being executed or the voice dialogue processing based on the notification signal.
  • the server can thus immediately grasp the processing priority using the priority included in the notification signal.
  • the notification signal may include information indicating the transition destination of the state transition in the voice dialogue processing based on the notification signal.
  • in that case, the server 120 transitions the state to the transition destination indicated by the information and generates voice information based on the state after the transition.
  • the server can thus immediately execute the dialogue processing using the transition destination information included in the notification signal.
  • the voice interaction system according to the embodiment of the present disclosure has been described.
  • the present disclosure is not limited to this embodiment.
  • the present disclosure is not limited to the voice interaction system; it may be realized as a server or a client device (information terminal device) included in the voice interaction system, or as an information providing method in the voice interaction system, the server, or the client device.
  • a part or all of the processing units included in the voice interaction system according to the above embodiment are typically realized as an LSI, which is an integrated circuit. These may be individually implemented as single chips, or a single chip may include a part or all of them.
  • circuit integration is not limited to LSI, and may be realized by a dedicated circuit or a general-purpose processor.
  • an FPGA (Field Programmable Gate Array) or a reconfigurable processor that can reconfigure the connection and setting of circuit cells inside the LSI may be used.
  • each component may be configured by dedicated hardware or may be realized by executing a software program suitable for each component.
  • Each component may be realized by a program execution unit such as a CPU or a processor reading and executing a software program recorded on a recording medium such as a hard disk or a semiconductor memory.
  • the present disclosure may be the above-described program, or a non-transitory computer-readable recording medium on which the above-described program is recorded.
  • the program can be distributed via a transmission medium such as the Internet.
  • the dialogue contents, numbers, and the like used above are all exemplified for specifically explaining the present disclosure, and the present disclosure is not limited to the exemplified dialogue contents, numbers, and the like.
  • the division of functional blocks in the block diagrams is an example; a plurality of functional blocks may be realized as one functional block, a single functional block may be divided into a plurality of blocks, or some functions may be transferred to other functional blocks.
  • the functions of a plurality of functional blocks having similar functions may be processed in parallel or in a time-division manner by a single piece of hardware or software.
  • the information providing method in the above voice dialogue system is illustrated in order to specifically explain the present disclosure, and the information providing method according to the present disclosure does not necessarily have to include all of the above steps.
  • the order in which the above steps are executed is for illustration in order to specifically describe the present disclosure, and may be in an order other than the above. Further, a part of the above steps may be executed simultaneously with other steps (in parallel) or with overlapping processing times.
  • the voice interaction system according to one or more aspects has been described above based on the embodiment.
  • however, the present disclosure is not limited to this embodiment. Forms obtained by applying various modifications conceived by those skilled in the art to this embodiment, and forms constructed by combining components of different embodiments, may also be included within the scope of one or more aspects, as long as they do not deviate from the gist of the present disclosure.
  • the present disclosure can be applied to a voice interaction system, and is useful, for example, for a system that performs a voice interaction with a user in a car.
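As referenced in the AND/OR rule examples in the list above, the following is a minimal sketch of how a client might represent and evaluate such combined trigger rules. The rule classes, the field semantics (for example, treating the value 5 as a radius in kilometers), and the flat-distance approximation are assumptions made for illustration; they are not part of the disclosed rule format.

    import math

    # Hypothetical parsed form of one GPS <RULE>; field meanings are assumed.
    class GpsRule:
        def __init__(self, lat, lon, radius_km):
            self.lat, self.lon, self.radius_km = lat, lon, radius_km
        def satisfied(self, state):
            # Rough planar distance check, adequate for small radii.
            dy = (state["lat"] - self.lat) * 111.0
            dx = (state["lon"] - self.lon) * 111.0 * math.cos(math.radians(self.lat))
            return math.hypot(dx, dy) <= self.radius_km

    # Hypothetical parsed form of one TIME <RULE>.
    class TimeRule:
        def __init__(self, hhmm):
            self.hhmm = hhmm
        def satisfied(self, state):
            return state["time_hhmm"] == self.hhmm

    # <AND> and <OR> elements combine child rules.
    class And:
        def __init__(self, *children): self.children = children
        def satisfied(self, state): return all(c.satisfied(state) for c in self.children)

    class Or:
        def __init__(self, *children): self.children = children
        def satisfied(self, state): return any(c.satisfied(state) for c in self.children)

    # <RULELIST><AND><RULE>GPS ...</RULE><RULE>TIME ...</RULE></AND></RULELIST>
    rule = And(GpsRule(34.70, 135.50, 5.0), TimeRule("1230"))
    vehicle_state = {"lat": 34.71, "lon": 135.49, "time_hhmm": "1230"}
    if rule.satisfied(vehicle_state):
        print("trigger fired: send a notification signal to the server")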
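Similarly, the exchange of trigger information, notification signal, and voice information summarized in the list above can be sketched as follows. The class names and fields are illustrative assumptions; the disclosure does not specify a wire format.

    from dataclasses import dataclass

    @dataclass
    class TriggerInfo:          # server -> client (S307 / S205)
        trigger_id: str
        condition: str          # e.g. "<RULE>[ID],GPS,[lat],[lon],5,0301,H</RULE>"

    @dataclass
    class NotificationSignal:   # client -> server (S209 / S308)
        trigger_id: str

    @dataclass
    class VoiceInfo:            # server -> client (S310 / S203)
        synthesized_text: str

    def client_on_trigger(info: TriggerInfo, condition_met: bool):
        # S206/S208: the client evaluates the condition locally, so the raw
        # vehicle state is never streamed to the server.
        return NotificationSignal(info.trigger_id) if condition_met else None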

Abstract

Provided is an information dissemination method for a server included in a voice interaction system that interacts with a user riding in a moving body, on the basis of voice interaction with the user. The method includes: a trigger information transmission step for transmitting trigger information indicating a condition as to the status of the moving body or of the user to a client device and requesting the client device to transmit a notification signal when the condition is met, the client device being included in the voice interaction system, and the client device acquiring the voice of the user and disseminating voice information to the user; a generation step for generating voice information by performing a voice interaction process, when the condition has been met; and a guide information transmission step for transmitting voice information to the client device.

Description

Information providing method, server, information terminal device, system, and voice dialogue system
The present disclosure relates to an information providing method in a voice dialogue system that interacts with a user who is on a moving body, based on the user's voice.
Speech dialogue apparatuses that allow a user to give commands to a device in natural language, by combining speech recognition, dialogue processing, and speech synthesis, are widely known. In recent years, dialogue technology has moved beyond single question-and-answer exchanges: dialogue systems have been proposed that realize more natural conversation, for example by retaining and using information obtained over repeated responses on a single topic, and by supporting multiple topics (see, for example, Patent Document 1). This makes it possible to realize complex dialogues, such as interrupting a dialogue on one topic with a dialogue on another topic.
In a server-client type voice dialogue system, in which the dialogue processing is performed by a server that communicates with a client device, it is considered that the dialogue scenario (the part that describes the response contents of the dialogue), which is important when constructing complicated dialogues such as those described above, can be maintained easily and extended readily.
In addition, systems that perform push-type utterances have been proposed, as in Patent Document 2.
Patent Document 1: JP 2007-52043 A. Patent Document 2: JP 11-37766 A.
In a voice dialogue system in which part of such dialogue processing is performed by a server, it is desirable to reduce both the amount of database data held by the client device and the amount of data transmitted between the server and the client device.
An object of the present disclosure is to provide an information providing method capable of reducing the amount of database data held by the client device and reducing the amount of data transmitted between the server and the client device.
To achieve the above object, an information providing method according to one aspect of the present disclosure is an information providing method in a server included in a voice dialogue system that interacts with a user who is on a moving body, based on the user's voice. The information providing method includes: a trigger information transmission step of transmitting trigger information indicating a condition of the state of the moving body or the user to an information terminal device that is included in the voice dialogue system, acquires voice, and provides voice information to the user, and of requesting the information terminal device to transmit a notification signal when the condition is satisfied; a generation step of, when the notification signal is received, generating voice information by performing voice dialogue processing for the case where the condition is satisfied; and a voice information transmission step of transmitting the voice information to the information terminal device.
An information providing method according to another aspect of the present disclosure is an information providing method in an information terminal device that is included in a voice dialogue system that interacts with a user who is on a moving body based on the user's voice, and that acquires voice. The information providing method may include: a trigger information reception step of receiving, from a server included in the voice dialogue system, trigger information indicating a condition of the state of the moving body or the user; a determination step of determining whether the condition is satisfied; a transmission step of transmitting a notification signal to the server when the condition is satisfied; a voice information reception step of receiving, from the server, voice information generated by voice dialogue processing for the case where the condition is satisfied; and a provision step of providing the voice information to the user.
An information providing method according to yet another aspect of the present disclosure is an information providing method in a voice dialogue system that interacts with a user who is on a moving body based on the user's voice. The voice dialogue system includes an information terminal device that acquires voice, and a server. The information providing method may include: a trigger information transmission step in which the server transmits, to the information terminal device, trigger information indicating a condition of the state of the moving body or the user; a determination step in which the information terminal device, when it receives the trigger information, determines whether the condition is satisfied; a transmission step in which the information terminal device transmits a notification signal to the server when the condition is satisfied; a generation step in which the server, when it receives the notification signal, generates voice information by performing voice dialogue processing for the case where the condition is satisfied; a voice information transmission step in which the server transmits the voice information to the information terminal device; and a provision step in which the information terminal device receives the voice information and provides the voice information to the user.
These general or specific aspects may be realized by a system, a method, an integrated circuit, a computer program, or a recording medium such as a computer-readable CD-ROM, or by any combination of a system, a method, an integrated circuit, a computer program, and a recording medium.
The present disclosure provides an information providing method capable of reducing the amount of database data held by the client device and reducing the amount of data transmitted between the server and the client device.
FIG. 1 is a diagram showing the overall configuration of the voice dialogue system in the embodiment. FIG. 2 is a block diagram showing the configuration of the client device in the embodiment. FIG. 3 is a block diagram showing the configuration of the dialogue processing unit in the embodiment. FIG. 4 is a diagram showing the operation of the voice dialogue system in the embodiment. FIG. 5 is a flowchart showing the operation of the client system in the embodiment. FIG. 6 is a flowchart showing the operation of the server in the embodiment. FIGS. 7, 8, and 9 are diagrams each showing an example of the operation of the voice dialogue system in the embodiment. FIG. 10 is a diagram showing an example of the information held in the keyword DB in the embodiment. FIG. 11 is a diagram showing an example of the state transitions in the dialogue processing in the embodiment. FIG. 12 is a diagram showing an example of a synthesized character string template in the embodiment. FIG. 13 is a diagram showing an example of a trigger template in the embodiment. FIG. 14 is a diagram showing an example of a trigger condition in the embodiment. FIG. 15 is a diagram showing an example of the dialogue stack in the embodiment. FIGS. 16 and 17 are diagrams each showing an example of the operation of the voice dialogue system in the embodiment. FIGS. 18 and 19 are diagrams each showing an example of a trigger condition in the embodiment.
(Knowledge that became the basis of the present disclosure)
The present inventor has found that the following problems occur with respect to the voice dialogue system described in the "Background Art" section.
In a system that performs push-type utterances, utterances from the voice dialogue system are made based on information that changes over time, such as the state of the vehicle (for example, the position or moving speed of the vehicle). Therefore, when all of the dialogue processing is performed by the server, the information detected by the client device must be transmitted to the server continuously. This causes the problem that the amount of data transmitted between the server and the client device increases.
Furthermore, transmitting vehicle position information and the like to the server is also undesirable from the viewpoint of personal data protection.
On the other hand, when the processing for realizing such a push-type dialogue is performed in the client device, the predetermined behaviors must be incorporated into the client device in advance as a database. Specifically, there are countless possible triggers, and an enormous database is required to evaluate combinations of their conditions. The processing load on the client device also increases. In particular, when such processing is incorporated into a server-client type voice dialogue system, the client device must hold a huge database, and the advantages of the server-client architecture (maintainability and expandability) are lost.
To address these problems, the present embodiment realizes, in a server-client voice dialogue apparatus, a voice dialogue system that can perform push-type dialogue with only the minimum necessary information provided from the server, without the client holding a huge database, and that can behave flexibly according to utterances from the user.
An information providing method according to one aspect of the present disclosure is, as described above, an information providing method in a server included in a voice dialogue system that interacts with a user who is on a moving body based on the user's voice, and includes the trigger information transmission step, the generation step, and the voice information transmission step described above.
According to this, the amount of data transmitted between the server and the client device (information terminal device) can be reduced. In addition, since the client device only determines the condition, the client device does not need to hold a huge database for dialogue processing, and an increase in the processing load on the client device can be suppressed.
For example, the condition may include a condition of the state of the moving body or the user acquired by a sensor included in the moving body or the information terminal device.
For example, the condition may include a condition on the position of the moving body.
For example, the condition may include a condition on the moving distance of the moving body.
For example, the notification signal may include information indicating the priority of the voice dialogue processing based on the notification signal, and in the generation step, whichever of the voice dialogue processing currently being executed and the voice dialogue processing based on the notification signal has the higher priority may be executed, based on the priority.
According to this, appropriate dialogue processing according to priority can be realized, and the server can immediately grasp the processing priority using the priority included in the notification signal.
For example, the notification signal may include information indicating the transition destination of the state transition in the voice dialogue processing based on the notification signal, and in the generation step, the state may be transitioned to that transition destination and the voice information may be generated based on the state after the transition.
According to this, the server can immediately execute the dialogue processing using the transition destination information included in the notification signal.
An information providing method according to another aspect of the present disclosure is an information providing method in an information terminal device that is included in a voice dialogue system that interacts with a user who is on a moving body based on the user's voice, and that acquires voice; it includes the trigger information reception step, the determination step, the transmission step, the voice information reception step, and the provision step described above.
According to this as well, the amount of data transmitted between the server and the client device (information terminal device) can be reduced, the client device does not need to hold a huge database for dialogue processing, and an increase in the processing load on the client device can be suppressed.
An information providing method according to yet another aspect of the present disclosure is an information providing method in a voice dialogue system that includes an information terminal device that acquires voice and a server; it includes the steps, performed by the server and the information terminal device, described above.
According to this as well, the amount of data transmitted between the server and the client device (information terminal device) can be reduced, the client device does not need to hold a huge database for dialogue processing, and an increase in the processing load on the client device can be suppressed.
A server according to one aspect of the present disclosure is a server included in a voice dialogue system that interacts with a user who is on a moving body based on the user's voice, and includes a trigger information transmission unit, a generation unit, and a voice information transmission unit. Here, the information terminal device is included in the voice dialogue system, acquires voice, and provides voice information to the user.
The trigger information transmission unit transmits, to the information terminal device, trigger information indicating a condition of the state of the moving body or the user, and requests the information terminal device to transmit a notification signal when the condition is satisfied. When the notification signal is received, the generation unit generates voice information by performing voice dialogue processing for the case where the condition is satisfied. The voice information transmission unit transmits the voice information to the information terminal device.
An information terminal device according to one aspect of the present disclosure is included in a voice dialogue system that interacts with a user who is on a moving body based on the user's voice, and includes a trigger information reception unit, a determination unit, a transmission unit, a voice information reception unit, and a provision unit.
The information terminal device acquires voice. The trigger information reception unit receives, from a server included in the voice dialogue system, trigger information indicating a condition of the state of the moving body or the user. The determination unit determines whether the condition is satisfied. The transmission unit transmits a notification signal to the server when the condition is satisfied. The voice information reception unit receives, from the server, voice information generated by voice dialogue processing for the case where the condition is satisfied. The provision unit provides the voice information to the user.
A voice dialogue system according to one aspect of the present disclosure is a voice dialogue system that interacts with a user who is on a moving body based on the user's voice, and includes an information terminal device that acquires voice and provides voice information to the user, and a server. The server includes a trigger information transmission unit, a generation unit, and a voice information transmission unit, and the information terminal device includes a trigger information reception unit, a determination unit, a transmission unit, a voice information reception unit, and a provision unit.
The trigger information transmission unit transmits, to the information terminal device, trigger information indicating a condition of the state of the moving body or the user, and requests the information terminal device to transmit a notification signal when the condition is satisfied. When the notification signal is received, the generation unit generates voice information by performing voice dialogue processing for the case where the condition is satisfied. The voice information transmission unit transmits the voice information to the information terminal device. The trigger information reception unit receives the trigger information from the server. The determination unit determines whether the condition is satisfied. The transmission unit transmits the notification signal to the server when the condition is satisfied. The voice information reception unit receives the voice information from the server. The provision unit provides the voice information to the user.
Note that these comprehensive or specific aspects may be realized by a system, a method, an integrated circuit, a computer program, or a recording medium such as a computer-readable CD-ROM, or by any combination of a system, a method, an integrated circuit, a computer program, and a recording medium.
Hereinafter, an embodiment of the present disclosure will be described with reference to the drawings. The embodiment described below shows one specific example of the present disclosure. The numerical values, shapes, materials, components, arrangement positions and connection forms of the components, steps, and order of steps shown in the following embodiment are merely examples and are not intended to limit the present disclosure. Among the components in the following embodiment, components that are not described in the independent claims indicating the highest-level concept are described as optional components.
(Embodiment)
In the voice dialogue system according to the present embodiment, the dialogue processing for voice dialogue is basically performed in the server. When processing that determines a condition on vehicle information occurs within this dialogue processing, the condition is notified to the client device. The client device determines whether the vehicle information satisfies the condition and notifies the server of the determination result. Since only the determination of the condition is performed by the client device, it is not necessary to transmit vehicle information from the client device to the server continuously. This reduces the amount of data transmitted between the server and the client device. In addition, since the client device only determines the condition, the client device does not need to hold a huge database for dialogue processing, and an increase in the processing load on the client device can be suppressed.
First, the configuration of the voice dialogue system 100 according to the present embodiment will be described. FIG. 1 is a diagram showing the overall configuration of the voice dialogue system 100 according to the present embodiment. The voice dialogue system 100 shown in FIG. 1 includes a client system 110 and a server 120, which can communicate with each other via a network. The client system 110 is installed, for example, in a vehicle (for example, an automobile) in which the user rides.
The client system 110 includes a client device 101 that communicates with the server 120, a microphone 102 as an example of a device that acquires the user's voice, a speaker 103 as an example of a device that reproduces synthesized voice, and a vehicle information acquisition unit 104 that acquires vehicle information indicating the state of the vehicle.
The server 120 includes a speech recognition unit 107 that converts an utterance voice signal representing the user's utterance into an utterance character string, a dialogue processing unit 108 that generates a synthesized character string by performing voice dialogue processing based on the utterance character string, and a speech synthesis unit 109 that converts the synthesized character string into a synthesized voice signal. In the following, an example in which the speech recognition unit 107, the dialogue processing unit 108, and the speech synthesis unit 109 are each separate server devices will be described; note that all of them, or any two of them, may be realized as a single server device.
FIG. 2 is a block diagram of the client device 101. As shown in FIG. 2, the client device 101 includes a voice acquisition unit 201 that generates an utterance voice signal from the microphone input signal of the microphone 102, a voice output unit 202 that outputs to the speaker 103 a speaker output signal based on the synthesized voice signal, and a communication unit 203 that transmits and receives data to and from the server 120. The client device 101 further includes a trigger information interpretation unit 204 that interprets the trigger information received from the server 120, a client trigger condition holding unit 205 that holds the trigger conditions indicated by the trigger information, a vehicle state management unit 206 that holds the vehicle information acquired by the vehicle information acquisition unit 104, a determination unit 207 that evaluates the trigger conditions, and a notification unit 208 that generates a notification signal for notifying the server 120 that a trigger condition has been satisfied.
FIG. 3 is a block diagram of the dialogue processing unit 108. As shown in FIG. 3, the dialogue processing unit 108 includes an input character string management unit 301 that receives utterance character strings and notification signals, a state management unit 302 that performs state transition processing, a keyword DB (database) 303 that holds keywords, a matching processing unit 304 that performs matching processing on utterance character strings, a keyword holding unit 305 that holds keyword character strings, and a dialogue processing execution unit 306 that executes dialogue processing. The dialogue processing unit 108 further includes a synthesized character string template DB (database) 307 that holds synthesized character string templates, a trigger template DB (database) 308 that holds trigger templates, an output character string management unit 309 that outputs synthesized character strings and trigger information, and a server trigger condition holding unit 310 that holds trigger conditions.
The operation of the voice dialogue system 100 configured as described above will now be described. FIG. 4 is a diagram showing the operation of the voice dialogue system 100 according to the present embodiment.
First, when the user speaks, the microphone 102 of the client device 101 generates a microphone input signal. The voice acquisition unit 201 acquires the microphone input signal and encodes it to generate an utterance voice signal, which is a digital signal. The communication unit 203 transmits the generated utterance voice signal to the speech recognition unit 107 (S101).
The speech recognition unit 107 converts the received utterance voice signal into an utterance character string by speech recognition, and transmits the utterance character string to the dialogue processing unit 108 via the client device 101 (S102).
The input character string management unit 301 of the dialogue processing unit 108 receives and holds the utterance character string. The matching processing unit 304 acquires from the keyword DB 303 the keywords to be matched in the current state managed by the state management unit 302, performs matching processing between the acquired utterance character string and those keywords, and stores the matched keywords in the keyword holding unit 305 (S103).
The state management unit 302 then performs a state transition based on the matched keywords, and the dialogue processing execution unit 306 generates a synthesized character string and trigger information by performing the dialogue processing associated with the state transition.
Specifically, the dialogue processing execution unit 306 performs operations such as (1) acquiring, changing, or setting database entries, (2) determining conditions, (3) searching for, acquiring, or processing information, and (4) generating synthesized character strings. More specifically, the dialogue processing execution unit 306 acquires a synthesized character string template from the synthesized character string template DB 307, fills the acquired template with the character strings read from the keyword holding unit 305, and generates a synthesized character string by tagging it with the <VOICE> tag described later.
Furthermore, the dialogue processing execution unit 306 acquires a trigger template from the trigger template DB 308 and generates trigger information by filling the acquired trigger template with the character strings read from the keyword holding unit 305. The trigger condition indicated by this trigger information is held in the server trigger condition holding unit 310. The dialogue processing execution unit 306 then tags the generated trigger information with the <RULE> tag described later and appends it to the synthesized character string (S104).
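As a concrete illustration of S104, the following is a minimal sketch of the template filling and tagging just described. The bracketed placeholder syntax, the sample template strings, and the keyword values are assumptions made for illustration; the actual templates live in the DBs 307 and 308 and are not specified here.

    # Hypothetical templates; the real ones are held in DBs 307 and 308.
    synth_template = "Shall I guide you when you arrive near [place]?"
    trigger_template = "[ID],GPS,[latitude],[longitude],5,0301,H"

    keywords = {"place": "Kyoto Station", "ID": "trig-001",
                "latitude": "34.9855", "longitude": "135.7588"}

    def fill(template, values):
        # Replace each bracketed placeholder with its keyword value.
        for key, value in values.items():
            template = template.replace(f"[{key}]", value)
        return template

    # S104: tag the utterance with <VOICE> and append the <RULE>-tagged trigger.
    message = (f"<VOICE>{fill(synth_template, keywords)}</VOICE>"
               f"<RULE>{fill(trigger_template, keywords)}</RULE>")
    print(message)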
The output character string management unit 309 transmits the generated synthesized character string to the speech synthesis unit 109 via the client device 101. At this time, the communication unit 203 of the client device 101 extracts the trigger information appended to the synthesized character string.
The speech synthesis unit 109 generates a synthesized voice signal from the received synthesized character string and transmits the generated synthesized voice signal to the client device 101 (S105). The communication unit 203 of the client device 101 receives the synthesized voice signal, and the voice output unit 202 decodes it to generate a speaker output signal. The speaker 103 reproduces the synthesized voice based on the generated speaker output signal (S106).
Meanwhile, the trigger information interpretation unit 204 interprets the trigger information acquired by the communication unit 203 and stores the trigger condition indicated by the trigger information in the client trigger condition holding unit 205. For example, the trigger condition held in the client trigger condition holding unit 205 is the same as the trigger condition held in the server trigger condition holding unit 310 of the server 120.
A voice dialogue is established by repeating this series of steps.
Furthermore, the vehicle state management unit 206 of the client device 101 holds the vehicle information acquired by the vehicle information acquisition unit 104. The determination unit 207 periodically determines whether the vehicle state indicated by the vehicle information satisfies a trigger condition held in the client trigger condition holding unit 205 (S107). When the vehicle state matches a trigger condition (trigger firing), the notification unit 208 generates a notification signal, and the communication unit 203 transmits the generated notification signal to the dialogue processing unit 108. At this time, the trigger condition that matched the vehicle state is deleted from the client trigger condition holding unit 205.
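The periodic determination in S107 can be sketched as follows. The polling interval, the shape of the held condition objects (a satisfied() method and a trigger_id attribute), and the callback names are assumptions for illustration only.

    import time

    def trigger_loop(held_conditions, get_vehicle_state, notify_server,
                     interval_s=1.0):
        # S107: periodically check every held trigger condition against the
        # current vehicle state. On a match (trigger firing), notify the
        # server and delete the condition from the client-side holding unit.
        while held_conditions:
            state = get_vehicle_state()
            for cond in list(held_conditions):
                if cond.satisfied(state):
                    notify_server(cond.trigger_id)
                    held_conditions.remove(cond)
            time.sleep(interval_s)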
The input character string management unit 301 of the dialogue processing unit 108 receives the notification signal. The matching processing unit 304 interprets the notification signal as a message for performing a state transition, and the state management unit 302 performs the state transition based on the notification signal. At this time, the state management unit 302 deletes from the server trigger condition holding unit 310 the trigger condition corresponding to the notification signal. The dialogue processing execution unit 306 then generates a synthesized character string based on the state after the transition (S108), and the output character string management unit 309 transmits the generated synthesized character string to the speech synthesis unit 109 via the client device 101.
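As noted in the summary above, the notification signal may carry the transition destination of the state transition, which lets the server jump to the corresponding dialogue state without re-running keyword matching. The following is a minimal sketch of that idea; the field names and the string-based state representation are assumptions for illustration.

    from dataclasses import dataclass

    @dataclass
    class Notification:
        trigger_id: str
        priority: int    # priority of the push dialogue this notification starts
        next_state: str  # transition destination in the dialogue scenario

    class StateManager:
        def __init__(self, state="IDLE"):
            self.state = state

        def on_notification(self, n: Notification):
            # Jump directly to the destination carried in the signal (S108),
            # then generate the utterance for the new state.
            self.state = n.next_state
            return f"(synthesized character string for state {self.state})"

    sm = StateManager()
    print(sm.on_notification(Notification("trig-001", priority=2,
                                          next_state="GUIDE_REST_AREA")))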
The speech synthesis unit 109 generates a synthesized voice signal from the synthesized character string and transmits it to the client device 101 (S109). The communication unit 203 of the client device 101 receives the synthesized voice signal, and the voice output unit 202 decodes it to generate a speaker output signal. The speaker 103 reproduces the synthesized voice by outputting the generated speaker output signal (S110). In this way, a push-type dialogue is realized.
Here, the utterance from the system side in a push-type dialogue occurs at an arbitrary timing, so a dialogue may already be in progress. For this reason, a parameter called priority is assigned to each dialogue. The priority of the dialogue currently in progress is compared with that of the push dialogue, and the dialogue with the lower priority is performed after the dialogue with the higher priority has completed.
That is, because the priority is included in the notification signal, the state management unit 302 can decide whether the state transition should be performed. If the priority of the push dialogue is lower, the state transition is stored on the dialogue stack and loaded after the higher-priority dialogue ends. The priority is, for example, described in advance in the dialogue scenario according to the content of the dialogue; the priority in the notification signal sent from the client side is compared with the priority of the dialogue currently in progress to control whether the state transition is performed.
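The priority comparison and dialogue stack can be sketched as follows. The numeric priority scale, the state names, and the stack representation are assumptions for illustration.

    class DialogueManager:
        def __init__(self):
            self.current = None  # (priority, state) of the dialogue in progress
            self.stack = []      # deferred dialogues, resumed later

        def start_push_dialogue(self, priority, state):
            if self.current is None or priority > self.current[0]:
                if self.current is not None:
                    self.stack.append(self.current)   # park the interrupted dialogue
                self.current = (priority, state)
            else:
                self.stack.append((priority, state))  # defer the lower-priority push

        def finish_current(self):
            # Resume the highest-priority deferred dialogue, if any.
            self.current = max(self.stack, key=lambda d: d[0]) if self.stack else None
            if self.current is not None:
                self.stack.remove(self.current)

    dm = DialogueManager()
    dm.start_push_dialogue(1, "ROUTE_GUIDANCE")
    dm.start_push_dialogue(3, "FUEL_WARNING")  # higher priority: interrupts
    dm.finish_current()                        # ROUTE_GUIDANCE resumes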
Next, the flow of processing in each of the client system 110 and the server 120 will be described.
FIG. 5 is a flowchart showing the flow of processing by the client system 110. The processing of FIG. 5 is repeated at predetermined intervals.
First, the client device 101 determines whether the voice acquisition unit 201 has acquired an utterance (a microphone input signal) (S201). When the voice acquisition unit 201 has acquired a microphone input signal (Yes in S201), the voice acquisition unit 201 generates an utterance voice signal from the microphone input signal, and the communication unit 203 transmits the utterance voice signal to the server 120 (S202).
The client device 101 also determines whether the communication unit 203 has received a synthesized voice signal (S203). When the communication unit 203 has received a synthesized voice signal (Yes in S203), the voice output unit 202 decodes the synthesized voice signal to generate a speaker output signal, and the speaker 103 outputs the generated speaker output signal (S204).
The client device 101 also determines whether the communication unit 203 has received trigger information (S205). When the communication unit 203 has received trigger information (Yes in S205), the trigger information interpretation unit 204 interprets the trigger information and stores the trigger condition indicated by it in the client trigger condition holding unit 205. The determination unit 207 then starts trigger determination, that is, determining whether the vehicle state managed by the vehicle state management unit 206 satisfies the trigger condition held in the client trigger condition holding unit 205 (S206).
When trigger determination is active (Yes in S207), the determination unit 207 performs the trigger determination (S208). When the vehicle state matches a trigger condition (trigger firing) (Yes in S208), the notification unit 208 generates a notification signal, and the communication unit 203 transmits the generated notification signal to the dialogue processing unit 108 (S209).
Note that the order of the processing in steps S201 and S202, steps S203 and S204, steps S205 and S206, and steps S207 to S209 is an example; the processing may be performed in an order other than that shown in FIG. 5, and some of the processing may be performed simultaneously (in parallel) or with overlapping processing times.
 Next, the flow of processing by the server 120 will be described. FIG. 6 is a flowchart showing the flow of processing by the server 120. The process of FIG. 6 is repeated at predetermined intervals.
 The server 120 determines whether an utterance speech signal has been received (S301). When an utterance speech signal has been received (Yes in S301), the speech recognition unit 107 converts the utterance speech signal into an utterance character string by speech recognition (S302).
 Next, the input character string management unit 301 of the dialogue processing unit 108 receives and holds the utterance character string. The matching processing unit 304 acquires from the keyword DB 303 the keywords to be matched in the current state managed by the state management unit 302, performs matching between the utterance character string and those keywords, and holds the matched keyword in the keyword holding unit 305 (S303).
 The state management unit 302 then performs a state transition based on the matched keyword, and the dialogue processing execution unit 306 generates a synthesized character string by performing the dialogue processing associated with that state transition (S304).
 The speech synthesis unit 109 generates a synthesized speech signal from the synthesized character string and transmits the generated synthesized speech signal to the client device 101 (S305).
 When the state transition in step S304 gives rise to a determination involving a trigger condition (Yes in S306), the dialogue processing execution unit 306 generates trigger information indicating the trigger condition, and the output character string management unit 309 transmits the generated trigger information to the client device 101 (S307).
 The server 120 also determines whether a notification signal has been received (S308). When a notification signal has been received (Yes in S308), the matching processing unit 304 interprets the notification signal as a message for performing a state transition, the state management unit 302 performs the state transition based on the notification signal, and the dialogue processing execution unit 306 generates a synthesized character string by performing dialogue processing based on the post-transition state (S309).
 The speech synthesis unit 109 generates a synthesized speech signal from the synthesized character string and transmits the generated synthesized speech signal to the client device 101 (S310).
 Note that the order of the processes in steps S301 to S307 and steps S308 to S310 is merely an example; an order other than that shown in FIG. 6 may be used, and some of the processes may be performed simultaneously (in parallel) or with overlapping processing times.
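 The server-side flow admits a matching sketch. Here recognizer, dialog, and synthesizer stand in for the speech recognition unit 107, dialogue processing unit 108, and speech synthesis unit 109; their method names are assumptions made for this illustration only.

    # Hypothetical sketch of the server-side loop of FIG. 6 (S301-S310).
    class Server:
        def __init__(self, comm, recognizer, dialog, synthesizer):
            self.comm = comm                # link to the client device 101
            self.recognizer = recognizer    # speech recognition unit 107
            self.dialog = dialog            # dialogue processing unit 108
            self.synthesizer = synthesizer  # speech synthesis unit 109

        def run_once(self):
            # S301-S305: speech -> text -> keyword match and transition -> speech.
            speech = self.comm.poll("speech")
            if speech is not None:
                text = self.recognizer.recognize(speech)                      # S302
                reply = self.dialog.handle_utterance(text)                    # S303-S304
                self.comm.send("synth", self.synthesizer.synthesize(reply))   # S305
                # S306-S307: ship any trigger condition the transition produced.
                trigger = self.dialog.pending_trigger()
                if trigger is not None:
                    self.comm.send("trigger", trigger)
            # S308-S310: a notification signal drives its own state transition.
            notify = self.comm.poll("notify")
            if notify is not None:
                reply = self.dialog.handle_notification(notify)               # S309
                self.comm.send("synth", self.synthesizer.synthesize(reply))   # S310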
 In the description here, the speech recognition unit 107, the dialogue processing unit 108, and the speech synthesis unit 109 included in the server 120 are each separate server devices, but all of them, or any two of them, may be realized as a single server device. In that case, the utterance character string is sent from the speech recognition unit 107 to the dialogue processing unit 108 without going through the client device 101. Similarly, the synthesized character string is sent from the dialogue processing unit 108 to the speech synthesis unit 109 without going through the client device 101.
 At least one of the speech recognition unit 107 and the speech synthesis unit 109 may also be included in the client system 110.
 A specific example of the flow so far will be described with reference to FIGS. 7A to 12. FIGS. 7A to 7C are diagrams for explaining a specific operation example. FIG. 8 shows an example of the keywords held in the keyword DB 303. FIG. 9 shows an example of the state transitions in the dialogue processing. FIG. 10 shows an example of the synthesized character string templates held in the synthesized character string template DB 307. FIG. 11 shows an example of the trigger templates held in the trigger template DB 308. FIG. 12 shows an example of the trigger conditions held in the client trigger condition holding unit 205 and the server trigger condition holding unit 310.
 As shown in FIG. 7A, the speech recognition unit 107 recognizes the user's utterance "I want to go to AAA Tower" (S401). The matching processing unit 304 of the dialogue processing unit 108 performs keyword matching on the utterance character string of the speech recognition result (S402). As shown in FIG. 8, the keyword DB 303 stores keywords 410 and reference keyword groups 420. The keywords 410 comprise a plurality of tables 401 to 405, each of which is a keyword list containing one or more keywords (for example, "I want to go to {location}" in table 401). A transition source state and a transition destination state are also set for each of the tables 401 to 405.
 Each keyword may contain a keyword group (indicated by {} in FIG. 8). The reference keyword groups 420 comprise lists 406 to 409, each corresponding to one keyword group; each of the lists 406 to 409 is a list of the words belonging to that keyword group.
 As the initial state, the transition state is set to the "normal state 0100" shown in FIG. 9.
 First, the matching processing unit 304 acquires, from the keywords 410 shown in FIG. 8, the tables 401, 404, and 405 whose transition source is the current state 0100, and searches the keywords contained in those tables for a keyword matching the utterance character string. Here, "I want to go to {location}" in table 401 is found.
 When the matched keyword contains a keyword group, the matching processing unit 304 stores all the words of that keyword group (in this example, the {location} list 406) from the reference keyword groups 420 in the keyword holding unit 305.
 The state management unit 302 then transitions to the transition destination state 0101 set in table 401. In this way, the state management unit 302 can determine the transition destination to be state 0101 based on the result of matching against the utterance character string "I want to go to AAA Tower". That is, the state management unit 302 transitions from the normal state 0100 shown in FIG. 9 to the destination setting state 0101 for conducting the destination-setting dialogue. At that time, the keyword holding unit 305 holds "AAA Tower" as the destination, and the dialogue processing unit 108 performs processing such as acquiring the position of AAA Tower.
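 A toy version of this table-driven matching, including the expansion of a keyword group such as {location}, might look as follows. The table contents only loosely mirror FIG. 8, and exact string comparison is a deliberate simplification of whatever matching the matching processing unit 304 actually performs.

    # Hypothetical sketch of keyword matching with keyword groups (FIG. 8).
    KEYWORD_TABLES = [
        # (transition source, transition destination, keyword patterns)
        ("0100", "0101", ["I want to go to {location}"]),   # cf. table 401
        ("0101", "0102", ["yes"]),                          # cf. table 402
    ]
    KEYWORD_GROUPS = {"location": ["AAA Tower", "BBB Station"]}  # cf. list 406

    def match(utterance, current_state):
        for src, dst, patterns in KEYWORD_TABLES:
            if src != current_state:
                continue  # use only tables whose transition source matches
            for pattern in patterns:
                if "{" in pattern:
                    group = pattern.split("{")[1].split("}")[0]
                    for word in KEYWORD_GROUPS[group]:
                        if pattern.replace("{" + group + "}", word) == utterance:
                            return dst, word  # destination + captured keyword
                elif pattern == utterance:
                    return dst, None
        return current_state, None  # no match: stay in the current state

    # match("I want to go to AAA Tower", "0100") -> ("0101", "AAA Tower")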
 Next, the dialogue processing execution unit 306 generates a synthesized character string by performing dialogue processing based on the transition state.
 As shown in FIG. 10, each synthesized character string template stored in the synthesized character string template DB 307 includes a state 501 indicating a transition state, a condition 502 for generating the synthesized character string, and a template 503 for the synthesized character string.
 The dialogue processing execution unit 306 acquires, from the synthesized character string templates shown in FIG. 10, the template 503 "<VOICE>Do you want to set [destination] as your destination?</VOICE>" whose state 501 is 0101 and whose condition 502 is satisfied. It then acquires the keyword "AAA Tower", previously set as the destination, from the keyword holding unit 305 and substitutes it for [destination], generating the synthesized character string "<VOICE>Do you want to set AAA Tower as your destination?</VOICE>" (S403).
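 The lookup-and-substitute step can be sketched as below; modeling the condition 502 as a predicate over a context dictionary is an assumption made for this illustration.

    # Hypothetical sketch of template selection and slot filling (FIG. 10).
    TEMPLATES = [
        # (state 501, condition 502, template 503)
        ("0101", lambda ctx: True,
         "<VOICE>Do you want to set [destination] as your destination?</VOICE>"),
        ("0102", lambda ctx: True,
         "<VOICE>[destination] has been set as your destination</VOICE>"),
    ]

    def fill(state, ctx):
        for st, cond, template in TEMPLATES:
            if st == state and cond(ctx):
                for slot, value in ctx.items():
                    template = template.replace("[" + slot + "]", value)
                return template
        return ""

    # fill("0101", {"destination": "AAA Tower"})
    # -> "<VOICE>Do you want to set AAA Tower as your destination?</VOICE>"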
 The dialogue processing execution unit 306 also acquires a trigger template from the trigger template DB 308. As shown in FIG. 11, each trigger template includes a state 511 indicating a transition state, a condition 512 for generating trigger information, and a template 513 for the trigger condition.
 The dialogue processing execution unit 306 attempts to acquire, from the trigger templates shown in FIG. 11, a template 513 whose state 511 is 0101 and whose condition 512 is satisfied. In this example no matching template 513 exists, so the synthesized character string is transmitted to the client device 101 as it is.
 The client device 101 transmits the contents of the <VOICE> tag to the speech synthesis unit 109 and receives the synthesized speech signal resulting from speech synthesis. The client device 101 then outputs the audio "Do you want to set AAA Tower as your destination?" indicated by the synthesized speech signal (S404). Since the client device 101 has not received any trigger information, it does not update its trigger conditions.
 Next, when the speech recognition unit 107 recognizes the user's utterance "Yes" (S405), the matching processing unit 304 of the dialogue processing unit 108 performs keyword matching on the utterance character string "Yes" acquired from the speech recognition unit 107. Specifically, table 402 is selected, and the state management unit 302 transitions from the destination setting state 0101 shown in FIG. 9 to the destination determination state 0102 (S406).
 The dialogue processing execution unit 306 acquires the template "<VOICE>[destination] has been set as your destination</VOICE>" shown in FIG. 10 and sets "AAA Tower" in [destination], generating the synthesized character string "<VOICE>AAA Tower has been set as your destination</VOICE>" (S407).
 Furthermore, the dialogue processing execution unit 306 acquires a trigger template from the trigger template DB 308 and generates trigger information (S408).
 Specifically, the dialogue processing execution unit 306 acquires, from the trigger templates shown in FIG. 11, the template 513 "<RULE>[ID],GPS,[latitude],[longitude],5,0301,M</RULE>" whose state 511 is 0102 and whose condition 512 is satisfied. Here, it is assumed that the destination is a facility that is not within 5 km of the current location and has no parking lot. This template means that, when the vehicle comes within 5 km of the destination, a dialogue guiding the user to a parking lot [the destination approach state 0301 shown in FIG. 9] is to be push-uttered with priority "medium".
 The dialogue processing execution unit 306 generates the trigger information by setting a unique value (here "AAA") in [ID], the latitude of the destination in [latitude], and the longitude of the destination in [longitude]. Based on the contents of this trigger information, the server trigger condition holding unit 310 holds the trigger condition shown in FIG. 12. As shown in FIG. 12, a trigger condition includes an ID 521, a unique identifier for identifying the trigger condition; a condition 522 indicating the condition to be determined; a content 523 indicating the determination details; a transition destination 524 indicating the state to transition to when the trigger condition is satisfied; and a priority 525 indicating the priority of the processing performed after the trigger condition is satisfied.
 The dialogue processing unit 108 then wraps the trigger information in a <RULELIST> tag and appends it to the synthesized character string, generating the character string "<VOICE>AAA Tower has been set as your destination</VOICE><RULELIST><RULE>AAA,GPS,[latitude],[longitude],5,0301,M</RULE></RULELIST>", and transmits the generated character string to the client device 101.
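 On the receiving side, such a string has to be split into its <VOICE> payload and its <RULE> entries, each of which carries the fields of FIG. 12 in order (ID, condition type, condition parameters, transition destination, priority). A hedged parsing sketch, assuming exactly that comma-separated layout:

    # Hypothetical sketch of parsing the combined message on the client.
    import re
    from dataclasses import dataclass

    @dataclass
    class TriggerCondition:
        id: str        # ID 521, e.g. "AAA"
        kind: str      # condition 522, e.g. "GPS" or "CAN"
        params: list   # condition parameters, e.g. latitude, longitude, radius
        dest: str      # transition destination 524, e.g. "0301"
        priority: str  # priority 525: "H", "M" or "L"

    def parse_message(message):
        voice = re.search(r"<VOICE>(.*?)</VOICE>", message)
        rules = []
        for body in re.findall(r"<RULE>(.*?)</RULE>", message):
            f = body.split(",")
            rules.append(TriggerCondition(
                id=f[0], kind=f[1], params=f[2:-2], dest=f[-2], priority=f[-1]))
        return (voice.group(1) if voice else ""), rules

    # parse_message("<VOICE>AAA Tower has been set as your destination</VOICE>"
    #               "<RULELIST><RULE>AAA,GPS,35.0,135.0,5,0301,M</RULE></RULELIST>")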
 The client device 101 transmits the contents of the <VOICE> tag to the speech synthesis unit 109 and receives the synthesized speech signal resulting from speech synthesis. The client device 101 then outputs the audio "AAA Tower has been set as your destination" indicated by the synthesized speech signal (S409).
 The client device 101 also acquires the contents of the <RULE> tag as trigger information and, as shown in FIG. 12, stores the trigger condition indicated by the trigger information in the client trigger condition holding unit 205. The client device 101 then performs determination in accordance with the trigger condition. That is, the determination unit 207 acquires GPS information as vehicle information, calculates the distance between the vehicle position indicated by the GPS information and the latitude and longitude of the destination, and determines whether the calculated distance is 5 km or more, at a period of, for example, about 10 s.
 When the distance to the destination becomes 5 km or less, the notification unit 208 generates the notification signal "<STATE>AAA,0301,M</STATE>", which contains the trigger condition ID "AAA", "0301" indicating the transition destination, and "M" indicating medium priority. The notification signal thus contains information for uniquely identifying the trigger condition, information indicating the transition destination of the transition state when the trigger condition is satisfied, and information indicating the priority of the dialogue processing to be performed when the trigger condition is satisfied. This notification signal is transmitted to the dialogue processing unit 108, and the trigger condition with ID "AAA" stored in the client trigger condition holding unit 205 is deleted (S410).
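 The periodic client-side check for such a GPS condition could be sketched as follows. The haversine great-circle distance is merely one plausible way to compute the distance from the GPS fix; the disclosure does not fix the formula, and cond is assumed to carry the parsed fields of the sketch above.

    # Hypothetical sketch of evaluating a GPS trigger condition (S410).
    import math

    def haversine_km(lat1, lon1, lat2, lon2):
        # Great-circle distance in km between two latitude/longitude points.
        r = 6371.0
        p1, p2 = math.radians(lat1), math.radians(lat2)
        dp = math.radians(lat2 - lat1)
        dl = math.radians(lon2 - lon1)
        a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
        return 2 * r * math.asin(math.sqrt(a))

    def evaluate_gps_trigger(cond, vehicle_lat, vehicle_lon):
        # cond.params = [latitude, longitude, radius_km] per the <RULE> example.
        lat, lon, radius = map(float, cond.params)
        if haversine_km(vehicle_lat, vehicle_lon, lat, lon) <= radius:
            # The trigger fires: build the <STATE> notification signal.
            return "<STATE>%s,%s,%s</STATE>" % (cond.id, cond.dest, cond.priority)
        return None  # re-check at the next period (e.g. roughly every 10 s)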
 Based on the priority contained in the received notification signal, the dialogue processing unit 108 compares the priority of the dialogue processing currently in progress with that of the dialogue processing based on the notification signal. A specific example of this processing will be described with reference to FIG. 13, which shows the case where a parking lot search process of normal (medium) priority is performed based on the notification signal.
 (1) When no dialogue is in progress, as shown in (a) of FIG. 13, or (2) when the priority of the parking lot search process based on the notification signal is higher than that of the dialogue processing currently in progress, for example during a low-priority schedule confirmation dialogue as shown in (b) of FIG. 13, the state management unit 302 transitions from the normal state 0100 shown in FIG. 9 to the destination approach state 0301 based on the notification signal. The dialogue processing execution unit 306 deletes the trigger condition whose ID 521 is "AAA" from the server trigger condition holding unit 310 and, through dialogue processing, generates a synthesized character string indicating "You have arrived near your destination. Shall I guide you to a parking lot?" and transmits it to the client device 101 (S411).
 The client device 101 thereby outputs "You have arrived near your destination. Shall I guide you to a parking lot?" (S412). At that time, the client device 101 may play some sound indicating that the vehicle is near the destination, or may display some character or picture on an in-vehicle display.
 On the other hand, when the priority of the parking lot search process based on the notification signal is lower than that of the dialogue processing currently in progress, for example during a higher-priority convenience store search dialogue as shown in (c) of FIG. 13, the parking lot search process is stored in the dialogue stack and the dialogue processing currently in progress takes precedence.
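 One way to picture this arbitration is a priority comparison backed by a stack of deferred dialogues. This is a sketch only: the numeric ordering of the H/M/L priorities and the exact stack discipline are assumptions that the text does not spell out.

    # Hypothetical sketch of the priority arbitration of FIG. 13.
    PRIORITY = {"L": 0, "M": 1, "H": 2}

    class DialogArbiter:
        def __init__(self):
            self.current = None  # (name, priority) of the dialogue in progress
            self.stack = []      # deferred dialogues (the "dialogue stack")

        def on_notification(self, name, priority):
            if self.current is None or PRIORITY[priority] > PRIORITY[self.current[1]]:
                # Cases (a) and (b): idle, or the new dialogue outranks the
                # current one, so the new dialogue is started.
                if self.current is not None:
                    self.stack.append(self.current)
                self.current = (name, priority)
            else:
                # Case (c): the current dialogue takes precedence; the new
                # dialogue is pushed onto the dialogue stack for later.
                self.stack.append((name, priority))

        def on_dialog_finished(self):
            self.current = self.stack.pop() if self.stack else None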
 Although an example of guiding the user to a parking lot near the destination was described above, the present embodiment can also handle a conditional instruction given by the user. FIGS. 14A and 14B are diagrams for explaining this operation example. For example, as shown in FIG. 14A, the user utters "I want to get gasoline after driving another 10 km" (S501). The matching processing unit 304 of the dialogue processing unit 108 performs keyword matching on the utterance character string of the speech recognition result (S502). Based on the result of the keyword matching, the state management unit 302 transitions from the normal state 0100 shown in FIG. 9 to the conditional instruction state 0200.
 At that time, information indicating that the trigger condition relates to the vehicle's travel distance and that the condition is driving another 10 km ("travel distance" = 10, [condition] = "drive 10 km") is held in the keyword holding unit 305.
 Next, based on the post-transition state, the dialogue processing execution unit 306 acquires the gas station search state 0400, the dialogue state corresponding to "I want to get gasoline". After confirming from the vehicle's fuel state that the vehicle can still travel another 10 km, the dialogue processing execution unit 306 calls up the trigger list character string template "<RULE>[ID],CAN,dist,[travel distance],[transition dialogue state],H</RULE>" and sets the unique value "BBB" in [ID]. Here, a case is described in which a trigger condition relating to the remaining amount of gasoline (ID 521 = "BBB") already exists, as shown in FIG. 15, and is to be overwritten.
 The dialogue processing execution unit 306 also sets "10" in [travel distance] and "0400", indicating the gas station search state, in [transition dialogue state]. The trigger information "<RULE>BBB,CAN,dist,10,0400,H</RULE>" is thereby generated, and the dialogue processing execution unit 306 overwrites the existing trigger condition with this content, updating the server trigger condition holding unit 310 as shown in FIG. 16. The dialogue processing execution unit 306 also generates the synthesized character string "<VOICE>Understood. I will let you know after you have driven another 10 km.</VOICE>" and transmits the generated trigger information and synthesized character string to the client device 101 (S503).
 The client device 101 receives the synthesized character string and outputs "Understood. I will let you know after you have driven another 10 km." (S504).
 The client device 101 also receives the trigger information and overwrites the trigger condition stored in the client trigger condition holding unit 205 with the trigger condition indicated by the received trigger information. For example, the client trigger condition holding unit 205 holds the trigger condition shown in FIG. 15, and the trigger condition with the same ID "BBB" is overwritten with the trigger condition indicated by the received trigger information, updating the client trigger condition holding unit 205 as shown in FIG. 16.
 The client device 101 then compares the target travel distance with the current travel distance in accordance with the updated trigger condition and periodically determines whether the travel distance has reached the target. When it has, the notification unit 208 generates the notification signal "<STATE>BBB,0400,H</STATE>" and transmits it to the dialogue processing unit 108 (S505).
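 Because the trigger conditions are keyed by their ID 521, the overwrite behavior and the periodic mileage comparison fall out naturally, as in this sketch (the odometer bookkeeping and the reading of the "dist" parameter as a relative target distance are assumptions made here):

    # Hypothetical sketch of ID-keyed overwrite and the distance trigger.
    class ClientTriggerStore:
        def __init__(self):
            self.conditions = {}  # client trigger condition holding unit 205

        def upsert(self, cond):
            # A condition with the same ID (here "BBB") simply overwrites the
            # stored one, turning FIG. 15 into FIG. 16.
            self.conditions[cond.id] = cond

        def check_distance(self, cond, odometer_km, start_km):
            # For <RULE>BBB,CAN,dist,10,0400,H</RULE> the parsed parameters
            # are ["dist", "10"]; "10" is read as the target distance in km.
            target_km = float(cond.params[1])
            if odometer_km - start_km >= target_km:
                del self.conditions[cond.id]  # fired conditions are removed
                return "<STATE>%s,%s,%s</STATE>" % (cond.id, cond.dest, cond.priority)
            return None  # target not reached; check again at the next period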
 The dialogue processing unit 108 receives the notification signal, and the state management unit 302 transitions from the normal state 0100 shown in FIG. 9 to the gas station search state 0400. Based on the post-transition state, the dialogue processing execution unit 306 generates a synthesized character string indicating "You have driven 10 km. Do you want to search for a gas station?" (S506). The generated synthesized character string is transmitted to the client device 101, which outputs "You have driven 10 km. Do you want to search for a gas station?" (S507).
 In this way, a push dialogue that searches for a gas station after 10 km of driving can be initiated.
 In the above description, both the server trigger condition holding unit 310 of the dialogue processing unit 108 and the client trigger condition holding unit 205 of the client device 101 hold the ID 521, condition 522, content 523, transition destination 524, and priority 525, but some or all of this information may be held in only one of the two holding units.
 Also, in the above description, the trigger information and the notification signal contain the ID, the information indicating the transition destination, and the information indicating the priority, but at least one of these may be omitted.
 For example, at least one of the transition destination 524 and the priority 525 may be managed only by the dialogue processing unit 108. In this case, the trigger information and the notification signal do not contain that information, and when the dialogue processing unit 108 receives a notification signal, it determines the transition destination and the priority based on the ID contained in the notification signal and the information it manages.
 Alternatively, at least one of the transition destination 524 and the priority 525 may be managed only by the client device 101. In this case, that information is contained in the trigger information and the notification signal, and when the dialogue processing unit 108 receives a notification signal, it determines the transition destination or the priority based on the transition destination or priority contained in the notification signal.
 When the notification signal contains the transition destination and the priority, the dialogue processing unit 108 need not hold the trigger condition; upon receiving a notification signal, it determines the transition destination or the priority from the values contained in the notification signal. Furthermore, when the notification signal contains the transition destination, the dialogue processing unit 108 can perform its processing based on that transition destination alone, so the notification signal need not contain the ID.
 In short, it suffices for the trigger information to indicate at least the trigger condition, and for the notification signal to indicate that the trigger condition indicated by the trigger information has been satisfied.
 In the above description, the content of a dialogue is used as a trigger condition immediately, but keywords from past dialogues may be accumulated in the keyword holding unit 305, and the dialogue processing unit 108 may generate trigger conditions based on the accumulated information. For example, when a dialogue has indicated that the user is visiting an area for the first time, the dialogue processing unit 108 generates a first trigger condition for determining that the user has arrived in that area. When the first trigger condition is satisfied, it generates a second trigger condition for determining that the user has come close to a tourist spot covered by spot information or to a store running a spot advertisement. When the second trigger condition is satisfied, the dialogue processing unit 108 conducts a dialogue concerning the tourist information or the advertisement. In this way, the dialogue processing unit 108 may generate two-stage trigger conditions.
 Also, when several dialogues (for example, dialogues about the user's driving history or the number of accidents the user has dealt with) have confirmed that the user is not familiar with cars, the dialogue processing unit 108 generates a trigger condition for determining that an abnormality has occurred in the vehicle. When this condition is satisfied, the dialogue processing unit 108 provides, through dialogue, information indicating what kind of abnormality has occurred and how to deal with it. The results of multiple dialogues can thus be used to realize a help function aimed only at beginners.
 Furthermore, when Italian restaurants are frequently searched for around noon, the dialogue processing unit 108 generates a trigger condition for determining whether it is lunchtime and, when the condition is satisfied, guides the user to an Italian restaurant through dialogue. A recommendation function can be realized in this way as well.
 In the above description, an example in which the trigger condition concerns the vehicle's travel distance or the like was described, but the trigger condition may be any condition on the vehicle state. For example, the trigger condition may be a condition on a parameter that can be acquired by a client device mounted in the vehicle, such as vehicle information acquired from a CAN (Controller Area Network) or the like (accelerator depression amount, brake depression amount, steering angle, shift position, turn signal state, wiper state, light state, remaining fuel, travel distance, vehicle speed, vehicle body acceleration, vehicle body angular velocity, inter-vehicle distance, proximity sensor readings, water temperature, oil amount, various warnings, degree of window opening, air conditioner settings, and so on).
 The trigger condition may also be a condition on the state of a user riding in the vehicle. For example, it may be a condition on information indicating the user's state obtained by sensors arranged around the driver (gaze, face orientation, voice, number of passengers, personal identification information, weight, body temperature, pulse, blood pressure, perspiration, brain waves, alertness, concentration, and so on).
 The trigger condition may also be a condition on information obtained by other in-vehicle sensors (GPS information, in-vehicle temperature, outside air temperature, humidity, time, and so on).
 That is, the trigger condition is a condition on the state of the vehicle or the user acquired by a sensor included in the vehicle or the client device. In other words, the trigger condition is a condition on a state of the vehicle or the user that changes over time.
 Furthermore, in the above description, an example in which the client device 101 is mounted in an automobile was described, but the client device 101 may be mounted in another moving body (for example, a train, an airplane, or a bicycle), or may be possessed or carried by a user boarding such a moving body. That is, the trigger condition may be a condition on the state of the moving body or on the state of a user boarding the moving body.
 The trigger condition may also be an AND condition, an OR condition, or a combination thereof. For example, <RULELIST><AND><RULE>[ID],GPS,[latitude],[longitude],5,0301,H</RULE><RULE>[ID],TIME,[time],0301,H</RULE></AND></RULELIST> may define the AND of two rules, so that a push utterance is performed when both the position condition and the time condition are satisfied.
 Likewise, <RULELIST><OR><RULE>[ID],GPS,[latitude 1],[longitude 1],5,0301,H</RULE><RULE>[ID],GPS,[latitude 2],[longitude 2],5,0301,H</RULE></OR></RULELIST> may define the OR of two rules, so that a push dialogue is started when either condition is satisfied.
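 Nested combinations of this kind can be evaluated recursively over a small expression tree, for example as below. The tuple representation is an illustration rather than the disclosed wire format, and each leaf predicate stands for one <RULE>.

    # Hypothetical sketch of evaluating nested AND/OR trigger conditions.
    def evaluate(node, state):
        # node is ("AND", [children]), ("OR", [children]) or ("RULE", predicate).
        kind, body = node
        if kind == "AND":
            return all(evaluate(child, state) for child in body)
        if kind == "OR":
            return any(evaluate(child, state) for child in body)
        return body(state)  # leaf <RULE>: a predicate over the vehicle/user state

    # Example: push-utter only when the position AND time conditions both hold.
    rule = ("AND", [
        ("RULE", lambda s: s["distance_to_dest_km"] <= 5),
        ("RULE", lambda s: s["hour"] == 12),
    ])
    # evaluate(rule, {"distance_to_dest_km": 3.2, "hour": 12}) -> True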
 By combining these, for example, when the user mentions in a dialogue that he or she would like to try auto cruise control, the dialogue processing unit 108 can detect that the vehicle is near a highway and cruising steadily at high speed, and then guide the user through the auto cruise control function by voice. Likewise, when a destination has been set through dialogue and that destination is far away, the dialogue processing unit 108 can detect either a drop in the user's alertness or a drop in the user's concentration and recommend that the user take a break.
 The dialogue processing unit 108 may also perform a push-type utterance (1) when it monitors the conversation between occupants of the vehicle and the conversation breaks off, (2) when it monitors the driving load and the driving load becomes low, or (3) when it detects the user's mental state from brain waves, heart rate, or the like and the user is in a calm state.
 Furthermore, in the present embodiment, the functions have been described as divided between the server 120 and the client system 110, but the client system 110 may have some of the functions of the server 120.
 Also, in the present embodiment, an example was described in which the speech recognition unit 107, the dialogue processing unit 108, and the speech synthesis unit 109 are not directly connected; however, the output signal of the speech recognition unit 107 may be input directly to the dialogue processing unit 108, and, of the output signals of the dialogue processing unit 108, the synthesized character string may be input directly to the speech synthesis unit 109 while the trigger information is transmitted to the client device 101.
 Furthermore, in the above description, a trigger condition is deleted when it is satisfied; instead, an erase condition may be attached to a trigger condition, and the trigger condition may be deleted from the client trigger condition holding unit 205 of the client device 101 when the erase condition is satisfied. In this case, the client device 101 transmits an erase command to the server 120, and the server 120, having received the erase command, deletes the trigger condition from the server trigger condition holding unit 310.
 As described above, the voice dialogue system 100 according to the present embodiment is a voice dialogue system that interacts with a user boarding a moving body based on speech from the user, and includes the server 120 and the client device 101, an information terminal device that acquires speech from the user and provides voice information to the user.
 The server 120 transmits to the client device 101 trigger information indicating a condition on the state of the moving body or the user and requests the client device 101 to transmit a notification signal to the server 120 when the condition is satisfied (S307 in FIG. 6). When the server 120 receives the notification signal (Yes in S308), it generates voice information (a synthesized character string or a synthesized speech signal) by performing the voice dialogue processing for the case where the condition is satisfied (S309), and transmits the voice information to the client device 101 (S310).
 The client device 101 receives the trigger information from the server 120 (S205 in FIG. 5) and determines whether the condition indicated by the trigger information is satisfied (S206 and S208). When the condition indicated by the trigger information is satisfied (Yes in S208), the client device 101 transmits a notification signal to the server 120 (S209). The client device 101 also receives voice information (a synthesized character string or a synthesized speech signal) from the server 120 (S203) and provides the received voice information to the user (S204).
 For example, the condition indicated by the trigger information includes a condition on the state of the moving body or the user acquired by a sensor included in the moving body or the client device 101, such as a condition on the position of the moving body or on the distance traveled by the moving body.
 In this way, in a server-client voice dialogue system, having the client device 101 evaluate the trigger conditions generated by the server 120 allows the client device 101 to be realized with a very simple and inexpensive structure. Since the server 120 holds the conditions for push-type dialogue, maintainability and extensibility are sufficiently secured. Moreover, the same mechanism also covers the case, difficult to realize with conventional systems, in which the user gives a conditional instruction such as "when X happens, do Y".
 The notification signal also includes information indicating the priority of the voice dialogue processing based on that notification signal. Based on the priority, the server 120 executes whichever of the voice dialogue processing currently being executed and the voice dialogue processing based on the notification signal has the higher priority.
 This realizes dialogue processing appropriate to the priority, and the server can immediately determine the processing priority from the priority contained in the notification signal.
 The notification signal also includes information indicating the transition destination of the state transition in the voice dialogue processing based on that notification signal. The server 120 transitions to the transition destination indicated by this information and generates the voice information based on the post-transition state.
 This allows the server to execute the dialogue processing immediately, using the transition destination information contained in the notification signal.
 The voice dialogue system according to the embodiment of the present disclosure has been described above, but the present disclosure is not limited to this embodiment.
 For example, the present disclosure is not limited to the voice dialogue system described above; it may be realized as a server or a client device (information terminal device) included in a voice dialogue system, or as an information providing method in a voice dialogue system, a server, or a client device.
 Some or all of the processing units included in the voice dialogue system according to the above embodiment are typically realized as an LSI, which is an integrated circuit. These may be individually implemented as single chips, or some or all of them may be integrated into a single chip.
 Circuit integration is not limited to LSI; it may be realized by a dedicated circuit or a general-purpose processor. An FPGA (Field Programmable Gate Array) that can be programmed after LSI manufacture, or a reconfigurable processor in which the connections and settings of circuit cells inside the LSI can be reconfigured, may also be used.
 In the above embodiment, each component may be configured with dedicated hardware or realized by executing a software program suitable for that component. Each component may be realized by a program execution unit such as a CPU or processor reading and executing a software program recorded on a recording medium such as a hard disk or semiconductor memory.
 Furthermore, the present disclosure may be the above program, or a non-transitory computer-readable recording medium on which the above program is recorded. Needless to say, the program can also be distributed via a transmission medium such as the Internet.
 The dialogue contents, numbers, and the like used above are all examples for concretely explaining the present disclosure, and the present disclosure is not limited to the exemplified dialogue contents, numbers, and the like.
 The division of functional blocks in the block diagrams is an example; a plurality of functional blocks may be realized as one functional block, one functional block may be divided into a plurality, or some functions may be moved to other functional blocks. The functions of a plurality of functional blocks having similar functions may also be processed by a single piece of hardware or software, in parallel or in a time-shared manner.
 Similarly, the information providing method in the voice dialogue system above is an example for concretely explaining the present disclosure, and the information providing method according to the present disclosure need not include all of the above steps. The order in which the above steps are executed is likewise an example, and an order other than the above may be used; some of the above steps may also be executed simultaneously with (in parallel with) other steps or with overlapping processing times.
 The voice dialogue system according to one or more aspects has been described above based on the embodiment, but the present disclosure is not limited to this embodiment. Forms obtained by applying various modifications conceivable to those skilled in the art to the embodiment, and forms constructed by combining components from different embodiments, may also be included within the scope of the one or more aspects without departing from the spirit of the present disclosure.
 The present disclosure is applicable to voice dialogue systems and is useful, for example, in a system that conducts a voice dialogue with a user riding in a car.
 100 Voice dialogue system
 101 Client device
 102 Microphone
 103 Speaker
 104 Vehicle information acquisition unit
 107 Speech recognition unit
 108 Dialogue processing unit
 109 Speech synthesis unit
 110 Client system
 120 Server
 201 Voice acquisition unit
 202 Voice output unit
 203 Communication unit
 204 Trigger information interpretation unit
 205 Client trigger condition holding unit
 206 Vehicle state management unit
 207 Determination unit
 208 Notification unit
 301 Input character string management unit
 302 State management unit
 303 Keyword DB
 304 Matching processing unit
 305 Keyword holding unit
 306 Dialogue processing execution unit
 307 Synthesized character string template DB
 308 Trigger template DB
 309 Output character string management unit
 310 Server trigger condition holding unit
 401, 402, 403, 404, 405 Table
 406, 407, 408, 409 List
 410 Keyword
 420 Reference keyword group
 501, 511 State
 502, 512, 522 Condition
 503, 513 Template
 521 ID
 523 Content
 524 Transition destination
 525 Priority

Claims (28)

  1.  An information providing method in a server included in a voice dialogue system that interacts with a user boarding a moving body based on speech from the user, the method comprising:
     a trigger information transmission step of transmitting, to an information terminal device that is included in the voice dialogue system, acquires the speech, and provides voice information to the user, trigger information indicating a condition on a state of the moving body or the user, and requesting the information terminal device to transmit a notification signal when the condition is satisfied;
     a generation step of, when the notification signal is received, generating the voice information by performing voice dialogue processing for the case where the condition is satisfied; and
     a voice information transmission step of transmitting the voice information to the information terminal device.
  2.  The information providing method according to claim 1, wherein the condition includes a condition on the state of the moving body or the user acquired by a sensor included in the moving body or the information terminal device.
  3.  The information providing method according to claim 2, wherein the condition includes a condition on a position of the moving body.
  4.  The information providing method according to claim 2, wherein the condition includes a condition on a distance traveled by the moving body.
  5.  The information providing method according to any one of claims 1 to 4, wherein the notification signal includes information indicating a priority of the voice dialogue processing based on the notification signal, and
     in the generation step, based on the priority, whichever of the voice dialogue processing currently being executed and the voice dialogue processing based on the notification signal has the higher priority is executed.
  6.  The information providing method according to any one of claims 1 to 4, wherein the notification signal includes information indicating a transition destination of a state transition in the voice dialogue processing based on the notification signal, and
     in the generation step, the state is transitioned to the transition destination and the voice information is generated based on the post-transition state.
  7.  An information providing method in an information terminal device that is included in a voice dialogue system interacting with a user boarding a moving body based on speech from the user and that acquires the speech, the method comprising:
     a trigger information reception step of receiving, from a server included in the voice dialogue system, trigger information indicating a condition on a state of the moving body or the user;
     a determination step of determining whether the condition is satisfied;
     a transmission step of transmitting a notification signal to the server when the condition is satisfied;
     a voice information reception step of receiving, from the server, voice information generated by voice dialogue processing for the case where the condition is satisfied; and
     a provision step of providing the voice information to the user.
  8.  An information providing method in a voice dialogue system that interacts with a user boarding a moving body based on speech from the user, wherein
     the voice dialogue system includes an information terminal device that acquires the speech, and a server, and
     the information providing method comprises:
     a trigger information transmission step in which the server transmits, to the information terminal device, trigger information indicating a condition on a state of the moving body or the user;
     a determination step in which the information terminal device, upon receiving the trigger information, determines whether the condition is satisfied;
     a transmission step in which the information terminal device transmits a notification signal to the server when the condition is satisfied;
     a generation step in which the server, upon receiving the notification signal, generates voice information by performing voice dialogue processing for the case where the condition is satisfied;
     a voice information transmission step in which the server transmits the voice information to the information terminal device; and
     a provision step in which the information terminal device receives the voice information and provides the voice information to the user.
  9.  A server included in a voice dialogue system that interacts with a user boarding a moving body based on speech from the user, the server comprising:
     a trigger information transmission unit that transmits, to an information terminal device that is included in the voice dialogue system, acquires the speech, and provides voice information to the user, trigger information indicating a condition on a state of the moving body or the user, and requests the information terminal device to transmit a notification signal when the condition is satisfied;
     a generation unit that, when the notification signal is received, generates the voice information by performing voice dialogue processing for the case where the condition is satisfied; and
     a voice information transmission unit that transmits the voice information to the information terminal device.
  10.  An information terminal device that is included in a voice interaction system for interacting with a user aboard a moving body based on the user's voice and that acquires the voice, the device comprising:
     a trigger information receiving unit that receives, from a server included in the voice interaction system, trigger information indicating a condition on the state of the moving body or the user;
     a determination unit that determines whether the condition is satisfied;
     a transmission unit that transmits a notification signal to the server when the condition is satisfied;
     a voice information receiving unit that receives, from the server, voice information generated by voice interaction processing performed when the condition is satisfied; and
     a providing unit that provides the voice information to the user.
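The determination unit of claim 10 is what lets the terminal evaluate the condition locally. A hedged sketch follows, assuming (as an illustrative wire format, not anything specified in the disclosure) that conditions arrive as simple comparisons over named sensor values.

```python
# Sketch of the terminal-side determination unit, under the assumed
# wire format {"sensor": ..., "op": ..., "value": ...}.
import operator

OPS = {"<=": operator.le, ">=": operator.ge, "==": operator.eq}

class ConditionJudge:
    """Determination unit: evaluates one received trigger condition."""
    def __init__(self, sensor: str, op: str, value: float):
        self.sensor, self.op, self.value = sensor, OPS[op], value

    def satisfied(self, readings: dict) -> bool:
        return self.sensor in readings and self.op(readings[self.sensor], self.value)

# The trigger information receiving unit would build the judge from the
# server's message, e.g. {"sensor": "speed_kmh", "op": "<=", "value": 5}.
judge = ConditionJudge("speed_kmh", "<=", 5.0)

for readings in ({"speed_kmh": 60.0}, {"speed_kmh": 3.0}):
    if judge.satisfied(readings):
        # The transmission unit would send the notification signal here;
        # printing stands in for the network call.
        print("notify server: condition satisfied at", readings)
```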
  11.  A voice interaction system that interacts with a user aboard a moving body based on the user's voice, the system including an information terminal device that acquires the voice and provides voice information to the user, and a server, wherein
     the server comprises:
     a trigger information transmitting unit that transmits, to the information terminal device, trigger information indicating a condition on the state of the moving body or the user and requests transmission of a notification signal when the condition is satisfied;
     a generation unit that, upon receiving the notification signal, generates the voice information by performing voice interaction processing for the case in which the condition is satisfied; and
     a voice information transmitting unit that transmits the voice information to the information terminal device; and
     the information terminal device comprises:
     a trigger information receiving unit that receives the trigger information from the server;
     a determination unit that determines whether the condition is satisfied;
     a transmission unit that transmits the notification signal to the server when the condition is satisfied;
     a voice information receiving unit that receives the voice information from the server; and
     a providing unit that provides the voice information to the user.
  12.  An information providing method in a system that communicates information between an information terminal device and a server, the method comprising:
     a step in which the information terminal device acquires information;
     an information transmitting step in which the information terminal device transmits the information to the server;
     a condition generating step in which the server generates, from the information, a condition based on the current state;
     a trigger information transmitting step in which the server transmits trigger information indicating the condition to the information terminal device;
     a determination step in which the information terminal device, upon receiving the trigger information, determines whether the condition is satisfied;
     a notification signal returning step in which the information terminal device returns a notification signal to the server when the condition is satisfied;
     a character string information generating step in which the server, upon receiving the notification signal, generates character string information by performing processing for the case in which the condition is satisfied;
     a character string information transmitting step in which the server transmits the character string information to the information terminal device; and
     an information output step in which the information terminal device receives the character string information and outputs the character string information.
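Claim 12 generalizes the flow beyond voice and adds a condition generating step: the server derives a checkable condition from the terminal's reported state. The following is a sketch of one plausible condition generating policy; the field names and the 200 m lead distance are assumptions for illustration only.

```python
# Hypothetical condition generating step: turn the terminal's current
# state into a concrete trigger condition the terminal can check.
def generate_condition(current_state: dict) -> dict:
    """Derive a trigger condition from the terminal's reported state."""
    remaining = current_state["distance_to_next_turn_m"]
    # Assumed policy: have the terminal notify when 200 m remain before
    # the next turn (or immediately, if it is already closer than that).
    return {
        "sensor": "distance_to_next_turn_m",
        "op": "<=",
        "value": min(remaining, 200.0),
    }

print(generate_condition({"distance_to_next_turn_m": 1_200.0}))
# {'sensor': 'distance_to_next_turn_m', 'op': '<=', 'value': 200.0}
```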
  13.  An information providing method in a server included in a system that communicates information between an information terminal device and the server, the method comprising:
     a condition generating step of generating, from information acquired by and transmitted from the information terminal device, a condition based on the current state;
     a trigger information transmitting step of transmitting trigger information indicating the condition to the information terminal device;
     a notification signal receiving step of receiving a notification signal from the information terminal device when the condition is satisfied;
     a character string information generating step of generating, upon receipt of the notification signal, character string information by performing processing for the case in which the condition is satisfied; and
     a character string information transmitting step of transmitting the character string information to the information terminal device.
  14.  An information providing method in the information terminal device included in a system that communicates information between the information terminal device and a server, the method comprising:
     a step of acquiring information;
     a step of transmitting the information to the server included in the system;
     a trigger information receiving step of receiving, from the server, trigger information indicating a condition generated from the information;
     a determination step of determining whether the condition is satisfied;
     a notification signal returning step of returning a notification signal to the server when the condition is satisfied; and
     an information output step of receiving, from the server, character string information generated by processing for the case in which the condition is satisfied, and outputting the character string information.
  15.  The information providing method according to any one of claims 12 to 14, wherein the information provided by the information terminal device is voice information.
  16.  The information providing method according to claim 15, wherein the voice information is generated from voice input to the information terminal device.
  17.  The information providing method according to any one of claims 12 to 14, wherein the information provided by the information terminal device is sensor information received by the information terminal device.
  18.  The information providing method according to any one of claims 12 to 14, wherein the information provided by the information terminal device is information from a sensor connected to the information terminal device.
  19.  The information providing method according to claim 17, wherein the sensor information received by the information terminal device is information from a sensor installed in a moving body on which the information terminal device is mounted or from a sensor located in the vicinity of the information terminal device.
  20.  The information providing method according to any one of claims 12 to 19, wherein the character string information is voice information.
  21.  The information providing method according to claim 19, wherein the condition includes a condition on the position of the moving body.
  22.  The information providing method according to claim 19, wherein the condition includes a condition on the distance traveled by the moving body.
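Claims 21 and 22 constrain the condition to the moving body's position or traveled distance. The sketch below shows how such predicates might be evaluated on the terminal; the (lat, lon) representation, the equirectangular distance approximation, and all thresholds are illustrative assumptions.

```python
# Hypothetical position (claim 21) and moving-distance (claim 22)
# condition checks, as they might run in the terminal's determination unit.
import math

def distance_m(a, b):
    """Approximate ground distance between two (lat, lon) points in metres."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*a, *b))
    x = (lon2 - lon1) * math.cos((lat1 + lat2) / 2)  # equirectangular approx.
    y = lat2 - lat1
    return math.hypot(x, y) * 6_371_000  # mean Earth radius

def position_condition(pos, target, radius_m=300.0):
    """Claim 21 style: true once the moving body is within radius of target."""
    return distance_m(pos, target) <= radius_m

def travel_condition(odometer_m, start_m, leg_m=5_000.0):
    """Claim 22 style: true once the body has moved leg_m since start."""
    return odometer_m - start_m >= leg_m

print(position_condition((35.6586, 139.7454), (35.6595, 139.7447)))  # True
print(travel_condition(odometer_m=12_300.0, start_m=8_000.0))        # False
```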
  23.  The information providing method according to claim 15 or 16, wherein, in the condition generating step, the character string information is generated by voice interaction processing of the voice information;
     the notification signal includes information indicating the priority of the voice interaction processing based on the notification signal; and
     of the voice interaction processing currently being executed and the voice interaction processing based on the notification signal, the one with the higher priority is executed based on that priority.
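Claim 23 adds an arbitration rule: the notification carries a priority, and the higher-priority dialogue process is the one that runs. A minimal sketch, assuming a numeric scale in which a larger value means more urgent (the scale itself is not specified in the disclosure):

```python
# Hypothetical priority arbitration between a running dialogue process
# and one triggered by an incoming notification signal.
from dataclasses import dataclass
from typing import Optional

@dataclass
class DialogueTask:
    name: str
    priority: int  # assumed convention: larger value = more urgent

def arbitrate(running: Optional[DialogueTask], incoming: DialogueTask) -> DialogueTask:
    """Pick the voice interaction process that should run after a notification."""
    if running is None or incoming.priority > running.priority:
        return incoming  # pre-empt: e.g. a low-fuel warning beats small talk
    return running       # keep the current process; the incoming one can wait

current = DialogueTask("restaurant_recommendation", priority=1)
alert = DialogueTask("low_fuel_warning", priority=5)
print(arbitrate(current, alert).name)  # low_fuel_warning
```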
  24.  The information providing method according to any one of claims 12 to 23, wherein the notification signal includes information indicating the transition destination of a state transition in the processing based on the notification signal, and
     in the character string information generating step, the state is changed to the transition destination and the character string information is generated based on the state after the transition.
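In claim 24 the notification signal names the destination state directly, so the dialogue engine can jump there and generate its reply from the post-transition state. A sketch with assumed state names and prompts:

```python
# Hypothetical dialogue state machine for claim 24: the notification
# carries the transition destination; the reply is generated from the
# state after the transition.
DIALOGUE_STATES = {
    "idle":            "How can I help you?",
    "suggest_refuel":  "Fuel is low. Shall I search for a gas station?",
    "confirm_station": "There is a station 400 m ahead. Navigate there?",
}

class DialogueEngine:
    def __init__(self):
        self.state = "idle"

    def on_notification(self, transition_destination: str) -> str:
        # Move to the state carried in the notification signal ...
        if transition_destination in DIALOGUE_STATES:
            self.state = transition_destination
        # ... then generate the character string information from the
        # state after the transition.
        return DIALOGUE_STATES[self.state]

engine = DialogueEngine()
print(engine.on_notification("suggest_refuel"))  # "Fuel is low. ..."
```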
  25.  A system that communicates information between an information terminal device and a server, wherein
     the server comprises an information receiving unit, a condition generating unit, a trigger information transmitting unit, a notification signal receiving unit, a character string information generating unit, and a character string information transmitting unit;
     the information terminal device comprises an information acquiring unit, an information transmitting unit, a trigger information receiving unit, a determination unit, a notification signal transmitting unit, a character string information receiving unit, and an output unit;
     the information transmitting unit transmits information to the information receiving unit;
     the information receiving unit receives the information;
     the condition generating unit generates, from the information, a condition based on the current state;
     the trigger information transmitting unit transmits trigger information indicating the condition to the trigger information receiving unit;
     the trigger information receiving unit receives, from the server, the trigger information indicating the condition generated from the information;
     the determination unit determines whether the condition is satisfied;
     the notification signal transmitting unit returns a notification signal to the notification signal receiving unit when the condition is satisfied;
     the notification signal receiving unit receives the notification signal from the notification signal transmitting unit when the condition is satisfied;
     the character string information generating unit, upon receiving the notification signal, generates character string information based on the notification signal;
     the character string information transmitting unit transmits the character string information to the character string information receiving unit;
     the character string information receiving unit receives the character string information; and
     the output unit outputs the character string information.
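Claim 25 decomposes the flow into paired transmitting and receiving units. The sketch below wires those pairs with in-memory queues standing in for the network; every class, queue, and field name is an illustrative assumption.

```python
# Compact, hypothetical wiring of the claim-25 units; queues stand in
# for the transmit/receive pairs on each side of the network.
from queue import Queue

uplink: Queue = Queue()    # terminal -> server (information, notifications)
downlink: Queue = Queue()  # server -> terminal (trigger info, strings)

def server_step():
    # Information receiving unit reads from the uplink.
    kind, payload = uplink.get()
    if kind == "information":
        # Condition generating unit: derive a condition from current state.
        downlink.put(("trigger", {"sensor": "speed_kmh", "op": "<=", "value": 5}))
    elif kind == "notification":
        # Character string information generating unit.
        downlink.put(("string", "You have arrived. Shall I end guidance?"))

# Terminal side: information transmitting, determination and output units.
uplink.put(("information", {"speed_kmh": 40}))
server_step()
_, trigger = downlink.get()  # trigger information receiving unit
reading = {"speed_kmh": 3}
if reading[trigger["sensor"]] <= trigger["value"]:  # determination unit (assumed "<=" op)
    uplink.put(("notification", trigger))           # notification signal transmitting unit
    server_step()
    _, text = downlink.get()  # character string information receiving unit
    print(text)               # output unit
```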
  26.  A server included in a system that communicates information between an information terminal device and the server, the server comprising an information receiving unit, a condition generating unit, a trigger information transmitting unit, a notification signal receiving unit, a character string information generating unit, and a character string information transmitting unit, wherein
     the information receiving unit receives information transmitted from the information terminal device;
     the condition generating unit generates, from the information, a condition based on the current state;
     the trigger information transmitting unit transmits trigger information indicating the condition to the information terminal device;
     the notification signal receiving unit receives a notification signal from the information terminal device when the condition is satisfied;
     the character string information generating unit, upon receiving the notification signal, generates character string information based on the notification signal; and
     the character string information transmitting unit transmits the character string information to the information terminal device.
  27.  The information terminal device included in a system that communicates information between the information terminal device and a server, the device comprising an information transmitting unit, a trigger information receiving unit, a determination unit, a notification signal transmitting unit, a character string information receiving unit, and an output unit, wherein
     the information transmitting unit transmits information to the server included in the system;
     the trigger information receiving unit receives, from the server, trigger information indicating a condition generated from the information;
     the determination unit determines whether the condition is satisfied;
     the notification signal transmitting unit returns a notification signal to the server when the condition is satisfied;
     the character string information receiving unit receives, from the server, character string information generated based on the notification signal; and
     the output unit outputs the character string information.
  28.  The system according to claim 25, wherein the information provided by the information terminal device is voice information.
PCT/JP2016/000687 2015-02-12 2016-02-10 Information dissemination method, server, information terminal device, system, and voice interaction system WO2016129276A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
JP2016574669A JP6846617B2 (en) 2015-02-12 2016-02-10 Information provision method, server, information terminal device, system and voice dialogue system

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
JP2015025654 2015-02-12
JP2015-025654 2015-02-12

Publications (1)

Publication Number Publication Date
WO2016129276A1

Family

ID=56614517

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/JP2016/000687 WO2016129276A1 (en) 2015-02-12 2016-02-10 Information dissemination method, server, information terminal device, system, and voice interaction system

Country Status (2)

Country Link
JP (1) JP6846617B2 (en)
WO (1) WO2016129276A1 (en)

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2002221430A (en) * 2001-01-29 2002-08-09 Sony Corp Navigation system, navigation method and program of navigation system
JP2003131691A (en) * 2001-10-23 2003-05-09 Fujitsu Ten Ltd Voice interactive system
JP2004354943A (en) * 2003-05-30 2004-12-16 Nippon Telegr & Teleph Corp <Ntt> Voice interactive system, voice interactive method and voice interactive program
JP2005091611A (en) * 2003-09-16 2005-04-07 Mitsubishi Electric Corp Information terminal, speech recognition server, and speech recognition system

Cited By (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11887594B2 (en) 2017-03-22 2024-01-30 Google Llc Proactive incorporation of unsolicited content into human-to-computer dialogs
JP7247271B2 (en) 2017-03-22 2023-03-28 グーグル エルエルシー Proactively Incorporating Unsolicited Content Within Human-to-Computer Dialogs
JP2021165843A (en) * 2017-03-22 2021-10-14 グーグル エルエルシーGoogle LLC Proactive embedding of unsolicited content into human-to-computer dialogs
US11929069B2 (en) 2017-05-03 2024-03-12 Google Llc Proactive incorporation of unsolicited content into human-to-computer dialogs
US11552814B2 (en) 2017-06-29 2023-01-10 Google Llc Proactive provision of new content to group chat participants
JP2019185077A (en) * 2018-03-31 2019-10-24 Nl技研株式会社 Recognition support system, recognition support apparatus and recognition support server
CN110648661A (en) * 2018-06-27 2020-01-03 现代自动车株式会社 Dialogue system, vehicle, and method for controlling vehicle
CN111114477A (en) * 2018-10-31 2020-05-08 丰田自动车株式会社 Vehicle audio input/output device
JP7211013B2 (en) 2018-10-31 2023-01-24 トヨタ自動車株式会社 Vehicle sound input/output device
CN111114477B (en) * 2018-10-31 2023-02-17 丰田自动车株式会社 Vehicle audio input/output device
JP2020072393A (en) * 2018-10-31 2020-05-07 トヨタ自動車株式会社 Sound input/output device for vehicle
WO2020188623A1 (en) * 2019-03-15 2020-09-24 株式会社島津製作所 Physical condition detection system
JP7359201B2 (en) 2019-03-15 2023-10-11 株式会社島津製作所 Physical condition detection system
JPWO2020188623A1 (en) * 2019-03-15 2020-09-24

Also Published As

Publication number Publication date
JPWO2016129276A1 (en) 2017-11-24
JP6846617B2 (en) 2021-03-24

Similar Documents

Publication Publication Date Title
WO2016129276A1 (en) Information dissemination method, server, information terminal device, system, and voice interaction system
JP6571118B2 (en) Method for speech recognition processing, in-vehicle system, and nonvolatile storage medium
US10875525B2 (en) Ability enhancement
US9046375B1 (en) Navigation for a passenger on a public conveyance based on current location relative to a destination
US9667742B2 (en) System and method of conversational assistance in an interactive information system
US8903651B2 (en) Information terminal, server device, searching system, and searching method thereof
KR20190041569A (en) Dialogue processing apparatus, vehicle having the same and dialogue service processing method
KR20180086718A (en) Dialogue processing apparatus, vehicle having the same and dialogue processing method
EP2925027A1 (en) Selective message presentation by in-vehicle computing system
JP5677647B2 (en) Navigation device
US8718621B2 (en) Notification method and system
WO2015059764A1 (en) Server for navigation, navigation system, and navigation method
JP6339545B2 (en) Information processing apparatus, information processing method, and program
JP2006317573A (en) Information terminal
KR20190109864A (en) Dialogue processing apparatus, vehicle having the same and dialogue processing method
US20200051566A1 (en) Artificial intelligence device for providing notification to user using audio data and method for the same
KR102403355B1 (en) Vehicle, mobile for communicate with the vehicle and method for controlling the vehicle
JP5729257B2 (en) Information providing apparatus and information providing method
JP2017058315A (en) Information processing apparatus, information processing method, and program
US11333518B2 (en) Vehicle virtual assistant systems and methods for storing and utilizing data associated with vehicle stops
CN112193255A (en) Human-computer interaction method, device, equipment and storage medium of vehicle-machine system
WO2010073406A1 (en) Information providing device, communication terminal, information providing system, information providing method, information output method, information providing program, information output program, and recording medium
KR20210081484A (en) Method for providing intelligent contents and autonomous driving system
JP2020091416A (en) Guidance voice output control system and guidance voice output control method
JP2019200546A (en) Agent server

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 16748921

Country of ref document: EP

Kind code of ref document: A1

ENP Entry into the national phase

Ref document number: 2016574669

Country of ref document: JP

Kind code of ref document: A

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 16748921

Country of ref document: EP

Kind code of ref document: A1