WO2019230363A1 - Sound emission system, information processing system, information providing method, and information processing method - Google Patents

Sound emission system, information processing system, information providing method, and information processing method

Info

Publication number
WO2019230363A1
Authority
WO
WIPO (PCT)
Prior art keywords
response
information
sound
identification information
related information
Prior art date
Application number
PCT/JP2019/019022
Other languages
French (fr)
Japanese (ja)
Inventor
Tetsuro Ishida (石田 哲朗)
Yuki Seto (瀬戸 優樹)
Original Assignee
Yamaha Corporation (ヤマハ株式会社)
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Yamaha Corporation (ヤマハ株式会社)
Publication of WO2019230363A1 publication Critical patent/WO2019230363A1/en

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00 - Information retrieval; Database structures therefor; File system structures therefor
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F 3/00 - Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F 3/16 - Sound input; Sound output
    • G - PHYSICS
    • G10 - MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L - SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L 13/00 - Speech synthesis; Text to speech systems
    • G - PHYSICS
    • G10 - MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L - SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L 13/00 - Speech synthesis; Text to speech systems
    • G10L 13/02 - Methods for producing synthetic speech; Speech synthesisers
    • G10L 13/04 - Details of speech synthesis systems, e.g. synthesiser structure or memory management
    • G10L 13/047 - Architecture of speech synthesisers
    • G - PHYSICS
    • G10 - MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L - SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L 15/00 - Speech recognition
    • G - PHYSICS
    • G10 - MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L - SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L 15/00 - Speech recognition
    • G10L 15/22 - Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G - PHYSICS
    • G10 - MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L - SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L 19/00 - Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
    • G10L 19/018 - Audio watermarking, i.e. embedding inaudible data in the audio signal
    • H - ELECTRICITY
    • H04 - ELECTRIC COMMUNICATION TECHNIQUE
    • H04B - TRANSMISSION
    • H04B 11/00 - Transmission systems employing sonic, ultrasonic or infrasonic waves

Definitions

  • This disclosure relates to a technique for providing information to a terminal device.
  • Patent Document 1 discloses a service robot that assists the operation of a vending machine by interacting with a user who uses the vending machine.
  • However, the service robot of Patent Document 1 only utters dialogue voice to the user.
  • A user who desires more detailed information related to the content of the voice uttered by the service robot must therefore acquire information related to the voice he or she listened to by, for example, operating a terminal device to use a search site.
  • In view of the above circumstances, the present disclosure aims to enable a user to acquire information related to voice without requiring a complicated operation.
  • An information providing method according to one aspect of the present disclosure receives an input from a user and causes a sound emission device to emit an acoustic component representing identification information of related information related to a response to the received input.
  • An information processing method according to one aspect of the present disclosure generates a response to an input by a user, generates related information regarding the generated response, transmits acoustic data representing an acoustic component representing identification information corresponding to the related information to a sound emission system that emits sound according to the acoustic data, and, in response to an information request from a terminal device that has received the identification information by acoustic communication by the sound emission system, transmits related information corresponding to the identification information to the terminal device.
  • A sound emission system according to one aspect of the present disclosure includes a reception unit that receives an input from a user, a sound emission device that emits sound, and a sound emission control unit that causes the sound emission device to emit an acoustic component representing identification information of related information related to a response to the input received by the reception unit.
  • An information processing system according to one aspect of the present disclosure includes a response generation unit that generates a response to an input by a user, a related information generation unit that generates related information regarding the response generated by the response generation unit, a first communication control unit that transmits acoustic data representing an acoustic component representing identification information of the related information generated by the related information generation unit to a sound emission system that emits sound according to the acoustic data, and a second communication control unit that transmits related information corresponding to the identification information to a terminal device in response to an information request from the terminal device that has received the identification information by acoustic communication by the sound emission system.
  • FIG. 1 is a block diagram illustrating a configuration of an information providing system 100 according to the first embodiment of the present disclosure.
  • the information providing system 100 includes a sound emitting system 20, a response server 30, and an information providing server 40.
  • The information providing system 100 is a computer system for providing various types of information to the user U of the terminal device 50. Specifically, it provides the user U with a response to the voice uttered by the user U of the terminal device 50 (hereinafter referred to as the "uttered voice" V1) and with information related to that response (hereinafter referred to as the "related information" R).
  • the response server 30 communicates with the sound emitting system 20 and the information providing server 40 via a communication network including the Internet, for example.
  • the response server 30 generates a response to the utterance voice V1 of the user U and related information R related to the response.
  • A voice V2 representing the response generated by the response server 30 (hereinafter referred to as the "response voice") is reproduced by the sound emission system 20.
  • the related information R generated by the response server 30 is transmitted to the terminal device 50 by the information providing server 40.
  • details of the information providing system 100 will be described.
  • FIG. 2 is a block diagram illustrating the configuration of the sound emission system 20.
  • the sound emitting system 20 is a computer system that reproduces the response voice V2 to the uttered voice V1 by the user U of the terminal device 50.
  • a voice interaction device (so-called AI speaker) that interacts with the user U is used as the sound emission system 20.
  • a portable information processing device such as a mobile phone or a smartphone, or an information processing device such as a personal computer is used as the sound emitting system 20.
  • The sound emission system 20 is installed in a facility such as a commercial facility, or in an athletic facility such as a stadium or gymnasium.
  • The uttered voice V1 is, for example, a voice including an inquiry (question) or conversation.
  • The response voice V2 is, for example, a voice including an answer to the inquiry or a reply to the conversation.
  • A voice pronouncing one or more keywords (for example, words) related to a matter that the user U desires to search may be used as the uttered voice V1, and a voice representing matters related to those keywords as search results may be used as the response voice V2.
  • For example, when the user U pronounces an uttered voice V1 asking the location of a restaurant in a commercial facility, such as "Is there a restaurant nearby?", a response voice V2 answering that uttered voice V1, such as "The restaurant ABC is nearby.", is emitted.
  • The sound emission system 20 of the first embodiment includes a sound collection device 21 (an example of a reception unit), a sound emission device 22, a storage device 23, a control device 24, and a communication device 25.
  • the sound collection device 21 is an input device that collects ambient sounds.
  • The sound collection device 21 of the first embodiment generates data D1 (hereinafter referred to as "input data") representing the uttered voice V1 pronounced by the user U. That is, the sound collection device 21 functions as a reception unit that receives the uttered voice V1 (an example of an input by the user U). Specifically, the sound collection device 21 includes an element that collects the uttered voice V1 pronounced by the user U and generates a signal representing the waveform of the uttered voice V1, and an A/D converter that converts that signal from analog to digital to generate the input data D1.
  • the control device 24 (an example of a computer) is constituted by a processing circuit such as a CPU (Central Processing Unit), and controls each element of the sound emission system 20 in an integrated manner.
  • the storage device 23 stores a program executed by the control device 24 and various data used by the control device 24.
  • a known recording medium such as a semiconductor recording medium and a magnetic recording medium, or a combination of a plurality of types of recording media is arbitrarily adopted as the storage device 23.
  • the control device 24 realizes a plurality of functions (communication control unit 243 and sound emission control unit 245) by executing a program stored in the storage device 23 as illustrated in FIG. Note that some functions of the control device 24 may be realized by a dedicated electronic circuit. Further, the function of the control device 24 may be installed in a plurality of devices.
  • the communication control unit 243 receives and transmits various types of information through the communication device 25. That is, the communication control unit 243 controls the communication device 25 so as to receive and transmit various types of information.
  • the communication control unit 243 transmits the input data D1 generated by the sound collection device 21 to the response server 30 via the communication device 25.
  • Upon receiving the input data D1, the response server 30 generates data D2 (hereinafter referred to as "acoustic data") for causing the sound emission system 20 to emit the response voice V2 corresponding to the uttered voice V1 represented by the input data D1.
  • the communication control unit 243 receives the acoustic data D2 generated by the response server 30 from the response server 30 by the communication device 25.
  • the sound emission control unit 245 causes the sound emission device 22 to emit sound according to the acoustic data D2 transmitted from the response server 30.
  • the communication device 25 is a communication device that communicates with the response server 30 via the communication network under the control of the communication control unit 243.
  • the communication device 25 includes a transmission unit 251 and a reception unit 253.
  • the transmission unit 251 transmits input data D1 representing the uttered voice V1 collected by the sound collection device 21 to the response server 30.
  • the receiving unit 253 receives the acoustic data D2 generated by the response server 30.
  • the sound emitting device 22 is an output device that emits various sounds. Specifically, the sound emitting device 22 emits sound according to the acoustic data D2 received by the communication device 25 under the control of the sound emission control unit 245.
  • the response voice V2 represented by the acoustic data D2 is emitted by the sound emission device 22. Therefore, the user U who pronounced the uttered voice V1 can listen to the response voice V2 corresponding to the uttered voice V1.
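The overall behavior of the sound emission system 20 described above can be summarized in the following minimal sketch. All function names are hypothetical stand-ins: record_utterance() for the sound collection device 21 and its A/D converter, exchange_with_response_server() for the communication device 25 (transmission unit 251 and reception unit 253), and emit() for the sound emission device 22.

```python
# A minimal sketch of the control flow of the sound emission system 20,
# under assumed interfaces. The names and signatures are illustrative,
# not part of the disclosure.

def record_utterance() -> bytes:
    """Collect the uttered voice V1 and return input data D1 (step Sa1)."""
    raise NotImplementedError  # platform-specific audio capture

def exchange_with_response_server(input_data_d1: bytes) -> bytes:
    """Send D1 to the response server 30 and receive acoustic data D2
    (steps Sa2 and Sa13)."""
    raise NotImplementedError  # network I/O

def emit(acoustic_data_d2: bytes) -> None:
    """Emit sound according to D2: the response voice V2 mixed with the
    acoustic component of the identification information I (step Sa14)."""
    raise NotImplementedError  # platform-specific audio playback

def handle_interaction() -> None:
    d1 = record_utterance()
    d2 = exchange_with_response_server(d1)
    emit(d2)
```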
  • FIG. 3 is a block diagram illustrating the configuration of the response server 30.
  • the response server 30 is a computer system that generates a response to the utterance voice V1 of the user U and related information R related to the response.
  • the response server 30 includes a storage device 31, a control device 32, and a communication device 33.
  • the storage device 31 stores a program executed by the control device 32 and various data used by the control device 32.
  • a known recording medium such as a semiconductor recording medium and a magnetic recording medium, or a combination of a plurality of types of recording media is arbitrarily adopted as the storage device 31.
  • the storage device 31 stores a related information table.
  • the related information table is a data table used for specifying the related information R of the response to the speech voice V1. Details of the related information table will be described later.
  • The control device 32 (an example of a computer) is constituted by a processing circuit such as a CPU (Central Processing Unit), and controls each element of the response server 30 in an integrated manner. As illustrated in FIG. 3, the control device 32 realizes a plurality of functions (a voice recognition unit 321, a response generation unit 322, a related information generation unit 323, an identification information generation unit 324, a signal generation unit 325, and a communication control unit 326) by executing a program stored in the storage device 31. Note that some functions of the control device 32 may be realized by a dedicated electronic circuit, and the functions of the control device 32 may be distributed across a plurality of devices.
  • The voice recognition unit 321 specifies a character string representing the utterance content of the uttered voice V1 (hereinafter referred to as the "utterance character string") by voice recognition on the input data D1 transmitted from the sound emission system 20. For example, when the user U pronounces an uttered voice V1 asking the location of a restaurant, the utterance character string "Is there a restaurant nearby?" is specified.
  • For the voice recognition on the input data D1, a known technique is arbitrarily adopted, for example recognition processing using an acoustic model such as an HMM (Hidden Markov Model) and a language model representing linguistic constraints.
  • The response generation unit 322 generates a response to the uttered voice V1. Specifically, the response generation unit 322 generates a character string representing a response to the utterance character string specified by the voice recognition unit 321 (hereinafter referred to as the "response character string"). For example, when the utterance character string "Is there a restaurant nearby?" is specified, the response character string "The restaurant ABC is nearby.", indicating the location of the restaurant ABC, is generated. For the generation of the response character string, known techniques such as natural language processing (for example, morphological analysis of the utterance character string) and dialogue techniques using artificial intelligence are arbitrarily employed.
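As an illustration of the response generation just described, the following is a minimal sketch assuming a simple keyword-matching policy; the disclosure allows any natural language processing or AI-based dialogue technique, and the rule table below is invented for the example.

```python
# A minimal sketch of the response generation unit 322, assuming a simple
# keyword-matching dialogue policy. The entries are illustrative only.
RESPONSE_RULES = {
    "restaurant": "The restaurant ABC is nearby.",
    "restroom": "The restrooms are on the second floor.",
}

def generate_response(utterance_string: str) -> str:
    """Map an utterance character string to a response character string."""
    lowered = utterance_string.lower()
    for keyword, response in RESPONSE_RULES.items():
        if keyword in lowered:
            return response
    return "I could not find an answer to that question."
```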
  • the related information generation unit 323 generates related information R related to the response generated by the response generation unit 322.
  • the related information R is, for example, content for supplementing response details.
  • the content for supplementing the content of a specific word (hereinafter referred to as “response word”) included in the response character string is exemplified as the related information R.
  • the response word is a characteristic word such as a proper noun among words included in the response character string.
  • the response word contained in the response character string “Restaurant ABC is nearby” is “Restaurant ABC”.
  • Various contents are exemplified as the related information R, such as information describing the matter represented by the response word (for example, the URL of a homepage) and information indicating the location of the matter represented by the response word (for example, a map image, the URL of a map, or a character string indicating the location).
  • the related information R is not limited to the above example, and is arbitrarily changed according to the content and type of the response word.
  • For extracting the response word from the response character string, known natural language processing such as morphological analysis is arbitrarily employed.
  • the related information table is used to generate the related information R.
  • FIG. 4 is a schematic diagram of a related information table.
  • the related information table is a table in which a plurality of related information R is registered. Specifically, for each of a plurality of response words, related information R corresponding to the response word is registered.
  • The related information generation unit 323 extracts a response word from the response character string generated by the response generation unit 322 and specifies, from among the plurality of related information R registered in the related information table, the related information R corresponding to that response word. As understood from the above description, related information R corresponding to the response word of the response character string generated by the response generation unit 322 is generated. A plurality of related information R may be generated for one response word.
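A minimal sketch of the related information table of FIG. 4 and its lookup follows, assuming the table is a mapping from response words to related information R; the entries and the substring matching used to extract the response word are illustrative assumptions.

```python
# A minimal sketch of the related information generation unit 323.
# Entries and URLs are invented for the example; a real table would be
# populated by the operator of the facility.
RELATED_INFO_TABLE: dict[str, list[str]] = {
    "restaurant ABC": [
        "https://example.com/restaurant-abc",        # homepage URL (assumed)
        "https://example.com/map?q=restaurant+ABC",  # map URL (assumed)
    ],
}

def generate_related_info(response_string: str) -> list[str]:
    """Extract a response word from the response character string and look
    up the corresponding related information R (possibly more than one)."""
    for response_word, related in RELATED_INFO_TABLE.items():
        if response_word.lower() in response_string.lower():
            return related
    return []
```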
  • The identification information generation unit 324 of FIG. 3 generates identification information I for identifying the related information R generated by the related information generation unit 323. Different identification information I is generated for each of the plurality of related information R registered in the related information table.
  • the identification information I generated in advance for each related information R may be registered in advance in the related information table in association with the related information R.
  • The signal generation unit 325 generates acoustic data D2 representing the response voice V2, which represents the response generated by the response generation unit 322, and the acoustic component of the identification information I corresponding to the related information R generated by the related information generation unit 323. Specifically, acoustic data D2 representing a mixed sound of the response voice V2 and the acoustic component of the identification information I is generated.
  • FIG. 5 is a block diagram of the signal generation unit 325. As illustrated in FIG. 5, the signal generation unit 325 includes a speech synthesis unit 71, a modulation processing unit 73, and an addition unit 74.
  • the voice synthesis unit 71 generates a voice signal by voice synthesis with respect to the response character string generated by the response generation unit 322. A known speech synthesis technique is arbitrarily employed for generating the speech signal.
  • the modulation processing unit 73 generates a modulation signal representing the acoustic component of the identification information I generated by the identification information generation unit 324.
  • the modulation signal is generated, for example, by frequency-modulating a carrier wave having a predetermined frequency with the identification information I.
  • the modulation signal may be generated by sequentially executing spread modulation of each information using a spread code and frequency conversion using a carrier wave of a predetermined frequency.
  • The frequency band of the modulated signal is a frequency band in which sound can be emitted by the sound emission device 22 and collected by the terminal device 50, and is, for example, a band above the frequency band of the sound that the user U of the terminal device 50 listens to in a normal environment.
  • Note, however, that the frequency band of the modulation signal is arbitrary; for example, a modulation signal within the audible band may be generated.
  • the adding unit 74 adds the audio signal generated by the audio synthesizing unit 71 and the modulation signal generated by the modulation processing unit 73 to generate acoustic data D2.
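The following is a minimal sketch of the signal generation unit 325 under an assumed binary FSK scheme. The disclosure specifies frequency modulation of a carrier of a predetermined frequency (or, alternatively, spread modulation with a spread code followed by frequency conversion); the concrete sampling rate, carrier frequencies, bit rate, amplitude, and the synthesize_speech() stub below are all assumptions made for illustration.

```python
# A minimal sketch of the signal generation unit 325 (speech synthesis
# unit 71, modulation processing unit 73, addition unit 74).
import numpy as np

FS = 44100              # sampling rate in Hz (assumed)
BIT_DURATION = 0.05     # seconds per bit (assumed)
F0, F1 = 18000, 18500   # carriers for bits 0/1, above everyday listening band (assumed)

def synthesize_speech(response_string: str) -> np.ndarray:
    """Stand-in for the speech synthesis unit 71; a real system would use
    a TTS engine to render the response character string as a waveform."""
    return np.zeros(int(FS * 1.0))  # placeholder: one second of silence

def modulate_id(identification_info: bytes) -> np.ndarray:
    """Modulation processing unit 73: binary FSK of the identification
    information I onto a high-band carrier."""
    n = int(FS * BIT_DURATION)
    t = np.arange(n) / FS
    chunks = []
    for byte in identification_info:
        for i in range(8):
            bit = (byte >> (7 - i)) & 1
            chunks.append(0.05 * np.sin(2 * np.pi * (F1 if bit else F0) * t))
    return np.concatenate(chunks)

def generate_acoustic_data(response_string: str, identification_info: bytes) -> np.ndarray:
    """Addition unit 74: mix the response voice V2 with the acoustic
    component of the identification information I into acoustic data D2."""
    voice = synthesize_speech(response_string)
    ident = modulate_id(identification_info)
    mixed = np.zeros(max(len(voice), len(ident)))
    mixed[:len(voice)] += voice
    mixed[:len(ident)] += ident
    return mixed
```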
  • The communication control unit 326 (an example of a first communication control unit) of FIG. 3 receives and transmits various types of information by the communication device 33. That is, the communication control unit 326 controls the communication device 33 so as to receive and transmit various types of information.
  • Specifically, the communication control unit 326 receives, by the communication device 33, the input data D1 transmitted from the sound emission system 20.
  • Further, the communication control unit 326 transmits the acoustic data D2 generated by the signal generation unit 325 to the sound emission system 20 by the communication device 33, and transmits data D3 (hereinafter referred to as "provided data"), including the related information R generated by the related information generation unit 323 and the identification information I generated by the identification information generation unit 324 for that related information R, to the information providing server 40 by the communication device 33.
  • the communication device 33 communicates with each of the sound emitting system 20 and the information providing server 40 via the communication network under the control of the communication control unit 326.
  • the communication device 33 includes a transmission unit 331 and a reception unit 333.
  • the receiving unit 333 receives the input data D1 transmitted from the sound emission system 20.
  • the transmission unit 331 transmits the acoustic data D2 generated by the signal generation unit 325 to the sound emitting system 20, and transmits the provision data D3 to the information providing server 40.
  • The sound emission control unit 245 of the sound emission system 20 that has received the acoustic data D2 causes the sound emission device 22 to emit sound according to the acoustic data D2. Specifically, by supplying the acoustic data D2 to the sound emission device 22, the mixed sound represented by the acoustic data D2 is emitted from the sound emission device 22. That is, the response voice V2 responding to the uttered voice V1 of the user U and the acoustic component of the identification information I of the related information R related to the response represented by that response voice V2 are emitted from the sound emission device 22.
  • As understood from the above description, the sound emission device 22 functions as an acoustic device that reproduces the response voice V2, and also functions as a transmitter that transmits the identification information I to the surroundings by acoustic communication using, as a transmission medium, sound waves that are vibrations of air. That is, the identification information I is transmitted to the surroundings by acoustic communication in which the sound of the identification information I is emitted from the same sound emission device 22 that emits the response voice V2. The identification information I is transmitted each time the response voice V2 is emitted; for example, the identification information I is transmitted together with the response voice V2 (for example, in parallel with or before the emission of the response voice V2).
  • FIG. 6 is a block diagram of the information providing server 40.
  • The information providing server 40 is a computer system for transmitting, to the terminal device 50, the related information R related to the response to the uttered voice V1 of the user U.
  • the information providing server 40 includes a storage device 41, a control device 42, and a communication device 43.
  • the storage device 41 stores a program executed by the control device 42 and various data used by the control device 42.
  • a known recording medium such as a semiconductor recording medium and a magnetic recording medium, or a combination of a plurality of types of recording media is arbitrarily employed as the storage device 41.
  • the storage device 41 stores an information provision table.
  • the information provision table is a data table used for providing the terminal device 50 with the related information R of the response to the speech voice V1.
  • the identification information I and the related information R included in the provision data D3 transmitted from the response server 30 are registered in the information provision table in a state where they correspond to each other.
  • Provided data D3 is generated for each utterance voice V1 from the user U.
  • The control device 42 (an example of a computer) is constituted by a processing circuit such as a CPU (Central Processing Unit), and comprehensively controls each element of the information providing server 40. Specifically, as illustrated in FIG. 6, the control device 42 realizes a plurality of functions (a storage control unit 421, a related information specifying unit 423, and a communication control unit 425) by executing a program stored in the storage device 41. Note that some functions of the control device 42 may be realized by a dedicated electronic circuit, and the functions of the control device 42 may be distributed across a plurality of devices.
  • the storage control unit 421 stores the provided data D3 received by the communication device 43 in the storage device 41. Specifically, the storage control unit 421 registers the identification information I and the related information R included in the provision data D3 in association with each other in the information provision table.
  • the related information specifying unit 423 specifies related information R corresponding to the identification information I in response to an information request from the terminal device 50 that has received the identification information I by acoustic communication by the sound emission system 20.
  • the information request from the terminal device 50 includes identification information I.
  • Specifically, the related information specifying unit 423 specifies, from among the plurality of related information R registered in the information provision table, the related information R corresponding to the identification information I included in the information request from the terminal device 50.
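A minimal sketch of the information provision table and the two operations performed on it follows, assuming a simple keyed store; the names are hypothetical.

```python
# A minimal sketch of the information provision table held by the
# information providing server 40: the storage control unit 421 registers
# the pair (I, R) carried by the provided data D3 (step Sa11), and the
# related information specifying unit 423 resolves an information request
# by the identification information it contains (step Sa20).
provision_table: dict[str, list[str]] = {}

def register_provided_data(identification_info: str, related_info: list[str]) -> None:
    """Storage control unit 421: associate I with R in the table."""
    provision_table[identification_info] = related_info

def specify_related_info(identification_info: str) -> list[str]:
    """Related information specifying unit 423: look up R for the I
    included in the terminal device's information request."""
    return provision_table.get(identification_info, [])
```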
  • the communication control unit 425 receives and transmits various types of information through the communication device 43. That is, the communication control unit 425 controls the communication device 43 so as to receive and transmit various types of information.
  • The communication control unit 425 receives, by the communication device 43, the provided data D3 transmitted from the response server 30.
  • Further, in response to an information request from the terminal device 50 that has received the identification information I by acoustic communication by the sound emission system 20, the communication control unit 425 transmits the related information R corresponding to that identification information I (that is, the related information R specified by the related information specifying unit 423) to the terminal device 50 by the communication device 43.
  • the communication device 43 communicates with each of the response server 30 and the terminal device 50 via the communication network under the control of the communication control unit 425.
  • the communication device 43 includes a transmission unit 431 and a reception unit 433.
  • the receiving unit 433 receives the provision data D3 transmitted from the response server 30.
  • the transmission unit 431 transmits the related information R to the terminal device 50.
  • the response server 30 and the information providing server 40 function as an information processing system that generates a response to the utterance voice V1 of the user U and related information R related to the response.
  • FIG. 7 is a block diagram of the terminal device 50.
  • the terminal device 50 is located near the sound emission system 20.
  • the terminal device 50 is a portable information terminal for acquiring, from the information providing server 40, related information R related to a response to the uttered voice V1 uttered by the user U.
  • a mobile phone, a smartphone, a tablet terminal, a personal computer, or the like is used as the terminal device 50.
  • the terminal device 50 includes a sound collection device 51, a control device 52, a storage device 53, a communication device 54, and a playback device 55.
  • the sound collection device 51 is an acoustic device (microphone) that collects surrounding sounds. Specifically, the sound collection device 51 collects the sound emitted by the sound emission system 20 according to the acoustic data D2, and generates an acoustic signal Y representing the waveform of the sound. Therefore, the acoustic signal Y generated by sound collection in the vicinity of the sound emission system 20 can include the acoustic component of the identification information I.
  • The sound collection device 51 is used for voice communication between terminal devices 50 or for recording audio during video shooting, and also functions as a receiver that receives the identification information I by acoustic communication using, as a transmission medium, sound waves that are vibrations of air. Note that an A/D converter that converts the acoustic signal Y generated by the sound collection device 51 from analog to digital is omitted from the figure for convenience. Further, a separate sound collection device 51 may be connected to the terminal device 50 by wire or wirelessly instead of a sound collection device 51 integrated with the terminal device 50.
  • the control device 52 (an example of a computer) is configured by a processing circuit such as a CPU (Central Processing Unit) and controls each element of the terminal device 50 in an integrated manner.
  • the storage device 53 stores a program executed by the control device 52 and various data used by the control device 52.
  • a known recording medium such as a semiconductor recording medium and a magnetic recording medium, or a combination of a plurality of types of recording media can be arbitrarily employed as the storage device 53.
  • control device 52 realizes a plurality of functions (the information extraction unit 521 and the reproduction control unit 523) by executing a program stored in the storage device 53, as illustrated in FIG. Note that some functions of the control device 52 may be realized by a dedicated electronic circuit. Further, the function of the control device 52 may be mounted on a plurality of devices.
  • The information extraction unit 521 extracts the identification information I from the acoustic signal Y generated by the sound collection device 51. Specifically, the information extraction unit 521 extracts the identification information I by, for example, a filtering process that emphasizes the frequency band containing the acoustic component of the identification information I in the acoustic signal Y and a demodulation process corresponding to the modulation process applied to the identification information I. The extracted identification information I is used to acquire the related information R corresponding to that identification information I (that is, the related information R related to the response represented by the response voice V2 emitted by the sound emission device 22).
  • Since the identification information I can be received only at positions within the range where the response voice V2 corresponding to the identification information I can be collected, the identification information I can also be regarded as information indicating the position of the terminal device 50. Therefore, the related information R can be provided only to terminal devices 50 located around the sound emission system 20.
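A minimal sketch of the extraction described above follows, matching the FSK scheme assumed in the earlier signal-generation sketch: each bit slot of the collected acoustic signal Y is correlated against the two assumed carriers and the stronger one is taken. Synchronization, filtering, and error detection, which a practical demodulator requires, are omitted.

```python
# A minimal sketch of the information extraction unit 521.
import numpy as np

FS = 44100              # must match the emitting side (assumed)
BIT_DURATION = 0.05
F0, F1 = 18000, 18500

def band_energy(frame: np.ndarray, freq: float) -> float:
    """Energy of `frame` near `freq` via correlation with a complex tone
    (a single-bin DFT, in the spirit of the Goertzel algorithm)."""
    t = np.arange(len(frame)) / FS
    return abs(np.sum(frame * np.exp(-2j * np.pi * freq * t)))

def extract_id(acoustic_signal: np.ndarray, num_bytes: int) -> bytes:
    """Demodulate the identification information I from the signal Y."""
    n = int(FS * BIT_DURATION)
    bits = []
    for k in range(num_bytes * 8):
        frame = acoustic_signal[k * n:(k + 1) * n]
        bits.append(1 if band_energy(frame, F1) > band_energy(frame, F0) else 0)
    out = bytearray()
    for i in range(0, len(bits), 8):
        byte = 0
        for b in bits[i:i + 8]:
            byte = (byte << 1) | b
        out.append(byte)
    return bytes(out)
```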
  • the communication device 54 communicates with the information providing server 40 via the communication network under the control of the control device 52. Specifically, the communication device 54 transmits the identification information I extracted by the information extraction unit 521 to the information providing server 40. The information providing server 40 acquires related information R corresponding to the identification information I transmitted from the terminal device 50 and transmits it to the terminal device 50. The communication device 54 receives the related information R transmitted from the information providing server 40.
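The information request itself might look like the following minimal sketch; the endpoint URL and the JSON shape are assumptions, as the disclosure does not specify a request format.

```python
# A minimal sketch of the terminal device 50's information request: send
# the extracted identification information I to the information providing
# server 40 and receive the related information R (steps Sa18 and Sa22).
import json
import urllib.request

def request_related_info(identification_info: str) -> list[str]:
    url = "https://info-provider.example.com/related"  # hypothetical endpoint
    payload = json.dumps({"id": identification_info}).encode("utf-8")
    req = urllib.request.Request(
        url, data=payload, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["related_info"]
```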
  • the reproduction control unit 523 causes the reproduction device 55 to reproduce the related information R received by the communication device 54.
  • the playback device 55 is an output device that plays back the related information R.
  • the playback device 55 includes a display device that displays an image represented by the related information R.
  • a separate playback device 55 may be connected to the terminal device 50 by wire or wirelessly.
  • the playback device 55 may include a sound emitting device that emits sound represented by the related information R. That is, the reproduction by the reproduction device 55 includes image display and sound emission.
  • FIG. 8 is a flowchart of processing of the information providing system 100 as a whole.
  • The process of FIG. 8 is started when the user U pronounces the uttered voice V1.
  • the sound collection device 21 of the sound emission system 20 receives the uttered voice V1 from the user U (Sa1). Specifically, input data D1 representing the uttered voice V1 uttered by the user U is generated by the sound collecting device 21.
  • the communication control unit 243 of the sound emission system 20 transmits the input data D1 generated by the sound collection device 21 to the response server 30 through the communication device 25 (Sa2).
  • the communication control unit 326 of the response server 30 receives the input data D1 transmitted from the sound emitting system 20 by the communication device 33 (Sa3).
  • the voice recognition unit 321 specifies an utterance character string by voice recognition on the input data D1 received by the communication device 33 (Sa4).
  • the response generation unit 322 generates a response to the uttered voice V1 (Sa5). Specifically, a response character string corresponding to the utterance character string specified by the voice recognition unit 321 is generated.
  • the related information generating unit 323 generates related information R related to the response generated by the response generating unit 322 (Sa6).
  • the identification information generation unit 324 generates identification information I for identifying the related information R generated by the related information generation unit 323 (Sa7).
  • the signal generator 325 generates acoustic data D2 (Sa8). Specifically, acoustic data D2 representing a mixed sound of the response voice V2 and the acoustic component of the identification information I is generated.
  • the communication control unit 326 transmits the provision data D3 to the information provision server 40 through the communication device 33 (Sa9).
  • the provided data D3 includes related information R generated by the related information generating unit 323 and identification information I generated by the identification information generating unit 324 regarding the related information R.
  • the communication control unit 425 of the information providing server 40 receives the provision data D3 transmitted from the response server 30 by the communication device 43 (Sa10).
  • the storage control unit 421 stores the provided data D3 received by the communication device 43 in the storage device 41 (Sa11). Specifically, the storage control unit 421 stores the related information R and the identification information I included in the provided data D3 in the storage device 41 in association with each other.
  • the communication control unit 326 of the response server 30 transmits the acoustic data D2 generated by the signal generation unit 325 to the sound emission system 20 through the communication device 33 (Sa12).
  • the communication control unit 243 of the sound emitting system 20 receives the acoustic data D2 transmitted from the response server 30 by the communication device 25 (Sa13).
  • the sound emission control unit 245 causes the sound emission device 22 to emit sound according to the acoustic data D2 (Sa14).
  • the sound emitting device 22 transmits the identification information I to the terminal device 50 by emitting a mixed sound of the response voice V2 and the acoustic component of the identification information I (Sa15). That is, the identification information I is transmitted to the terminal device 50 by acoustic communication using the sound emitting device 22.
  • the sound collection device 51 of the terminal device 50 collects the sound emitted by the sound emission system 20 according to the acoustic data D2 (that is, the sound including the acoustic component of the identification information I) (Sa16). Specifically, an acoustic signal that represents the waveform of the collected sound is generated.
  • the information extraction unit 521 extracts the identification information I from the acoustic signal generated by the sound collection device 51 (Sa17).
  • the communication device 54 transmits the identification information I extracted by the information extraction unit 521 to the information providing server 40 (Sa18).
  • the communication control unit 425 of the information providing server 40 receives the identification information I transmitted from the terminal device 50 by the communication device 43 (Sa19).
  • the related information specifying unit 423 specifies related information R corresponding to the identification information I received by the communication device 43 (Sa20).
  • the communication control unit 425 transmits the related information R specified by the related information specifying unit 423 to the terminal device 50 through the communication device 43 (Sa21).
  • the communication device 54 of the terminal device 50 receives the related information R transmitted from the information providing server 40 (Sa22).
  • the reproduction control unit 523 causes the reproduction device 55 to reproduce the related information R received by the communication device 54 (Sa23). That is, the related information R related to the response represented by the response voice V2 emitted by the sound emitting device 22 is reproduced by the reproducing device 55.
  • As described above, since the identification information I is transmitted to the terminal device 50 by acoustic communication using the sound emission device 22 that emits the response voice V2, the related information R related to the response represented by the response voice V2 (for example, more detailed information about the response) can be acquired by the terminal device 50 using the identification information I. Therefore, the burden on the user U of performing a complicated operation on the terminal device 50 in order to acquire the related information R related to the response voice V2 can be reduced. Further, the identification information I can be transmitted to the terminal device 50 by reusing the sound emission device 22 that emits the response voice V2; that is, a transmitter dedicated to the transmission of the identification information I is unnecessary.
  • Further, the uttered voice V1 received by the sound emission system 20 is transmitted to the response server 30, and the acoustic data D2 of the response voice V2 representing the response generated by the response server 30 is received by the reception unit 253, so it is not necessary to incorporate an element for generating the response voice V2 into the sound emission system 20. Therefore, the configuration and operation of the sound emission system 20 are simplified.
  • Further, since the related information R corresponding to a response word included in the response character string generated by the response generation unit 322 is generated, the related information R can be identified more easily than in a configuration in which related information R corresponding to the entire response character string is specified. Therefore, the processing load on the information providing server 40 can be reduced.
  • Second Embodiment A second embodiment of the present disclosure will be described.
  • Elements having the same functions as in the first embodiment are denoted by the reference numerals used in the description of the first embodiment, and detailed descriptions thereof are omitted as appropriate.
  • In the first embodiment, the identification information I of the related information R is generated by the response server 30.
  • In the second embodiment, the identification information I of the related information R is generated by the sound emission system 20. That is, the identification information generation unit 324 is omitted from the response server 30 of the second embodiment.
  • the control device 24 of the sound emission system 20 functions as an identification information generation unit in addition to the communication control unit 243 and the sound emission control unit 245.
  • When the sound collection device 21 receives the uttered voice V1 pronounced by the user U (that is, when the input data D1 is generated), the identification information generation unit generates identification information I corresponding to the input data D1. In other words, the identification information I corresponding to the related information R that the response server 30 will generate according to the input data D1 is generated in advance by the identification information generation unit.
  • The communication control unit 243 transmits the input data D1 generated by the sound collection device 21 and the identification information I generated by the identification information generation unit to the response server 30 through the communication device 25.
  • the communication control unit 326 of the response server 30 receives the input data D1 and the identification information I transmitted from the sound emitting system 20 by the communication device 33.
  • the voice recognition unit 321 of the response server 30 that has received the input data D1 identifies an utterance character string from the input data D1 as in the first embodiment.
  • the response generation unit 322 generates a response character string for the utterance character string, as in the first embodiment.
  • the related information generation unit 323 generates related information R related to the response represented by the response character string, as in the first embodiment.
  • the signal generator 325 generates acoustic data D2 representing the response voice V2 and the acoustic component of the identification information I transmitted from the sound emission system 20.
  • the acoustic data D2 generated by the signal generation unit 325 is transmitted to the sound emission system 20 under the control of the communication control unit 326, as in the first embodiment.
  • Further, the provided data D3, including the related information R generated by the related information generation unit 323 and the identification information I transmitted from the sound emission system 20, is transmitted to the information providing server 40 under the control of the communication control unit 326.
  • the information providing server 40 that has received the provided data D3 stores the provided data D3 in the storage device 41 as in the first embodiment. That is, the identification information I generated by the sound emission system 20 is registered in the storage device 41 in a state corresponding to the related information R generated by the response server 30.
  • The sound emission system 20 that has received the acoustic data D2 emits, according to the acoustic data D2, the response voice V2 and the acoustic component representing the identification information I of the related information R corresponding to the response voice V2, as in the first embodiment.
  • the terminal device 50 acquires the related information R from the information providing server 40 as in the first embodiment.
  • the same effect as in the first embodiment is realized.
  • Further, in the second embodiment, since the identification information I is transmitted from the sound emission system 20 to the response server 30, the response server 30 can manage the correspondence between the response voice V2 and the identification information I without generating the identification information I itself. Therefore, the processing load on the response server 30 can be reduced.
  • In each of the above embodiments, the uttered voice V1 is exemplified as the input by the user U, but the input by the user U is not limited to the uttered voice V1.
  • For example, a character string designated by the user U operating an operation device may be used as the input by the user U.
  • The operation device includes, for example, a plurality of operating elements operated by the user U (for example, operating elements corresponding to Japanese kana characters, alphabetic characters, or numerals).
  • The user U inputs to the operation device a character string including, for example, an inquiry (question) or conversation (hereinafter referred to as the "input character string").
  • the input character string may be one or more keywords (for example, words) related to a matter that the user U desires to search.
  • The operation device accepts the input character string. Specifically, input data D1 representing the input character string is generated. That is, the operation device functions as a reception unit that receives the input character string that the user U has given to the operation device.
  • Upon receiving the input data D1, the response server 30 generates the response character string and the related information R according to the input data D1. That is, the voice recognition unit 321 is omitted in this configuration.
  • the user U may select a desired option from among a plurality of options each representing a question and a conversation prepared in advance using the operation device.
  • Input data D1 representing the question or conversation set for the option selected by the user U is generated. That is, the operation device functions as a reception unit that receives the selection of an option by the user U.
  • The selection of an option corresponds to an input by the user U.
  • As understood from the above description, the input from the user U is comprehensively expressed as information given to the reception unit in accordance with the intention of the user U; examples include the uttered voice V1, the input character string, and the selection of an option.
  • a device used as a reception unit that receives input from the user U is also changed as appropriate.
  • In each of the above embodiments, the related information R corresponding to the response word of the response character string is generated. However, the related information R may be any information related to the response to the input from the user U, and its specific content is arbitrary. For example, the related information R may be generated in consideration of the entire content of the response character string.
  • the related information generation unit 323 generates the related information R indicating the location of the restaurant ABC, for example, for an utterance character string “Where is the restaurant ABC?”.
  • the response character string itself or a character string obtained by translating the response character string into another language may be used as the related information R. Note that it is not essential to use the related information table for generating the related information R.
  • the generation of the related information R includes both specifying any one of the plurality of related information R registered in the related information table and newly generating the related information R in response to an input from the user U.
  • the method for generating the related information R is appropriately changed according to the content and type of the related information R.
  • the response character string is generated by the response generation unit 322 as a response to the speech voice V1, but the response generated by the response generation unit 322 is not limited to the response character string.
  • the storage device 23 can store the response voice V2 in advance.
  • the response generation unit 322 specifies the response voice V2 corresponding to the input data D1 from the storage device 23 as a response to the utterance voice V1.
  • the response generation unit 322 may generate a character string obtained by translating the utterance character string generated by the voice recognition unit 321 into another language as a response to the utterance voice V1.
  • a response voice V2 obtained by translating the speech voice V1 into another language is emitted from the sound emission system 20.
  • For example, an automatic translator that translates the uttered voice V1 of the user U into another language may be used as the sound emission system 20.
  • In the configuration in which the automatic translator is the sound emission system 20, a character string obtained by translating the utterance character string into another language is suitable as the related information R.
  • the function of the response server 30 may be installed in an automatic translator.
  • In each of the above embodiments, the sound emission system 20 presents the response to the uttered voice V1 to the user U by emitting the response voice V2. However, together with the emission of the response voice V2, the response character string and the related information R may be displayed, for example, on a display device (for example, a liquid crystal display) of the sound emission system 20. A configuration that omits the emission of the response voice V2 may also be employed.
  • In a configuration that omits the emission of the response voice V2, acoustic data D2 representing only the acoustic component of the identification information I, instead of acoustic data D2 representing both the response voice V2 and that acoustic component, is transmitted to the sound emission system 20, and the identification information I is transmitted from the sound emission system 20 to the terminal device 50. Note that the display of the response character string may also be omitted.
  • In each of the above embodiments, the acoustic data D2 representing the mixed sound of the response voice V2 and the acoustic component of the identification information I is generated by the response server 30. However, the response server 30 may generate acoustic data D2 containing the response voice V2 and the acoustic component of the identification information I as individual sounds and transmit that acoustic data D2 to the sound emission system 20.
  • The sound emission system 20 emits sound according to the acoustic data D2; it may emit the mixed sound of the response voice V2 and the acoustic component of the identification information I, or it may emit the response voice V2 and the acoustic component of the identification information I individually.
  • the relationship between the response voice V2 and the sound component of the identification information I is arbitrary.
  • the response voice V2 and the acoustic component of the identification information I may be emitted in parallel, or the response voice V2 and the acoustic component of the identification information I may be emitted in different periods on the time axis.
  • As understood from the above description, the sound emission control unit 245 is comprehensively expressed as an element that causes the sound emission device 22 to emit the response voice V2 representing the response to the input received by the reception unit and the acoustic component representing the identification information I of the related information R related to that response.
  • In each of the above embodiments, the response server 30 generates the acoustic data D2, but the acoustic data D2 may instead be generated by the sound emission system 20.
  • For example, the response server 30 transmits the response character string and the identification information I to the sound emission system 20, and the sound emission system 20 generates the acoustic data D2 from the response character string and the identification information I transmitted from the response server 30 and emits sound according to that acoustic data D2. That is, the signal generation unit 325 may be omitted from the response server 30.
  • In each of the above embodiments, the identification information generation unit 324 generates the identification information I each time the related information R is generated. However, identification information I may be registered in advance for each related information R registered in the related information table. In that case, the identification information generation unit 324 specifies the identification information I corresponding to the related information R from the related information table.
  • each of the plurality of related information R may be registered in advance in the information provision table in association with the identification information I of the related information R. In the above configuration, transmission of the provision data D3 from the response server 30 to the information provision server 40 is omitted.
  • In each of the above embodiments, the sound emission system 20 transmits the acoustic signal representing the uttered voice V1 to the response server 30 as the input data D1, but the utterance character string of the uttered voice V1 may instead be transmitted to the response server 30 as the input data D1. In that configuration, the voice recognition unit 321 can be omitted from the response server 30.
  • the information providing system 100 is configured by the response server 30, the information providing server 40, and the sound emitting system 20, but the configuration of the information providing system 100 is not limited to the above examples.
  • the information providing system 100 may be configured with a single device.
  • For example, the response server 30 and the sound emission system 20 may be realized by a single device, or the response server 30 and the information providing server 40 may be realized by a single device.
  • In each of the above embodiments, a voice interaction device is used as the sound emission system 20, but the sound emission system 20 is not limited to this example; for example, information related to an item purchased by the user U may be used as the related information R.
  • In each of the above embodiments, the identification information I is transmitted to the terminal device 50 by acoustic communication using the sound emission device 22. However, the sound emission system 20 may transmit the identification information I to the terminal device 50 by short-range wireless communication such as Bluetooth (registered trademark) or Wi-Fi (registered trademark). That is, the identification information I may be transmitted to the terminal device 50 by a communication device different from the sound emission device 22 that emits the response voice V2.
  • The functions of the sound emission system 20, the information processing system (the response server 30 and the information providing server 40), and the terminal device 50 according to each of the above embodiments are realized, as illustrated in each embodiment, by the cooperation of a control device and a program.
  • the programs according to the above-described embodiments can be provided in a form stored in a computer-readable recording medium and installed in the computer.
  • the recording medium is, for example, a non-transitory recording medium; an optical recording medium (optical disc) such as a CD-ROM is a good example, but any known recording medium such as a semiconductor recording medium or a magnetic recording medium is also included.
  • the non-transitory recording medium includes any recording medium except a transitory propagating signal, and does not exclude a volatile recording medium.
  • the program may be provided to the computer in the form of distribution via a communication network.
  • the information providing method receives an input from a user, and causes the sound emitting device to emit an acoustic component representing identification information of related information related to a response to the received input.
  • since the identification information is transmitted to the terminal device by acoustic communication using the sound emitting device, the terminal device can use that identification information to acquire the related information (for example, more detailed information about the response) concerning the response represented by the response voice. This reduces the burden on the user of performing complicated operations on the terminal device to acquire related information concerning the response voice.
  • in one aspect, in the operation of causing the sound emitting device to emit sound, the sound emitting device emits both a response voice representing the response and the acoustic component.
  • according to this aspect, the sound emitting device that emits the response voice can also be used as a communication device that transmits the identification information of the related information corresponding to the response voice.
  • in one aspect, input data representing the accepted input is transmitted to a response server; acoustic data representing the response voice, which represents a response to the input represented by the input data, and the acoustic component, which represents the identification information of the related information concerning that response, is received from the response server; and the sound emitting device is caused to emit sound according to the received acoustic data.
  • in one aspect, the identification information is generated, and the input data and the generated identification information are transmitted to the response server.
  • according to this aspect, the response server can manage the correspondence between the response voice and the identification information without itself generating the identification information.
  • in one aspect, the input from the user is a voice uttered by the user.
  • according to this aspect, since the related information can be acquired simply by the user speaking, the user can easily acquire the related information without, for example, input using operation controls.
  • An information processing method according to one aspect generates a response to an input by a user, generates related information concerning the generated response, transmits acoustic data representing an acoustic component that represents identification information of the related information to a sound emission system that emits sound according to the acoustic data, and, in response to an information request from a terminal device that has received the identification information by acoustic communication from the sound emission system, transmits the related information corresponding to that identification information to the terminal device.
  • since the identification information is transmitted to the terminal device by acoustic communication using the sound emitting device, the terminal device can use that identification information to acquire the related information (for example, more detailed information about the response) concerning the response represented by the response voice. This reduces the burden on the user of performing complicated operations on the terminal device to acquire related information concerning the response voice.
  • in one aspect, in the generation of the related information, related information corresponding to a word included in the response is generated.
  • according to this aspect, the related information can be specified more easily than in a configuration in which related information corresponding to the entire response is specified.
  • a sound emission system according to one aspect includes a reception unit that receives an input from a user, a sound emission device that emits sound, and a sound emission control unit that causes the sound emission device to emit an acoustic component representing identification information of related information concerning a response to the input received by the reception unit.
  • in one aspect, the sound emission control unit causes the sound emission device to emit both a response voice representing the response and the acoustic component.
  • according to this aspect, the sound emission device that emits the response voice can also be used as a communication device that transmits the identification information of the related information corresponding to the response voice.
  • in one aspect, the sound emission system further includes a transmission unit that transmits input data representing the input accepted by the reception unit to a response server, and a receiving unit that receives from the response server acoustic data representing the response voice, which represents a response to the input represented by the input data, and the acoustic component, which represents the identification information of the related information concerning that response; the sound emission control unit causes the sound emission device to emit sound according to the acoustic data received by the receiving unit.
  • in one aspect, the sound emission system further includes an identification information generation unit that generates the identification information, and the transmission unit transmits the input data and the identification information generated by the identification information generation unit to the response server.
  • according to this aspect, the response server can manage the correspondence between the response voice and the identification information without itself generating the identification information.
  • in one aspect, the input from the user is a voice uttered by the user.
  • according to this aspect, since the related information can be acquired simply by the user speaking, the user can easily acquire the related information without, for example, input using operation controls.
  • An information processing system according to one aspect includes a response generation unit that generates a response to an input by a user; a related information generation unit that generates related information concerning the response generated by the response generation unit; a first communication control unit that transmits acoustic data representing an acoustic component representing identification information of the related information generated by the related information generation unit to a sound emission system that emits sound according to the acoustic data; and a second communication control unit that, in response to an information request from a terminal device that has received the identification information by acoustic communication from the sound emission system, transmits the related information corresponding to that identification information to the terminal device.
  • since the identification information is transmitted to the terminal device by acoustic communication using the sound emitting device, the terminal device can use that identification information to acquire the related information (for example, more detailed information about the response) concerning the response represented by the response voice. This reduces the burden on the user of performing complicated operations on the terminal device to acquire related information concerning the response voice.
  • in one aspect, the related information generation unit generates related information corresponding to a word included in the response generated by the response generation unit.
  • according to this aspect, the related information can be specified more easily than in a configuration in which related information corresponding to the entire response is specified.
  • DESCRIPTION OF REFERENCE SIGNS: 100 ... information providing system, 20 ... sound emission system, 21 ... sound collection device, 22 ... sound emission device, 23 ... storage device

Abstract

This sound emission system is provided with: a sound acquisition device for receiving an input from a user; a sound emission device for emitting sound; and a sound emission control unit for causing the sound emission device to emit a sound component that represents identification information of information related to a response to the input received by the sound acquisition device.

Description

Sound emission system, information processing system, information providing method, and information processing method
The present disclosure relates to a technique for providing information to a terminal device.
Services that provide information to users by voice are in widespread use. For example, Patent Document 1 discloses a service robot that assists operation of a vending machine by interacting with a user of the vending machine.
Patent Document 1: Japanese Patent Laid-Open No. 2007-11880
However, with the technique of Patent Document 1, the service robot merely utters voice for dialogue with the user. A user who desires more detailed information about the content of the voice uttered by the service robot must acquire information about the voice he or she has heard by, for example, operating a terminal device to use a search site. Against this background, an object of the present disclosure is to enable a user to acquire information related to voice without complicated operations.
In order to solve the above problems, an information providing method according to a preferred aspect of the present disclosure receives an input from a user and causes a sound emitting device to emit an acoustic component representing identification information of related information concerning a response to the received input.
An information processing method according to an example of the present disclosure generates a response to an input by a user, generates related information concerning the generated response, transmits acoustic data representing an acoustic component that represents identification information corresponding to the related information to a sound emission system that emits sound according to the acoustic data, and, in response to an information request from a terminal device that has received the identification information by acoustic communication from the sound emission system, transmits the related information corresponding to that identification information to the terminal device.
A sound emission system according to an example of the present disclosure includes a reception unit that receives an input from a user, a sound emission device that emits sound, and a sound emission control unit that causes the sound emission device to emit an acoustic component representing identification information of related information concerning a response to the input received by the reception unit.
An information processing system according to an example of the present disclosure includes a response generation unit that generates a response to an input by a user; a related information generation unit that generates related information concerning the response generated by the response generation unit; a first communication control unit that transmits acoustic data representing an acoustic component representing identification information of the related information generated by the related information generation unit to a sound emission system that emits sound according to the acoustic data; and a second communication control unit that, in response to an information request from a terminal device that has received the identification information by acoustic communication from the sound emission system, transmits the related information corresponding to that identification information to the terminal device.
FIG. 1 is a block diagram illustrating the configuration of an information providing system according to a first embodiment.
FIG. 2 is a block diagram illustrating the configuration of a sound emission system.
FIG. 3 is a block diagram illustrating the configuration of a response server.
FIG. 4 is a schematic diagram of a related information table.
FIG. 5 is a block diagram illustrating the configuration of a signal generation unit.
FIG. 6 is a block diagram illustrating the configuration of an information providing server.
FIG. 7 is a block diagram illustrating the configuration of a terminal device.
FIG. 8 is a flowchart illustrating the overall processing of the information providing system.
<First Embodiment>
FIG. 1 is a block diagram illustrating the configuration of an information providing system 100 according to the first embodiment of the present disclosure. As illustrated in FIG. 1, the information providing system 100 of the first embodiment includes a sound emission system 20, a response server 30, and an information providing server 40. The information providing system 100 is a computer system for providing various kinds of information to a user U of a terminal device 50. Specifically, a response to a voice V1 uttered by the user U of the terminal device 50 (hereinafter referred to as an "utterance voice"), and information R related to that response (hereinafter referred to as "related information"), are provided to the user U. The response server 30 communicates with the sound emission system 20 and the information providing server 40 via a communication network including, for example, the Internet. The response server 30 generates a response to the utterance voice V1 of the user U and related information R concerning that response. A voice V2 representing the response generated by the response server 30 (hereinafter referred to as a "response voice") is reproduced by the sound emission system 20. The related information R generated by the response server 30 is transmitted to the terminal device 50 by the information providing server 40. Details of the information providing system 100 are described below.
<Sound emission system 20>
FIG. 2 is a block diagram illustrating the configuration of the sound emission system 20. The sound emission system 20 is a computer system that reproduces the response voice V2 to the utterance voice V1 of the user U of the terminal device 50. A voice interaction device (a so-called AI speaker) that interacts with the user U is used as the sound emission system 20. For example, a portable information processing device such as a mobile phone or a smartphone, or an information processing device such as a personal computer, may be used as the sound emission system 20. The sound emission system 20 may also be realized in the form of a toy imitating the appearance of an animal or the like (for example, a doll such as a stuffed animal) or a robot. The sound emission system 20 is installed, for example, in transportation facilities such as stations and bus stops, means of transportation such as trains and buses, commercial facilities such as shops and restaurants, lodging facilities such as inns and hotels, exhibition facilities such as museums and art galleries, tourist facilities such as historic sites and scenic spots, and athletic facilities such as stadiums and gymnasiums.
The utterance voice V1 is, for example, the voice of an utterance including a question or a remark. The response voice V2 is, on the other hand, the voice of a response including an answer to the question or a reply to the remark. Note that a voice in which the user U utters one or more keywords (for example, words) related to a matter the user U wishes to search for may be treated as the utterance voice V1, and a voice presenting matters related to those keywords as a search result may be treated as the response voice V2. For example, when the user U utters the utterance voice V1 "Is there a restaurant nearby?" asking for the location of a restaurant in a commercial facility, the response voice V2 "Restaurant ABC is nearby." answering that utterance voice V1 is reproduced from the sound emission system 20. As illustrated in FIG. 2, the sound emission system 20 of the first embodiment includes a sound collection device 21 (an example of a reception unit), a sound emission device 22, a storage device 23, a control device 24, and a communication device 25.
The sound collection device 21 is an input device that collects ambient sound. The sound collection device 21 of the first embodiment generates data D1 (hereinafter referred to as "input data") representing the utterance voice V1 uttered by the user U. That is, the sound collection device 21 functions as a reception unit that receives the utterance voice V1 (an example of an input by the user U). Specifically, the sound collection device 21 includes a microphone that collects the utterance voice V1 uttered by the user U and generates a signal representing the waveform of that utterance voice V1, and an A/D converter that generates the input data D1 by converting that signal from analog to digital.
The control device 24 (an example of a computer) is configured by a processing circuit such as a CPU (Central Processing Unit) and comprehensively controls each element of the sound emission system 20. The storage device 23 stores a program executed by the control device 24 and various data used by the control device 24. A known recording medium such as a semiconductor recording medium or a magnetic recording medium, or a combination of plural types of recording media, may be adopted as the storage device 23.
As illustrated in FIG. 2, the control device 24 implements a plurality of functions (a communication control unit 243 and a sound emission control unit 245) by executing the program stored in the storage device 23. Note that some functions of the control device 24 may be implemented by dedicated electronic circuits, and the functions of the control device 24 may be distributed over a plurality of devices.
The communication control unit 243 receives and transmits various information via the communication device 25; that is, it controls the communication device 25 to receive and transmit various information. First, the communication control unit 243 transmits the input data D1 generated by the sound collection device 21 to the response server 30 via the communication device 25. Upon receiving the input data D1, the response server 30 generates data D2 (hereinafter referred to as "acoustic data") for causing the sound emission system 20 to emit the response voice V2 corresponding to the utterance voice V1 represented by the input data D1. Second, the communication control unit 243 receives the acoustic data D2 generated by the response server 30 from the response server 30 via the communication device 25. The sound emission control unit 245 causes the sound emission device 22 to emit sound according to the acoustic data D2 transmitted from the response server 30.
The communication device 25 is communication equipment that communicates with the response server 30 via the communication network under control of the communication control unit 243. Specifically, the communication device 25 includes a transmission unit 251 and a reception unit 253. The transmission unit 251 transmits the input data D1 representing the utterance voice V1 collected by the sound collection device 21 to the response server 30. The reception unit 253 receives the acoustic data D2 generated by the response server 30. The sound emission device 22 is an output device that emits various sounds. Specifically, under control of the sound emission control unit 245, the sound emission device 22 emits sound according to the acoustic data D2 received by the communication device 25. That is, the response voice V2 represented by the acoustic data D2 is emitted by the sound emission device 22. The user U who uttered the utterance voice V1 can therefore listen to the response voice V2 to that utterance voice V1.
<Response server 30>
FIG. 3 is a block diagram illustrating the configuration of the response server 30. The response server 30 is a computer system that generates a response to the utterance voice V1 of the user U and related information R concerning that response. Specifically, the response server 30 includes a storage device 31, a control device 32, and a communication device 33.
The storage device 31 stores a program executed by the control device 32 and various data used by the control device 32. A known recording medium such as a semiconductor recording medium or a magnetic recording medium, or a combination of plural types of recording media, may be adopted as the storage device 31. Specifically, the storage device 31 stores a related information table. The related information table is a data table used for specifying the related information R of the response to the utterance voice V1. Details of the related information table are described later.
The control device 32 (an example of a computer) is configured by a processing circuit such as a CPU and comprehensively controls each element of the response server 30. As illustrated in FIG. 3, the control device 32 implements a plurality of functions (a voice recognition unit 321, a response generation unit 322, a related information generation unit 323, an identification information generation unit 324, a signal generation unit 325, and a communication control unit 326) by executing the program stored in the storage device 31. Note that some functions of the control device 32 may be implemented by dedicated electronic circuits, and the functions of the control device 32 may be distributed over a plurality of devices.
The voice recognition unit 321 specifies a character string representing the utterance content of the utterance voice V1 (hereinafter referred to as an "utterance character string") by voice recognition on the input data D1 transmitted from the sound emission system 20. For example, when the user U utters the utterance voice V1 asking for the location of a restaurant, the utterance character string "Is there a restaurant nearby?" is specified. For the voice recognition on the input data D1, a known technique may be adopted, such as recognition processing using an acoustic model such as an HMM (Hidden Markov Model) and a language model representing linguistic constraints.
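As an illustrative aid (not part of the publication), the following minimal Python sketch shows this recognition step using the third-party SpeechRecognition package; it substitutes a cloud recognizer for the HMM-based recognition processing named above, and the file name is a placeholder.

```python
import speech_recognition as sr  # pip install SpeechRecognition

def specify_utterance_string(wav_path: str) -> str:
    """Rough stand-in for voice recognition unit 321: turn input data D1
    (here, a WAV recording of utterance voice V1) into an utterance character string."""
    recognizer = sr.Recognizer()
    with sr.AudioFile(wav_path) as source:
        audio = recognizer.record(source)  # read the entire file
    # Any recognizer backend works here; Google's free web endpoint is one option.
    return recognizer.recognize_google(audio, language="ja-JP")

# specify_utterance_string("utterance_v1.wav") -> e.g. the utterance character string above
```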
The response generation unit 322 generates a response to the utterance voice V1. Specifically, the response generation unit 322 generates a character string representing a response to the utterance character string specified by the voice recognition unit 321 (hereinafter referred to as a "response character string"). For example, when the utterance character string "Is there a restaurant nearby?" is specified, the response character string "Restaurant ABC is nearby." indicating the location of the restaurant ABC is generated. For generating the response character string, known techniques may be adopted, such as natural language processing including morphological analysis of the utterance character string, and dialogue techniques using artificial intelligence.
The related information generation unit 323 generates related information R concerning the response generated by the response generation unit 322. The related information R is, for example, content for supplementing the content of the response. One example of the related information R is content for supplementing the content of a specific word included in the response character string (hereinafter referred to as a "response word"). The response word is a characteristic word, such as a proper noun, among the words included in the response character string; the response word included in the response character string "Restaurant ABC is nearby." is "Restaurant ABC". Various kinds of content are examples of the related information R, such as information explaining the matter represented by the response word (for example, the URL of a web page) and information indicating the location of the matter represented by the response word (for example, a map image, the URL of a map, or a character string indicating the location). For example, when the matter represented by the response word is a restaurant, content presenting the menu or congestion information of that restaurant may be used as the related information R. The related information R is not limited to these examples and may be changed arbitrarily according to the content and type of the response word. For extracting the response word, known natural language processing such as morphological analysis may be adopted.
The related information table is used for generating the related information R. FIG. 4 is a schematic diagram of the related information table. The related information table is a table in which a plurality of pieces of related information R are registered. Specifically, for each of a plurality of response words, the related information R corresponding to that response word is registered.
The related information generation unit 323 extracts a response word from the response character string generated by the response generation unit 322, and specifies, among the plurality of pieces of related information R registered in the related information table, the related information R corresponding to that response word. As understood from the above description, the related information R corresponding to the response word of the response character string generated by the response generation unit 322 is generated. Note that a plurality of pieces of related information R may be generated for one response word.
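As an illustration only, the following sketch models the related information table of FIG. 4 as an in-memory dictionary and uses plain substring matching in place of morphological analysis; the table entries are placeholders, not data from the publication.

```python
# Hypothetical related information table (FIG. 4): response word -> related information R.
RELATED_INFO_TABLE = {
    "Restaurant ABC": {
        "homepage": "https://example.com/restaurant-abc",   # explanatory URL (placeholder)
        "map": "https://example.com/map?q=restaurant-abc",  # location info (placeholder)
    },
}

def generate_related_info(response_string: str) -> dict | None:
    """Stand-in for related information generation unit 323: extract a registered
    response word from the response character string and return its related information R."""
    for response_word, related_info in RELATED_INFO_TABLE.items():
        if response_word in response_string:  # crude substitute for morphological analysis
            return related_info
    return None

related_info = generate_related_info("Restaurant ABC is nearby.")  # -> the entry above
```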
The identification information generation unit 324 of FIG. 3 generates identification information I for identifying the related information R generated by the related information generation unit 323. Different identification information I is generated for each of the plurality of pieces of related information R registered in the related information table. Note that identification information I generated in advance for each piece of related information R may be registered beforehand in the related information table in association with that related information R.
The signal generation unit 325 generates acoustic data D2 representing the response voice V2, which represents the response generated by the response generation unit 322, and the acoustic component of the identification information I corresponding to the related information R generated by the related information generation unit 323. Specifically, acoustic data D2 representing a mixed sound of the response voice V2 and the acoustic component of the identification information I is generated. FIG. 5 is a block diagram of the signal generation unit 325. As illustrated in FIG. 5, the signal generation unit 325 includes a voice synthesis unit 71, a modulation processing unit 73, and an addition unit 74. The voice synthesis unit 71 generates a voice signal by voice synthesis from the response character string generated by the response generation unit 322. A known voice synthesis technique may be adopted for generating the voice signal.
The modulation processing unit 73 generates a modulation signal representing the acoustic component of the identification information I generated by the identification information generation unit 324. The modulation signal is generated, for example, by frequency-modulating a carrier wave of a predetermined frequency with the identification information I. The modulation signal may also be generated by sequentially performing spread modulation of the information using a spread code and frequency conversion using a carrier wave of a predetermined frequency. The frequency band of the modulation signal is a band in which emission by the sound emission device 22 and collection by the terminal device 50 are possible, and is set above the frequency band of the sound that the user U of the terminal device 50 hears in a normal environment (for example, 18 kHz or higher and 20 kHz or lower). The user U therefore hardly hears the acoustic component of the identification information I. The frequency band of the modulation signal is, however, arbitrary, and a modulation signal within the audible band may also be generated.
The addition unit 74 generates the acoustic data D2 by adding the voice signal generated by the voice synthesis unit 71 and the modulation signal generated by the modulation processing unit 73.
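The following NumPy sketch, offered only as an illustration, models this signal path as a binary FSK variant of the frequency modulation described above (a carrier near 18.5 kHz shifted by ±200 Hz per bit); the sampling rate, baud rate, and mixing level are assumptions rather than values from the publication.

```python
import numpy as np

FS = 48_000  # sampling rate in Hz (assumed)

def modulate_identification_info(bits, carrier_hz=18_500, baud=100):
    """Sketch of modulation processing unit 73: encode each bit of the
    identification information I as a short tone near an inaudible carrier."""
    samples_per_bit = FS // baud
    t = np.arange(samples_per_bit) / FS
    tones = [np.sin(2 * np.pi * (carrier_hz + (200 if b else -200)) * t)
             for b in bits]
    return 0.05 * np.concatenate(tones)  # keep the component quiet

def generate_acoustic_data(voice_signal, bits):
    """Sketch of addition unit 74: mix the synthesized response voice V2
    with the modulation signal to obtain the acoustic data D2."""
    modulation_signal = modulate_identification_info(bits)
    n = max(len(voice_signal), len(modulation_signal))
    mixed = np.zeros(n)
    mixed[:len(voice_signal)] += voice_signal
    mixed[:len(modulation_signal)] += modulation_signal
    return mixed
```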
The communication control unit 326 (an example of a first communication control unit) in FIG. 3 receives and transmits various information via the communication device 33; that is, it controls the communication device 33 to receive and transmit various information. First, the communication control unit 326 receives, via the communication device 33, the input data D1 transmitted from the sound emission system 20. Second, the communication control unit 326 transmits the acoustic data D2 generated by the signal generation unit 325 to the sound emission system 20 via the communication device 33. Third, the communication control unit 326 transmits, via the communication device 33 to the information providing server 40, data D3 (hereinafter referred to as "provision data") containing the related information R generated by the related information generation unit 323 and the identification information I generated by the identification information generation unit 324 for that related information R.
The communication device 33 communicates with each of the sound emission system 20 and the information providing server 40 via the communication network under control of the communication control unit 326. Specifically, the communication device 33 includes a transmission unit 331 and a reception unit 333. The reception unit 333 receives the input data D1 transmitted from the sound emission system 20. The transmission unit 331 transmits the acoustic data D2 generated by the signal generation unit 325 to the sound emission system 20, and transmits the provision data D3 to the information providing server 40.
Upon receiving the acoustic data D2, the sound emission control unit 245 of the sound emission system 20 causes the sound emission device 22 to emit sound according to that acoustic data D2. Specifically, by supplying the acoustic data D2 to the sound emission device 22, the mixed sound represented by the acoustic data D2 is emitted from the sound emission device 22. That is, the response voice V2 to the utterance voice V1 of the user U, and the acoustic component of the identification information I of the related information R concerning the response represented by that response voice V2, are emitted from the sound emission device 22.
As understood from the above description, the sound emission device 22 functions not only as acoustic equipment that reproduces the response voice V2, but also as a transmitter that transmits the identification information I to its surroundings by acoustic communication using sound waves (air vibrations) as a transmission medium. That is, the identification information I is transmitted to the surroundings by acoustic communication in which the sound of the identification information I is emitted from the sound emission device 22 that emits the response voice V2. The identification information I is transmitted each time the response voice V2 is emitted; for example, the identification information I is transmitted together with the emission of the response voice V2 (for example, in parallel with, or before or after, the emission of the response voice V2).
<Information providing server 40>
FIG. 6 is a block diagram of the information providing server 40. The information providing server 40 is a computer system for transmitting, to the terminal device 50, the related information R concerning the response to the utterance voice V1 of the user U. As illustrated in FIG. 6, the information providing server 40 includes a storage device 41, a control device 42, and a communication device 43.
The storage device 41 stores a program executed by the control device 42 and various data used by the control device 42. A known recording medium such as a semiconductor recording medium or a magnetic recording medium, or a combination of plural types of recording media, may be adopted as the storage device 41. Specifically, the storage device 41 stores an information provision table. The information provision table is a data table used for providing the terminal device 50 with the related information R of the response to the utterance voice V1. Specifically, the identification information I and the related information R contained in the provision data D3 transmitted from the response server 30 are registered in the information provision table in association with each other. Note that provision data D3 is generated for each utterance voice V1 from the user U.
The control device 42 (an example of a computer) is configured by a processing circuit such as a CPU and comprehensively controls each element of the information providing server 40. As illustrated in FIG. 6, the control device 42 implements a plurality of functions (a storage control unit 421, a related information specifying unit 423, and a communication control unit 425) by executing the program stored in the storage device 41. Note that some functions of the control device 42 may be implemented by dedicated electronic circuits, and the functions of the control device 42 may be distributed over a plurality of devices.
The storage control unit 421 stores the provision data D3 received by the communication device 43 in the storage device 41. Specifically, the storage control unit 421 registers the identification information I and the related information R contained in the provision data D3 in the information provision table in association with each other.
The related information specifying unit 423 specifies, in response to an information request from the terminal device 50 that has received the identification information I by acoustic communication from the sound emission system 20, the related information R corresponding to that identification information I. The information request from the terminal device 50 contains the identification information I. Specifically, the related information specifying unit 423 specifies, from the information provision table, the related information R corresponding to the identification information I contained in the information request from the terminal device 50, among the plurality of pieces of related information R registered in the table.
The communication control unit 425 (an example of a second communication control unit) receives and transmits various information via the communication device 43; that is, it controls the communication device 43 to receive and transmit various information. First, the communication control unit 425 receives, via the communication device 43, the provision data D3 transmitted from the response server 30. Second, in response to an information request from the terminal device 50 that has received the identification information I by acoustic communication from the sound emission system 20, the communication control unit 425 transmits the related information R corresponding to that identification information I (that is, the related information R specified by the related information specifying unit 423) to that terminal device 50 via the communication device 43.
The communication device 43 communicates with each of the response server 30 and the terminal device 50 via the communication network under control of the communication control unit 425. Specifically, the communication device 43 includes a transmission unit 431 and a reception unit 433. The reception unit 433 receives the provision data D3 transmitted from the response server 30. The transmission unit 431 transmits the related information R to the terminal device 50. Note that the response server 30 and the information providing server 40 function as an information processing system that generates a response to the utterance voice V1 of the user U and related information R concerning that response.
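As an illustration (names are invented, not from the publication), the registration and lookup performed by the information providing server 40 can be sketched with an in-memory dictionary standing in for the information provision table in the storage device 41:

```python
# Hypothetical information provision table: identification information I -> related information R.
provision_table: dict[str, dict] = {}

def register_provision_data(identification_info: str, related_info: dict) -> None:
    """Sketch of storage control unit 421: register the pair contained in
    provision data D3 received from the response server 30."""
    provision_table[identification_info] = related_info

def handle_information_request(identification_info: str) -> dict | None:
    """Sketch of related information specifying unit 423 plus communication control
    unit 425: return the related information R for the identification information I
    contained in a terminal device's information request (None for an unknown ID)."""
    return provision_table.get(identification_info)
```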
<Terminal device 50>
FIG. 7 is a block diagram of the terminal device 50. The terminal device 50 is located near the sound emission system 20. The terminal device 50 is a portable information terminal for acquiring, from the information providing server 40, the related information R related to the response to the utterance voice V1 uttered by the user U. For example, a mobile phone, a smartphone, a tablet terminal, or a personal computer may be used as the terminal device 50.
As illustrated in FIG. 7, the terminal device 50 includes a sound collection device 51, a control device 52, a storage device 53, a communication device 54, and a reproduction device 55. The sound collection device 51 is acoustic equipment (a microphone) that collects surrounding sound. Specifically, the sound collection device 51 collects the sound emitted by the sound emission system 20 according to the acoustic data D2, and generates an acoustic signal Y representing the waveform of that sound. The acoustic signal Y generated by sound collection near the sound emission system 20 can therefore contain the acoustic component of the identification information I.
As understood from the above description, the sound collection device 51 is used for voice calls between terminal devices 50 and for sound recording during video shooting, and also functions as a receiver that receives the identification information I by acoustic communication using sound waves (air vibrations) as a transmission medium. An A/D converter that converts the acoustic signal Y generated by the sound collection device 51 from analog to digital is omitted from the figure for convenience. Instead of the sound collection device 51 configured integrally with the terminal device 50, a separate sound collection device 51 may be connected to the terminal device 50 by wire or wirelessly.
The control device 52 (an example of a computer) is configured by a processing circuit such as a CPU and comprehensively controls each element of the terminal device 50. The storage device 53 stores a program executed by the control device 52 and various data used by the control device 52. A known recording medium such as a semiconductor recording medium or a magnetic recording medium, or a combination of plural types of recording media, may be adopted as the storage device 53.
As illustrated in FIG. 7, the control device 52 implements a plurality of functions (an information extraction unit 521 and a reproduction control unit 523) by executing the program stored in the storage device 53. Note that some functions of the control device 52 may be implemented by dedicated electronic circuits, and the functions of the control device 52 may be distributed over a plurality of devices.
The information extraction unit 521 extracts the identification information I from the acoustic signal Y generated by the sound collection device 51. Specifically, the information extraction unit 521 extracts the identification information I by, for example, filter processing that emphasizes the frequency band of the acoustic signal Y containing the acoustic component of the identification information I, and demodulation processing corresponding to the modulation processing applied to the identification information I. The identification information I extracted by the information extraction unit 521 is used to acquire the related information R corresponding to that identification information I (that is, the related information R concerning the response represented by the response voice V2 emitted by the sound emission device 22).
Note that, because the identification information I can be received only at positions within the range where the response voice V2 corresponding to that identification information I can be collected, the identification information I can also be regarded as information indicating the position of the terminal device 50. The related information R can therefore be provided only to terminal devices 50 located around the sound emission system 20.
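A minimal sketch of this extraction is shown below, matching the FSK-style sketch given for the signal generation unit 325: band-pass filtering around the assumed carrier (the filter processing), followed by a per-bit frequency decision (the demodulation processing); the sampling and baud rates must match the emitting side and are assumptions.

```python
import numpy as np
from scipy.signal import butter, sosfilt

FS = 48_000  # sampling rate in Hz; must match the emitting side (assumed)

def extract_identification_info(acoustic_signal, carrier_hz=18_500, baud=100):
    """Sketch of information extraction unit 521: emphasize the band carrying
    the identification information I, then decide each bit from the dominant frequency."""
    sos = butter(6, [carrier_hz - 500, carrier_hz + 500],
                 btype="bandpass", fs=FS, output="sos")
    band = sosfilt(sos, acoustic_signal)  # filter processing
    samples_per_bit = FS // baud
    bits = []
    for start in range(0, len(band) - samples_per_bit + 1, samples_per_bit):
        chunk = band[start:start + samples_per_bit]
        spectrum = np.abs(np.fft.rfft(chunk))
        peak_hz = np.argmax(spectrum) * FS / len(chunk)  # dominant frequency in the chunk
        bits.append(1 if peak_hz > carrier_hz else 0)    # demodulation decision
    return bits
```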
The communication device 54 communicates with the information providing server 40 via the communication network under control of the control device 52. Specifically, the communication device 54 transmits the identification information I extracted by the information extraction unit 521 to the information providing server 40. The information providing server 40 acquires the related information R corresponding to the identification information I transmitted from the terminal device 50 and transmits it to the terminal device 50. The communication device 54 receives the related information R transmitted from the information providing server 40.
The reproduction control unit 523 causes the reproduction device 55 to reproduce the related information R received by the communication device 54. The reproduction device 55 is output equipment that reproduces the related information R. Specifically, the reproduction device 55 includes a display device that displays an image represented by the related information R. Instead of the reproduction device 55 configured integrally with the terminal device 50, a separate reproduction device 55 may be connected to the terminal device 50 by wire or wirelessly. The reproduction device 55 may also include a sound emission device that emits sound represented by the related information R; that is, reproduction by the reproduction device 55 encompasses both image display and sound emission.
 図8は、情報提供システム100全体の処理のフローチャートである。利用者Uによる発話音声V1の発音を契機として図9の処理が開始される。放音システム20の収音装置21は、利用者Uからの発話音声V1を受付ける(Sa1)。具体的には、利用者Uが発話した発話音声V1を表す入力データD1が収音装置21により生成される。放音システム20の通信制御部243は、収音装置21が生成した入力データD1を、通信装置25により応答サーバ30に送信する(Sa2)。 FIG. 8 is a flowchart of processing of the information providing system 100 as a whole. The process of FIG. 9 is started when the user U pronounces the uttered voice V1. The sound collection device 21 of the sound emission system 20 receives the uttered voice V1 from the user U (Sa1). Specifically, input data D1 representing the uttered voice V1 uttered by the user U is generated by the sound collecting device 21. The communication control unit 243 of the sound emission system 20 transmits the input data D1 generated by the sound collection device 21 to the response server 30 through the communication device 25 (Sa2).
 応答サーバ30の通信制御部326は、放音システム20から送信された入力データD1を、通信装置33により受信する(Sa3)。音声認識部321は、通信装置33が受信した入力データD1に対する音声認識により発話文字列を特定する(Sa4)。応答生成部322は、発話音声V1に対する応答を生成する(Sa5)。具体的には、音声認識部321が特定した発話文字列に対応する応答文字列が生成される。関連情報生成部323は、応答生成部322が生成した応答に関する関連情報Rを生成する(Sa6)。識別情報生成部324は、関連情報生成部323が生成した関連情報Rを識別するための識別情報Iを生成する(Sa7)。信号生成部325は、音響データD2を生成する(Sa8)。具体的には、応答音声V2と識別情報Iの音響成分との混合音を表す音響データD2が生成される。通信制御部326は、提供データD3を、通信装置33により情報提供サーバ40に送信する(Sa9)。提供データD3は、関連情報生成部323が生成した関連情報Rと、識別情報生成部324が当該関連情報Rについて生成した識別情報Iとを含む。 The communication control unit 326 of the response server 30 receives the input data D1 transmitted from the sound emitting system 20 by the communication device 33 (Sa3). The voice recognition unit 321 specifies an utterance character string by voice recognition on the input data D1 received by the communication device 33 (Sa4). The response generation unit 322 generates a response to the uttered voice V1 (Sa5). Specifically, a response character string corresponding to the utterance character string specified by the voice recognition unit 321 is generated. The related information generating unit 323 generates related information R related to the response generated by the response generating unit 322 (Sa6). The identification information generation unit 324 generates identification information I for identifying the related information R generated by the related information generation unit 323 (Sa7). The signal generator 325 generates acoustic data D2 (Sa8). Specifically, acoustic data D2 representing a mixed sound of the response voice V2 and the acoustic component of the identification information I is generated. The communication control unit 326 transmits the provision data D3 to the information provision server 40 through the communication device 33 (Sa9). The provided data D3 includes related information R generated by the related information generating unit 323 and identification information I generated by the identification information generating unit 324 regarding the related information R.
 情報提供サーバ40の通信制御部425は、応答サーバ30から送信された提供データD3を、通信装置43により受信する(Sa10)。記憶制御部421は、通信装置43が受信した提供データD3を記憶装置41に記憶する(Sa11)。具体的には、記憶制御部421は、提供データD3に含まれる関連情報Rと識別情報Iとを対応させて記憶装置41に格納する。 The communication control unit 425 of the information providing server 40 receives the provision data D3 transmitted from the response server 30 by the communication device 43 (Sa10). The storage control unit 421 stores the provided data D3 received by the communication device 43 in the storage device 41 (Sa11). Specifically, the storage control unit 421 stores the related information R and the identification information I included in the provided data D3 in the storage device 41 in association with each other.
 応答サーバ30の通信制御部326は、信号生成部325が生成した音響データD2を、通信装置33により放音システム20に送信する(Sa12)。放音システム20の通信制御部243は、応答サーバ30から送信された音響データD2を、通信装置25により受信する(Sa13)。放音制御部245は、音響データD2に応じて放音装置22に放音させる(Sa14)。放音装置22は、応答音声V2と識別情報Iの音響成分との混合音の放音により、識別情報Iを端末装置50に送信する(Sa15)。すなわち、放音装置22を利用した音響通信により識別情報Iが端末装置50に送信される。 The communication control unit 326 of the response server 30 transmits the acoustic data D2 generated by the signal generation unit 325 to the sound emission system 20 through the communication device 33 (Sa12). The communication control unit 243 of the sound emitting system 20 receives the acoustic data D2 transmitted from the response server 30 by the communication device 25 (Sa13). The sound emission control unit 245 causes the sound emission device 22 to emit sound according to the acoustic data D2 (Sa14). The sound emitting device 22 transmits the identification information I to the terminal device 50 by emitting a mixed sound of the response voice V2 and the acoustic component of the identification information I (Sa15). That is, the identification information I is transmitted to the terminal device 50 by acoustic communication using the sound emitting device 22.
 The sound collection device 51 of the terminal device 50 picks up the sound that the sound emission system 20 emitted according to the acoustic data D2 (that is, sound containing the acoustic component of the identification information I) (Sa16), generating an acoustic signal that represents the waveform of the collected sound. The information extraction unit 521 extracts the identification information I from the acoustic signal generated by the sound collection device 51 (Sa17). The communication device 54 transmits the identification information I extracted by the information extraction unit 521 to the information providing server 40 (Sa18).
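 The embodiment leaves the concrete acoustic-communication scheme open (the classification headings mention audio watermarking, and related work commonly uses a near-ultrasonic band). Purely as an assumed example, the sketch below recovers a bit sequence from the collected acoustic signal on the premise of binary frequency-shift keying at 18 kHz / 19 kHz with 50 ms per bit; none of these parameters comes from the patent.

```python
import numpy as np

RATE = 44100          # sampling rate of the acoustic signal (assumed)
BIT_DUR = 0.05        # seconds per transmitted bit (assumed)
F0, F1 = 18000.0, 19000.0

def band_energy(frame: np.ndarray, freq: float) -> float:
    # Magnitude of a single-bin DFT: how much energy sits at `freq`.
    t = np.arange(len(frame)) / RATE
    return abs(np.sum(frame * np.exp(-2j * np.pi * freq * t)))

def extract_bits(signal: np.ndarray) -> str:
    # Slice the signal into bit periods and compare the two carrier bands.
    samples_per_bit = int(RATE * BIT_DUR)
    bits = []
    for start in range(0, len(signal) - samples_per_bit + 1, samples_per_bit):
        frame = signal[start:start + samples_per_bit]
        bits.append("1" if band_energy(frame, F1) > band_energy(frame, F0) else "0")
    return "".join(bits)
```

A practical information extraction unit 521 would additionally need frame synchronization and error detection before the recovered bit sequence could be interpreted as the identification information I.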
 The communication control unit 425 of the information providing server 40 receives, via the communication device 43, the identification information I transmitted from the terminal device 50 (Sa19). The related information specifying unit 423 specifies the related information R corresponding to the identification information I received by the communication device 43 (Sa20). The communication control unit 425 transmits the related information R specified by the related information specifying unit 423 to the terminal device 50 via the communication device 43 (Sa21).
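 Viewed together, Sa10-Sa11 and Sa19-Sa21 make the information providing server 40 a keyed store: the related information R is kept retrievable by its identification information I. A minimal sketch, with class and method names assumed for illustration and a plain dict standing in for the storage device 41:

```python
from typing import Dict, Optional

class InformationProvidingServer:
    # Illustrative stand-in for the storage control unit 421 and the
    # related information specifying unit 423.

    def __init__(self) -> None:
        self._table: Dict[str, str] = {}  # identification I -> related info R

    def store_provision_data(self, provision_d3: dict) -> None:
        # Sa11: store R and I in association with each other.
        self._table[provision_d3["identification"]] = provision_d3["related"]

    def lookup(self, identification_i: str) -> Optional[str]:
        # Sa20: specify the related information R corresponding to I.
        return self._table.get(identification_i)
```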
 The communication device 54 of the terminal device 50 receives the related information R transmitted from the information providing server 40 (Sa22). The reproduction control unit 523 causes the reproduction device 55 to reproduce the related information R received by the communication device 54 (Sa23). That is, the reproduction device 55 reproduces the related information R concerning the response represented by the response voice V2 emitted from the sound emitting device 22.
 As will be understood from the above description, in the first embodiment the identification information I is transmitted to the terminal device 50 by acoustic communication using the sound emitting device 22 that emits the response voice V2, so the terminal device 50 can use that identification information I to acquire the related information R concerning the response represented by the response voice V2 (for example, more detailed information about the response). This reduces the burden on the user U of performing complicated operations on the terminal device 50 to obtain the related information R about the response voice V2. Moreover, the sound emitting device 22 provided for emitting the response voice V2 doubles as the channel for transmitting the identification information I to the terminal device 50, so no transmitter dedicated to transmitting the identification information I is needed.
 In the first embodiment, the uttered voice V1 accepted by the sound emission system 20 is transmitted to the response server 30, and the acoustic data D2 of the response voice V2 representing the response generated by the response server 30 is received by the receiving unit 253; the sound emission system 20 therefore need not incorporate the elements for generating the response voice V2, which simplifies its configuration and operation. Also, in the first embodiment the related information R is generated for a response word contained in the response character string generated by the response generation unit 322, so the related information R can be specified more easily than in a configuration that specifies related information R for the response character string as a whole, reducing the processing load on the information providing server 40.
<Second Embodiment>
 A second embodiment of the present disclosure will now be described. In each of the examples below, elements whose functions are the same as in the first embodiment reuse the reference signs used in the description of the first embodiment, and their detailed description is omitted as appropriate.
 In the first embodiment, the identification information I of the related information R is generated by the response server 30. In the second embodiment, by contrast, it is generated by the sound emission system 20; accordingly, the identification information generation unit 324 is omitted from the response server 30 of the second embodiment.
 In addition to serving as the communication control unit 243 and the sound emission control unit 245, the control device 24 of the sound emission system 20 also functions as an identification information generation unit. When the sound collection device 21 accepts the uttered voice V1 spoken by the user U (that is, when the input data D1 is generated), the identification information generation unit generates identification information I corresponding to that input data D1. In other words, the identification information I corresponding to the related information R that the response server 30 will generate from the input data D1 is generated in advance by the identification information generation unit. The communication control unit 243 transmits the input data D1 generated by the sound collection device 21 and the identification information I generated by the identification information generation unit to the response server 30 via the communication device 25.
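 A minimal sketch of this client-side generation; the random-UUID format is only one conceivable choice of identification information I, not something the embodiment specifies:

```python
import uuid

def accept_uttered_voice(input_data_d1: bytes) -> dict:
    # The identification information I is minted before the response server
    # has generated any response or related information R for this input.
    identification_i = uuid.uuid4().hex
    # Input data D1 and I then travel together to the response server.
    return {"input": input_data_d1, "identification": identification_i}
```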
 The communication control unit 326 of the response server 30 receives, via the communication device 33, the input data D1 and the identification information I transmitted from the sound emission system 20. On receiving the input data D1, the voice recognition unit 321 of the response server 30 identifies an utterance character string from the input data D1, as in the first embodiment. The response generation unit 322 generates a response character string for the utterance character string, as in the first embodiment, and the related information generation unit 323 generates related information R concerning the response represented by the response character string, again as in the first embodiment. The signal generation unit 325 generates acoustic data D2 representing the response voice V2 and the acoustic component of the identification information I transmitted from the sound emission system 20. As in the first embodiment, the acoustic data D2 generated by the signal generation unit 325 is transmitted to the sound emission system 20 under the control of the communication control unit 326, and provision data D3 containing the related information R generated by the related information generation unit 323 and the identification information I transmitted from the sound emission system 20 is transmitted to the information providing server 40 under the control of the communication control unit 326.
 The information providing server 40 that receives the provision data D3 stores it in the storage device 41, as in the first embodiment; that is, the identification information I generated by the sound emission system 20 is registered in the storage device 41 in association with the related information R generated by the response server 30. The sound emission system 20 that receives the acoustic data D2 emits, according to that data and as in the first embodiment, the response voice V2 together with the acoustic component representing the identification information I of the related information R corresponding to that response voice V2. The terminal device 50 acquires the related information R from the information providing server 40, as in the first embodiment.
 The second embodiment achieves the same effects as the first. Furthermore, because the identification information I is transmitted from the sound emission system 20 to the response server 30, the correspondence between the response voice V2 and the identification information I can be managed at the response server 30 without the response server 30 generating the identification information I itself, reducing the processing load on the response server 30.
<Modification>
 Specific modifications that may be added to each of the aspects exemplified above are illustrated below. Two or more aspects arbitrarily selected from the following examples may be combined as appropriate, to the extent that they do not contradict one another.
 (1) In each of the above embodiments the uttered voice V1 was given as an example of the input from the user U, but the input from the user U is not limited to the uttered voice V1. For example, a character string designated by the user U may serve as the input. One conceivable configuration equips the sound emission system 20 with an operation device (not shown) that accepts instructions from the user U. The operation device comprises, for example, a plurality of operators operated by the user U (for example, operators corresponding respectively to Japanese kana characters, letters of the alphabet, or numerals). The user U enters into the operation device a character string (hereinafter an "input character string") containing, for example, a question or remark; the input character string may also be one or more keywords (for example, words) about a matter the user U wishes to look up. The operation device accepts the input character string; specifically, input data D1 representing the input character string is generated. The operation device thus functions as a reception unit that accepts the input character string the user U enters into it. The response server 30 that receives this input data D1 generates the response character string and the related information R according to it, and the voice recognition unit 321 is omitted.
 Alternatively, the user U may use the operation device to select a desired option from a plurality of options each representing a question or remark prepared in advance; input data D1 indicating the question or remark assigned to the option selected by the user U is then generated. In this case the operation device functions as a reception unit that accepts the user U's selection of an option, and that selection corresponds to the input from the user U. As these examples show, the input from the user U is information given to the reception unit in accordance with the user U's intent, exemplified by the uttered voice V1, an input character string, or an option, and the equipment used as the reception unit that accepts input from the user U also changes as appropriate with the type of input.
 (2) In each of the above embodiments the related information R corresponding to a response word in the response character string was generated, but the content of the related information R is arbitrary so long as it concerns the response to the input from the user U. For example, the related information R may be generated in consideration of the entire content of the response character string: for the utterance character string "Where is restaurant ABC?", the related information generation unit 323 could generate related information R indicating the location of restaurant ABC. The response character string itself, or a character string obtained by translating it into another language, may also serve as the related information R. Using a related information table to generate the related information R is not essential; "generating" the related information R covers both specifying one of a plurality of entries registered in a related information table and newly producing information according to the input from the user U, and the method of generating the related information R changes as appropriate with its content and type.
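 The two senses of "generating" described above (specifying a registered entry versus newly producing information) can be sketched as follows; `translate` is a hypothetical placeholder rather than an API from the embodiments:

```python
from typing import Dict

def translate(text: str, target_language: str) -> str:
    # Placeholder: a real system would call a machine-translation service.
    return text

def generate_related_info(response_text: str, table: Dict[str, str]) -> str:
    # Path 1: specify one of the entries registered in the related
    # information table whose keyword appears in the response.
    for keyword, info in table.items():
        if keyword in response_text:
            return info
    # Path 2: newly generate information, here a translation of the response.
    return translate(response_text, "en")
```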
 (3) In each of the above embodiments the response generation unit 322 generated a response character string as the response to the uttered voice V1, but the response it generates is not limited to a character string. For example, when the content of the response is fixed, the storage device 23 can store the response voice V2 in advance, and the response generation unit 322 specifies from the storage device 23 the response voice V2 corresponding to the input data D1 as the response to the uttered voice V1.
 The response generation unit 322 may also generate, as the response to the uttered voice V1, a character string obtained by translating into another language the utterance character string produced by the voice recognition unit 321; a response voice V2 that renders the uttered voice V1 in the other language is then emitted from the sound emission system 20. Under this configuration, an automatic translator that translates the user U's uttered voice V1 into another language serves as the sound emission system 20, and the character string obtained by translating the utterance character string into the other language is used as the related information R. The functions of the response server 30 may also be built into the automatic translator.
 (4) In each of the above embodiments the sound emission system 20 presented the response to the uttered voice V1 to the user U by emitting the response voice V2, but alongside that emission it may also display the response character string or the related information R on a display device of the sound emission system 20 (for example, a liquid-crystal display). A configuration that omits the emission of the response voice V2 may also be adopted: acoustic data D2 representing only the acoustic component of the identification information I, instead of acoustic data D2 representing both the response voice V2 and that component, is transmitted to the sound emission system 20, and the sound emission system 20 transmits the identification information I to the terminal device 50. The display of the response character string may likewise be omitted.
 (5) In each of the above embodiments the response server 30 generated acoustic data D2 representing a mixture of the response voice V2 and the acoustic component of the identification information I, but the response server 30 may instead generate acoustic data D2 containing the response voice V2 and the acoustic component of the identification information I as separate sounds and transmit it to the sound emission system 20. The sound emission system 20 emits sound according to the acoustic data D2: it may emit the mixture of the response voice V2 and the acoustic component of the identification information I, or emit the two individually. The timing relationship between the emission of the response voice V2 and that of the acoustic component of the identification information I is likewise arbitrary; for example, the two may be emitted in parallel, or in separate periods on the time axis. The sound emission control unit 245 is thus comprehensively expressed as an element that causes the sound emitting device 22 to emit a response voice V2 representing the response to the input accepted by the reception unit and an acoustic component representing the identification information I of the related information R concerning that response.
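 To make the parallel and sequential timings concrete, the sketch below reuses the FSK parameters assumed in the extraction sketch earlier (18/19 kHz, 50 ms per bit); again, these are illustrative assumptions, not parameters from the embodiment:

```python
import numpy as np

RATE, BIT_DUR, F0, F1 = 44100, 0.05, 18000.0, 19000.0

def modulate_bits(bits: str, level: float = 0.05) -> np.ndarray:
    # Render the identification information I as a low-level FSK tone burst.
    t = np.arange(int(RATE * BIT_DUR)) / RATE
    tone = {b: np.sin(2 * np.pi * (F1 if b == "1" else F0) * t) for b in "01"}
    return level * np.concatenate([tone[b] for b in bits])

def emit_parallel(voice_v2: np.ndarray, bits: str) -> np.ndarray:
    # Response voice V2 and the component of I emitted in parallel (mixed).
    carrier = modulate_bits(bits)
    mixed = np.zeros(max(len(voice_v2), len(carrier)))
    mixed[:len(voice_v2)] += voice_v2
    mixed[:len(carrier)] += carrier
    return mixed

def emit_sequential(voice_v2: np.ndarray, bits: str) -> np.ndarray:
    # V2 and the component of I emitted in separate periods on the time axis.
    return np.concatenate([voice_v2, modulate_bits(bits)])
```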
 (6) In each of the above embodiments the response server 30 generated the acoustic data D2, but the sound emission system 20 may generate it instead. The response server 30 transmits the response character string and the identification information I to the sound emission system 20; the sound emission system 20 generates acoustic data D2 from the response character string and the identification information I transmitted from the response server 30 and emits sound according to that data. In this configuration the signal generation unit 325 can be omitted from the response server 30.
 (7) In each of the above embodiments the identification information generation unit 324 generated identification information I each time related information R was generated, but identification information I may instead be registered in advance for the related information R registered in the related information table. When the related information generation unit 323 generates related information R, the identification information generation unit 324 then specifies from the related information table the identification information I corresponding to that related information R. In this configuration, each of the plural pieces of related information R may also be registered in advance in the information provision table in association with its identification information I, in which case the transmission of the provision data D3 from the response server 30 to the information providing server 40 is omitted.
 (8) In each of the above embodiments the sound emission system 20 transmitted an acoustic signal representing the uttered voice V1 to the response server 30 as the input data D1, but it may instead transmit the utterance character string of the uttered voice V1 as the input data D1. In that case the voice recognition unit 321 can be omitted from the response server 30.
 (9) In each of the above embodiments the information providing system 100 was composed of the response server 30, the information providing server 40, and the sound emission system 20, but its configuration is not limited to this example. For instance, the information providing system 100 may be configured as a single device; the response server 30 and the sound emission system 20 may be realized as a single device; or the response server 30 and the information providing server 40 may be realized as a single device.
 (10) In each of the above embodiments a voice dialogue device was used as the sound emission system 20, but, for example, an automatic ticket machine or a vending machine may also serve as the sound emission system 20. With such a configuration, information about an item purchased by the user U, for example, can be used as the related information R.
 (11) In each of the above embodiments the identification information I was transmitted to the terminal device 50 by acoustic communication through the sound emitting device 22, but the sound emission system 20 may instead transmit the identification information I to the terminal device 50 by short-range wireless communication such as Bluetooth (registered trademark) or Wi-Fi (registered trademark); that is, the identification information I is transmitted to the terminal device 50 by a communication device distinct from the sound emitting device 22 that emits the response voice V2.
 (12) The functions of the sound emission system 20, the information processing system (the response server 30 and the information providing server 40), and the terminal device 50 according to the above embodiments are realized, as illustrated in each embodiment, by cooperation between a control device and a program. A program according to the above embodiments may be provided in a form stored on a computer-readable recording medium and installed on a computer. The recording medium is, for example, a non-transitory recording medium, of which an optical recording medium (optical disc) such as a CD-ROM is a good example, but it encompasses any known form of recording medium such as a semiconductor recording medium or a magnetic recording medium. A non-transitory recording medium here means any recording medium other than a transitory, propagating signal, and volatile recording media are not excluded. The program may also be provided to a computer in the form of distribution over a communication network.
<Appendix>
 For example, the following configurations can be derived from the forms exemplified above.
 An information providing method according to one example of the present disclosure (a first aspect) accepts input from a user and causes a sound emitting device to emit an acoustic component representing identification information of related information concerning a response to the accepted input. In this aspect, the identification information is transmitted to a terminal device by acoustic communication using the sound emitting device, so the terminal device can use that identification information to acquire the related information concerning the response represented by the response voice (for example, more detailed information about the response). This reduces the burden on the user of performing complicated operations on the terminal device to obtain related information about the response voice.
 In one example of the first aspect (a second aspect), the operation of causing the sound emitting device to emit sound causes it to emit a response voice representing the response together with the acoustic component. In this aspect, the sound emitting device that emits the response voice can serve as the communication device that transmits the identification information of the related information corresponding to that response voice.
 In one example of the second aspect (a third aspect), input data representing the accepted input is transmitted to a response server; acoustic data representing the response voice, which represents the response to the input represented by the input data, and the acoustic component, which represents the identification information of the related information concerning that response, is received from the response server; and the sound emitting device is made to emit sound according to the received acoustic data. In this aspect, the accepted input is transmitted to the response server and acoustic data of the response voice representing the response generated by the response server is received, so the elements for generating the response voice need not be built into the sound emission system, simplifying the configuration and operation of the information providing method.
 In one example of the third aspect (a fourth aspect), the identification information is generated, and the input data and the generated identification information are transmitted to the response server. In this aspect, the correspondence between the response voice and the identification information can be managed at the response server without the response server generating the identification information.
 In one example of any of the first to fourth aspects (a fifth aspect), the input from the user is a voice uttered by that user. In this aspect, the user can obtain the related information simply by speaking, without, for example, entering input through operators.
 An information processing method according to one example of the present disclosure (a sixth aspect) generates a response to input by a user, generates related information concerning the generated response, transmits acoustic data representing an acoustic component that represents identification information of the related information to a sound emission system that emits sound according to that acoustic data, and, in response to an information request from a terminal device that has received the identification information by acoustic communication from the sound emission system, transmits the related information corresponding to that identification information to that terminal device. In this aspect, the identification information is transmitted to the terminal device by acoustic communication using the sound emitting device, so the terminal device can use that identification information to acquire the related information concerning the response represented by the response voice (for example, more detailed information about the response). This reduces the burden on the user of performing complicated operations on the terminal device to obtain related information about the response voice.
 In one example of the sixth aspect (a seventh aspect), the generation of the related information generates related information corresponding to a word contained in the response. In this aspect, the related information can be specified more easily than in a configuration that specifies related information corresponding to the response as a whole.
 A sound emission system according to one example of the present disclosure (an eighth aspect) comprises a reception unit that accepts input from a user, a sound emitting device that emits sound, and a sound emission control unit that causes the sound emitting device to emit an acoustic component representing identification information of related information concerning a response to the input accepted by the reception unit. In this aspect, the identification information is transmitted to a terminal device by acoustic communication using the sound emitting device, so the terminal device can use that identification information to acquire the related information concerning the response represented by the response voice (for example, more detailed information about the response). This reduces the burden on the user of performing complicated operations on the terminal device to obtain related information about the response voice.
 In one example of the eighth aspect (a ninth aspect), the sound emission control unit causes the sound emitting device to emit a response voice representing the response together with the acoustic component. In this aspect, the sound emitting device that emits the response voice can serve as the communication device that transmits the identification information of the related information corresponding to that response voice.
 One example of the ninth aspect (a tenth aspect) comprises a transmission unit that transmits input data representing the input accepted by the reception unit to a response server, and a receiving unit that receives from the response server acoustic data representing the response voice, which represents the response to the input represented by the input data, and the acoustic component, which represents the identification information of the related information concerning that response; the sound emission control unit causes the sound emitting device to emit sound according to the acoustic data received by the receiving unit. In this aspect, the input accepted by the reception unit is transmitted to the response server and the acoustic data of the response voice representing the response generated by the response server is received by the receiving unit, so the elements for generating the response voice need not be built into the sound emission system, simplifying its configuration and operation.
 One example of the tenth aspect (an eleventh aspect) comprises an identification information generation unit that generates the identification information, and the transmission unit transmits the input data and the identification information generated by the identification information generation unit to the response server. In this aspect, the correspondence between the response voice and the identification information can be managed at the response server without the response server generating the identification information.
 In one example of any of the eighth to eleventh aspects (a twelfth aspect), the input from the user is a voice uttered by that user. In this aspect, the user can obtain the related information simply by speaking, without, for example, entering input through operators.
 An information processing system according to one example of the present disclosure (a thirteenth aspect) comprises a response generation unit that generates a response to input by a user; a related information generation unit that generates related information concerning the response generated by the response generation unit; a first communication control unit that transmits acoustic data representing an acoustic component, which represents identification information of the related information generated by the related information generation unit, to a sound emission system that emits sound according to that acoustic data; and a second communication control unit that, in response to an information request from a terminal device that has received the identification information by acoustic communication from the sound emission system, transmits the related information corresponding to that identification information to that terminal device. In this aspect, the identification information is transmitted to the terminal device by acoustic communication using the sound emitting device, so the terminal device can use that identification information to acquire the related information concerning the response represented by the response voice (for example, more detailed information about the response). This reduces the burden on the user of performing complicated operations on the terminal device to obtain related information about the response voice.
 In one example of the thirteenth aspect (a fourteenth aspect), the related information generation unit generates related information corresponding to a word contained in the response generated by the response generation unit. In this aspect, the related information can be specified more easily than in a configuration that specifies related information corresponding to the response as a whole.
DESCRIPTION OF REFERENCE SIGNS: 100...information providing system; 20...sound emission system; 21...sound collection device; 22...sound emitting device; 23...storage device; 24...control device; 243...communication control unit; 245...sound emission control unit; 25...communication device; 251...transmission unit; 253...receiving unit; 30...response server; 31...storage device; 32...control device; 321...voice recognition unit; 322...response generation unit; 323...related information generation unit; 324...identification information generation unit; 325...signal generation unit; 326...communication control unit; 33...communication device; 331...transmission unit; 333...receiving unit; 40...information providing server; 41...storage device; 42...control device; 421...storage control unit; 423...related information specifying unit; 425...communication control unit; 43...communication device; 431...transmission unit; 433...receiving unit; 50...terminal device; 51...sound collection device; 52...control device; 521...information extraction unit; 523...reproduction control unit; 53...storage device; 54...communication device; 55...reproduction device; 71...voice synthesis unit; 73...modulation processing unit; 74...addition unit.

Claims (14)

  1.  An information providing method realized by a computer, the method comprising:
      accepting input from a user; and
      causing a sound emitting device to emit an acoustic component representing identification information of related information concerning a response to the accepted input.
  2.  The information providing method according to claim 1, wherein the operation of causing the sound emitting device to emit sound causes the sound emitting device to emit a response voice representing the response and the acoustic component.
  3.  The information providing method according to claim 2, further comprising:
      transmitting input data representing the accepted input to a response server;
      receiving from the response server acoustic data representing the response voice, which represents the response to the input represented by the input data, and the acoustic component, which represents the identification information of the related information concerning the response; and
      causing the sound emitting device to emit sound according to the received acoustic data.
  4.  The information providing method according to claim 3, further comprising:
      generating the identification information; and
      transmitting the input data and the generated identification information to the response server.
  5.  The information providing method according to any one of claims 1 to 4, wherein the input from the user is a voice uttered by the user.
  6.  An information processing method comprising:
      generating a response to input by a user;
      generating related information concerning the generated response;
      transmitting acoustic data representing an acoustic component that represents identification information of the related information to a sound emission system that emits sound according to the acoustic data; and
      in response to an information request from a terminal device that has received the identification information by acoustic communication from the sound emission system, transmitting the related information corresponding to the identification information to the terminal device.
  7.  The information processing method according to claim 6, wherein the generation of the related information generates related information corresponding to a word contained in the response.
  8.  A sound emission system comprising:
      a reception unit that accepts input from a user;
      a sound emitting device that emits sound; and
      a sound emission control unit that causes the sound emitting device to emit an acoustic component representing identification information of related information concerning a response to the input accepted by the reception unit.
  9.  The sound emission system according to claim 8, wherein the sound emission control unit causes the sound emitting device to emit a response voice representing the response and the acoustic component.
  10.  The sound emission system according to claim 9, further comprising:
      a transmission unit that transmits input data representing the input accepted by the reception unit to a response server; and
      a receiving unit that receives from the response server acoustic data representing the response voice, which represents the response to the input represented by the input data, and the acoustic component, which represents the identification information of the related information concerning the response,
      wherein the sound emission control unit causes the sound emitting device to emit sound according to the acoustic data received by the receiving unit.
  11.  The sound emission system according to claim 10, further comprising an identification information generation unit that generates the identification information, wherein the transmission unit transmits the input data and the identification information generated by the identification information generation unit to the response server.
  12.  The sound emission system according to any one of claims 8 to 11, wherein the input from the user is a voice uttered by the user.
  13.  An information processing system comprising:
      a response generation unit that generates a response to input by a user;
      a related information generation unit that generates related information concerning the response generated by the response generation unit;
      a first communication control unit that transmits acoustic data representing an acoustic component, which represents identification information of the related information generated by the related information generation unit, to a sound emission system that emits sound according to the acoustic data; and
      a second communication control unit that, in response to an information request from a terminal device that has received the identification information by acoustic communication from the sound emission system, transmits the related information corresponding to the identification information to the terminal device.
  14.  The information processing system according to claim 13, wherein the related information generation unit generates related information corresponding to a word contained in the response generated by the response generation unit.
PCT/JP2019/019022 2018-05-30 2019-05-14 Sound emission system, information processing system, information providing method, and information processing method WO2019230363A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
JP2018-103922 2018-05-30
JP2018103922A JP7196426B2 (en) 2018-05-30 2018-05-30 Information processing method and information processing system

Publications (1)

Publication Number Publication Date
WO2019230363A1 true WO2019230363A1 (en) 2019-12-05

Family

ID=68697294

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/JP2019/019022 WO2019230363A1 (en) 2018-05-30 2019-05-14 Sound emission system, information processing system, information providing method, and information processing method

Country Status (2)

Country Link
JP (1) JP7196426B2 (en)
WO (1) WO2019230363A1 (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP7407758B2 (en) 2021-03-19 2024-01-04 Lineヤフー株式会社 Information processing system and information processing method

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2005338454A (en) * 2004-05-27 2005-12-08 Toshiba Tec Corp Speech interaction device
JP2007164659A (en) * 2005-12-16 2007-06-28 Absurd Spear:Kk Information distribution system and method using music information
JP2010156741A (en) * 2008-12-26 2010-07-15 Yamaha Corp Service providing device
US20140032220A1 (en) * 2012-07-27 2014-01-30 Solomon Z. Lerner Method and Apparatus for Responding to a Query at a Dialog System
JP2016075890A (en) * 2014-07-29 2016-05-12 ヤマハ株式会社 Terminal equipment, information providing system, information providing method, and program
JP2016206469A (en) * 2015-04-24 2016-12-08 マツダ株式会社 Voice interaction system for vehicle

Also Published As

Publication number Publication date
JP2019207380A (en) 2019-12-05
JP7196426B2 (en) 2022-12-27

Similar Documents

Publication Publication Date Title
CN106537496B (en) Terminal device, information providing system, information presenting method, and information providing method
JP6159048B1 (en) Information management system and terminal device
CN106537497B (en) Information management system and information management method
WO2019230363A1 (en) Sound emission system, information processing system, information providing method, and information processing method
JP2018005071A (en) Terminal device
JP7331645B2 (en) Information provision method and communication system
JP6596903B2 (en) Information providing system and information providing method
KR20190001059A (en) Apparatus for providing artificial intelligence platform and contents service method using same
JP6780305B2 (en) Information processing device and information provision method
JP2017033398A (en) Terminal device
JP6780529B2 (en) Information providing device and information providing system
JP7087745B2 (en) Terminal device, information provision system, operation method of terminal device and information provision method
JP2017191363A (en) Information generation system and information providing method

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 19811406

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 19811406

Country of ref document: EP

Kind code of ref document: A1