WO2019138477A1

WO2019138477A1 - Smart speaker, smart speaker control method, and program

Info

Publication number: WO2019138477A1
Application number: PCT/JP2018/000371
Authority: WO
Inventors: 航洋竹之下; 将仁谷口
Original assignee: 株式会社ウフル
Priority date: 2018-01-10
Filing date: 2018-01-10
Publication date: 2019-07-18

Abstract

The present invention relates to a technique for controlling a smart speaker, and is used in the field of the Internet of Things (IoT). A recognition unit 111 analyses a speech signal, which represents speech that is input in an input unit 14, and recognizes speech content and a user U responsible for producing the speech. A determination unit 112 determines whether or not the speech represented by the recognized speech signal is an utterance indicating an inquiry made by the user U. When the determination unit 112 determines that said speech is an utterance indicating an inquiry made by the user U, an acquisition unit 113 transmits the inquiry to a server device 2, requests a reply to the inquiry from the server device 2, and acquires the reply from the server device 2. When the determination unit 112 determines that said speech is an utterance indicating an inquiry made by the user U, an output control unit 114 outputs artificial speech, indicating that the inquiry has been received, to an output unit 15 before information representing the reply to the inquiry is acquired by the acquisition unit 113 and output to the user U.

Description

Smart speaker, smart speaker control method, and program

The present invention relates to smart speaker control technology, and is used in the field of Internet of Things (IoT) in which speakers are connected to the Internet.

Smart speakers that accept voice instructions have been developed. Patent Document 1 discloses a technique for preventing a replay attack by verifying the uniqueness of an utterance when a hotword, which is a reserved word, is detected. A "replay attack" is an attack in which an unauthorized user captures audio of a hotword spoken by the device owner or the like and replays it to gain unauthorized access to the device.

Patent Document 1: JP 2017-76117 A

However, in the technology described in Patent Document 1, the smart speaker only performs verification for preventing a replay attack during the period from the detection of the hotword to the answering of the question, and outputs no voice to the user during that period. Therefore, particularly when the response takes time, the user waits for the output of the response without knowing whether or not his or her question has been accepted.

An object of the present invention is to notify a user that a question has been accepted after a voice that is an utterance meaning a question is input and before an answer to the question is output to the user.

A smart speaker according to claim 1 of the present invention comprises: input means for inputting a user's voice; output means for outputting artificial voice toward the user; acquisition means for acquiring information; and control means for, when the voice input by the input means is recognized as an utterance meaning a question by the user, causing the output means to output the artificial voice indicating that the question has been accepted, before information indicating an answer to the question is acquired and output to the user.

The smart speaker according to claim 2 of the present invention is, in the aspect according to claim 1, a smart speaker characterized by having estimation means for estimating the time required until the output means outputs the information indicating the answer, wherein the control means causes the output means to output the artificial voice indicating the required time estimated by the estimation means.

The smart speaker according to claim 3 of the present invention is, in the aspect according to claim 2, a smart speaker characterized in that the control means causes the output means to output the artificial voice indicating predetermined content when a predetermined time elapses after the voice that is an utterance meaning the question is input and before the estimation means estimates the required time.

The smart speaker according to claim 4 of the present invention is, in the aspect according to claim 2 or 3, a smart speaker characterized by having history storage means for storing, in storage means, a history that associates the process in which the voice that is an utterance meaning the question is input with the process of acquiring the information indicating the answer to the question, wherein the estimation means estimates the required time using the history.

The smart speaker according to claim 5 of the present invention is, in the aspect according to any one of claims 1 to 4, a smart speaker characterized in that, when the voice input by the input means is recognized as an utterance requesting a change to a question indicated by a voice input in the past, the control means stops the acquisition of the information indicating the answer, or stops the output of the acquired information to the user, and causes the output means to output the artificial voice indicating that the change has been accepted.

A control method according to claim 6 of the present invention comprises: a step in which recognition means recognizes a user's voice input by input means; a step in which, when the voice is an utterance meaning a question by the user, control means causes output means to output artificial voice indicating that the question has been accepted; a step in which acquisition means acquires information indicating an answer to the question; and a step in which, after the artificial voice has been output, the control means causes the output means to output the acquired information toward the user.

A program according to claim 7 of the present invention is a program for causing a computer to execute: a step of recognizing a user's voice input by input means; a step of causing output means to output artificial voice indicating that a question has been accepted when the voice is an utterance meaning a question by the user; a step of acquiring information indicating an answer to the question; and a step of causing the output means to output the acquired information toward the user after the artificial voice has been output.

According to the invention of the present application, the user can be notified that the question has been accepted during the period after a voice that is an utterance meaning a question is input and before the answer to the question is output to the user.

FIG. 1 is a diagram showing the configuration of the smart speaker system 9 according to the present embodiment.
FIG. 2 is a diagram showing an example of the configuration of the smart speaker 1.
FIG. 3 is a diagram showing the databases and the like stored in the storage unit 12.
FIG. 4 is a diagram showing the functional configuration of the smart speaker 1.
FIG. 5 is a flowchart showing the flow of the operation of the smart speaker 1.
FIG. 6 is a flowchart showing the flow of the operation of the smart speaker 1.
FIG. 7 is a flowchart showing the flow of the operation of the smart speaker 1.
FIG. 8 is a flowchart showing the flow of the operation of the smart speaker 1 in a modification.

Reference Signs List: 1: smart speaker, 11: control unit, 111: recognition unit, 112: determination unit, 113: acquisition unit, 114: output control unit, 115: history storage unit, 116: estimation unit, 12: storage unit, 121: history DB, 122: required time DB, 13: communication unit, 14: input unit, 15: output unit, 2: server device, 3: communication line, 9: smart speaker system.

Embodiment
<Overall Configuration of Smart Speaker System>
FIG. 1 is a diagram showing the configuration of a smart speaker system 9 according to the present embodiment. The smart speaker system 9 has a smart speaker 1, a server device 2, and a communication line 3 for communicably connecting these. The smart speaker system 9 may have a plurality of each of the smart speaker 1, the server device 2, and the communication line 3.

The smart speaker 1 is a device that inputs the voice of the user U with a microphone or the like and outputs artificial voice with a dynamic speaker, an electrostatic speaker or the like, and is an information processing device called a so-called smart speaker or AI speaker.

The server device 2 is an information processing device that generates an answer to a question requested from the smart speaker 1 via the communication line 3 and transmits the generated answer to the smart speaker 1.

The communication line 3 is a line that communicably connects the smart speaker 1 and the server device 2 and is, for example, the Internet or the like.

In the smart speaker system 9 shown in FIG. 1, the smart speaker 1 receives the voice of the user U and executes voice recognition processing. When the voice is an utterance meaning a question, the smart speaker 1 transmits the contents of the question to the server device 2 via the communication line 3. The server device 2 generates an answer to the received question and transmits it to the smart speaker 1 via the communication line 3. The smart speaker 1 converts the information indicating the answer received from the server device 2 into artificial voice and outputs it to the user U.

<Configuration of Smart Speaker>
FIG. 2 is a view showing an example of the configuration of the smart speaker 1. The smart speaker 1 includes a control unit 11, a storage unit 12, a communication unit 13, an input unit 14, and an output unit 15.

The control unit 11 includes a central processing unit (CPU), a read only memory (ROM), and a random access memory (RAM). The CPU controls each part of the smart speaker 1 by reading out and executing computer programs (hereinafter simply referred to as programs) stored in the ROM and the storage unit 12.

The communication unit 13 is a communication circuit connected to the communication line 3 by wire or wirelessly. The smart speaker 1 exchanges information with the server device 2 connected to the communication line 3 by the communication unit 13.

The input unit 14 is a microphone or the like that collects voices generated in the space around the user U, and sends an audio signal indicating the collected voices to the control unit 11. The input unit 14 is an example of an input unit that inputs a user's voice.

The output unit 15 is a dynamic speaker, an electrostatic speaker, or the like, and outputs sound to the space around the user U in accordance with the signal instructed by the control unit 11. For example, when notifying the user U of the answer to the question of the user U in words, the control unit 11 converts text data expressing the contents of the answer into waveform data of artificial voice, and instructs the output unit 15 to output the sound corresponding to the waveform data. The output unit 15 outputs the sound instructed by the control unit 11. Therefore, the output unit 15 is an example of an output unit that outputs artificial voice to the user.

The storage unit 12 is a large-capacity storage unit such as a solid state drive or a hard disk drive, and stores various programs, data, and the like read by the CPU of the control unit 11.

The storage unit 12 also stores a history DB 121 and a required time DB 122. The history DB 121 is a database that stores the history of questions received from the user U. The required time DB 122 is a database that stores the time required to output an answer to the received question for each predetermined item.

FIG. 3 is a view showing the databases stored in the storage unit 12. The history DB 121 shown in FIG. 3A stores, in association with one another, a question time, which is the time at which a question was received; the type of the question; an answer time, which is the time at which an answer to the question was output; and the communication path over which information was exchanged with the server device 2 to obtain the answer.

When the control unit 11 of the smart speaker 1 recognizes the voice input from the user U and recognizes that the voice is an utterance meaning a question by the user U, it sends information indicating the content of the question to the server device 2. When the answer corresponding to the question is acquired from the server device 2, the control unit 11 of the smart speaker 1 stores the various items of information mentioned above in the history DB 121.

In the history DB 121 shown in FIG. 3A, the type of question is, for example, a question on weather, a question on traffic condition, a question on news, and the like. For example, the server device 2 for which the smart speaker 1 requests an answer to a question may be separately determined for each type of the question.

Further, in the history DB 121 shown in FIG. 3A, the communication path is information on the various devices, such as modems and personal computers, through which the question passes when it is transmitted from the smart speaker 1 to the server device 2 and through which the answer passes when it is transmitted from the server device 2 to the smart speaker 1. The communication path records, for example, that eight devices relayed the question on its route and that seven devices relayed the answer on its route.

The content stored in the history DB 121 shown in FIG. 3A is an example of a history relating to the process in which the voice of a question is input and the process of acquiring information indicating an answer to the question, and is not limited to what is shown in FIG. 3A. For example, although the history DB 121 stores the question type and the communication path in association with each other, either one of them may be omitted.

The required time DB 122 shown in FIG. 3B associates and stores the type of question and the required time for answer, which is the time required to answer the question of that type. The required time DB 122 describes, for example, that it takes 50 seconds to answer a question about weather, and 90 seconds to answer a question about traffic conditions.

Further, the required time DB 122 shown in FIG. 3C stores a communication path characteristic, which indicates a characteristic of the communication path, in association with a required time for communication, which is the time required to communicate over that communication path. The required time DB 122 describes, for each communication path characteristic such as the number and type of modems passed through, the minimum time required for communication under that characteristic.

The required time DB 122 is not limited to any one of the required time DB 122 shown in FIGS. 3B and 3C, and both may be stored in the storage unit 12. In addition, a table in which the required time for other items is described may be included in the required time DB 122.
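The two databases described above can be pictured as simple records and lookup tables. The following is a minimal sketch only: the field names, the keying of the FIG. 3C table by relay count, and the communication times are assumptions made for illustration, while the 50-second and 90-second answer times and the eight and seven relaying devices are the examples given in the text.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class HistoryRecord:
    """One row of the history DB 121 (FIG. 3A). Field names are hypothetical."""
    question_time: datetime  # time at which the question was received
    question_type: str       # e.g. "weather", "traffic", "news"
    answer_time: datetime    # time at which the answer was output
    relays_out: int          # devices that relayed the question
    relays_back: int         # devices that relayed the answer

    def elapsed_seconds(self) -> float:
        """Observed time from receiving the question to outputting the answer."""
        return (self.answer_time - self.question_time).total_seconds()


# Required time DB 122 (FIG. 3B): question type -> required time for answer (s).
ANSWER_TIME_BY_TYPE = {"weather": 50.0, "traffic": 90.0}

# Required time DB 122 (FIG. 3C): communication path characteristic -> minimum
# required time for communication (s). Keyed here by total relay count; both
# the key choice and the values are made up for illustration.
MIN_COMM_TIME_BY_RELAYS = {15: 0.9, 8: 0.4}

record = HistoryRecord(
    question_time=datetime(2018, 1, 10, 9, 0, 0),
    question_type="weather",
    answer_time=datetime(2018, 1, 10, 9, 0, 52),
    relays_out=8,
    relays_back=7,
)
```

Keeping the observed history separate from the per-item required times mirrors the text, in which the two are stored as separate databases and combined only at estimation time.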

<Functional Configuration of Smart Speaker>
FIG. 4 is a diagram showing a functional configuration of the smart speaker 1. The control unit 11 of the smart speaker 1 reads and executes the program stored in the storage unit 12 to thereby function as the recognition unit 111, the determination unit 112, the acquisition unit 113, the output control unit 114, the history storage unit 115, and the estimation unit 116. In FIG. 4, the communication line 3 and the communication unit 13 are omitted.

The recognition unit 111 analyzes a voice signal indicating the voice input by the input unit 14 and recognizes the content of the voice. For example, a hidden Markov model or the like is applied to the analysis of the voice signal. When analyzing the voice signal, the recognition unit 111 may collate it with the voice signals of one or more users registered in advance in the storage unit 12 to identify the user U who produced the voice.

The determination unit 112 determines whether the voice indicated by the voice signal recognized by the recognition unit 111 is an utterance meaning a question by the user U. The determination unit 112 detects, for example, a hotword that is determined in advance to be uttered before a question, such as “OK computer”. Then, the determination unit 112 determines that the voice following the hotword is “an utterance meaning a question”.

Alternatively, the determination unit 112 may perform morphological analysis on the sentence recognized from the voice by the recognition unit 111 and detect, for example, an interrogative word in the sentence to determine whether the voice is "an utterance meaning a question".
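The two determination strategies above (hotword detection and interrogative-word detection) can be sketched as follows. This is an illustrative simplification, not the determination unit itself: plain whitespace tokenization stands in for morphological analysis, and the function name and the English interrogative list are assumptions.

```python
HOTWORD = "ok computer"  # hotword from the text, lower-cased for matching
# Assumption: a small English interrogative list stands in for the
# interrogative words a morphological analyzer would detect.
INTERROGATIVES = {"what", "when", "where", "who", "why", "how", "which"}


def is_question(text: str) -> bool:
    """Return True if the recognized text looks like an utterance meaning a question."""
    normalized = text.strip().lower()
    # Strategy 1: whatever follows the hotword is treated as a question.
    if normalized.startswith(HOTWORD):
        remainder = normalized[len(HOTWORD):].strip(" ,.!?")
        return bool(remainder)
    # Strategy 2: look for an interrogative word anywhere in the sentence.
    words = {word.strip("?,.!") for word in normalized.split()}
    return bool(words & INTERROGATIVES)
```

For example, `is_question("OK computer, tell me the weather in Hawaii next week")` is true under the first strategy, while a bare hotword with nothing after it is not treated as a question.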

When the determination unit 112 determines that the voice mentioned above is an utterance meaning a question by the user U, the acquisition unit 113 transmits the content of the question generated by the voice recognition process to the server device 2, requests an answer to the question from the server device 2, and acquires the answer from the server device 2.

When the determination unit 112 determines that the above-described voice is an utterance meaning a question by the user U, the output control unit 114 causes the output unit 15 to output an artificial voice indicating that the question has been accepted, before the information indicating an answer to the question is acquired and output to the user U. Then, after notifying the user U that the question has been accepted, the output control unit 114, on receiving the information indicating the answer from the acquisition unit 113, converts the information into artificial voice and causes the output unit 15 to output it.

When the acquisition unit 113 acquires from the server device 2 an answer corresponding to the question of the user U, the history storage unit 115 stores in the history DB 121, in association with one another, the question time at which the question was received, the type of the question, the answer time at which the answer to the question was output, and the communication path used to obtain the answer. In other words, the history storage unit 115 is an example of a history storage unit that stores, in association with each other, a history relating to the process in which the voice that is an utterance meaning the question is input and the process of acquiring the information indicating the answer to the question.

The estimation unit 116 estimates the time (required time) required for the output unit 15 to output information indicating an answer to the received question of the user U. That is, the estimation unit 116 is an example of the estimation unit in the present invention.

The estimation unit 116 illustrated in FIG. 4 estimates the required time using the history DB 121 and the required time DB 122. That is, the estimation unit 116 illustrated in FIG. 4 is an example of an estimation unit that estimates the required time using a history.

When the smart speaker 1 receives input of the user's voice and the determination unit 112 determines that the voice is an utterance meaning a question, the estimation unit 116 analyzes the content of the question to identify its type, and identifies the time at which the question was received. Then, the estimation unit 116 refers to the history DB 121 to extract, for example, histories of a type the same as or similar to the identified type and, from among these, further extracts histories of a time slot the same as or similar to the time slot to which the identified time belongs.

When the histories are extracted, the estimation unit 116 performs, for example, statistical processing on them. Then, based on the result of the statistical processing and the required time DB 122, the estimation unit 116 determines whether there is a difference exceeding a threshold between the histories and the type and time of the identified question. If there is such a difference, the estimation unit 116 estimates the required time by correcting for the difference using the required time for each item stored in the required time DB 122. That is, the estimation unit 116 estimates the time required to answer the received question by using the required time for each item (the required time for an answer, the required time for communication, etc.) and the history of past required times.

Information indicating the estimated required time is sent to the output control unit 114. The output control unit 114 causes the output unit 15 to output an artificial voice indicating the required time estimated by the estimation unit 116.
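The text leaves the statistical processing and the correction rule open, so the following is only one possible reading, sketched under stated assumptions: the statistic is the mean of past elapsed times, the time slot is the hour of day, and the correction simply averages the observed mean with the table value when the two differ by more than a threshold. The function name, the record layout, and the threshold default are all hypothetical.

```python
from statistics import mean


def estimate_required_time(history, question_type, hour,
                           required_time_by_type, threshold=10.0):
    """Estimate seconds until the answer is output, or None if nothing is known.

    history: list of dicts with keys "type", "hour", "seconds" (assumed layout).
    required_time_by_type: table in the style of FIG. 3B (type -> seconds).
    """
    # Extract histories of the same type, then narrow to the same time slot.
    same_type = [h for h in history if h["type"] == question_type]
    same_slot = [h for h in same_type if h["hour"] == hour]
    samples = same_slot or same_type  # fall back to type-only matches

    baseline = required_time_by_type.get(question_type)
    if not samples:
        return baseline  # no history: rely on the table alone (may be None)

    observed = mean(h["seconds"] for h in samples)  # the "statistical processing"
    # Assumed correction rule: if history and table disagree by more than the
    # threshold, pull the estimate halfway toward the table value.
    if baseline is not None and abs(observed - baseline) > threshold:
        return (observed + baseline) / 2
    return observed


history = [
    {"type": "weather", "hour": 9, "seconds": 48.0},
    {"type": "weather", "hour": 9, "seconds": 52.0},
    {"type": "traffic", "hour": 9, "seconds": 120.0},
]
table = {"weather": 50.0, "traffic": 90.0}
```

With this data, a weather question at 9 o'clock yields the plain historical mean because it agrees with the table, whereas a traffic question is corrected because the single observed value of 120 seconds differs from the table value of 90 seconds by more than the threshold.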

<Operation of smart speaker>
FIG. 5 is a flowchart showing the flow of the operation of the smart speaker 1. FIG. 5 shows a flow of operation from the input of the voice meaning the question of the user U to the control unit 11 of the smart speaker 1 to the output of the answer to the question.

The control unit 11 controls the input unit 14 to receive input of the voice of the user U (step S101). Then, the control unit 11 executes voice recognition processing on the input voice (step S102), and determines whether the voice is an utterance meaning a question by the user U (step S103).

When it is determined that the input voice is not an utterance meaning a question by the user U (step S103; NO), the control unit 11 executes another process (step S400) and ends the process.

On the other hand, when it is determined that the input voice is an utterance meaning a question by the user U (step S103; YES), the control unit 11 executes question processing in which it requests the server device 2 to answer the question and acquires the answer (step S200). Further, in this case, the control unit 11 executes estimation processing in which it estimates the required time until the answer to the received question is output and outputs information on the required time to the user U as artificial voice (step S300). The question processing and the estimation processing are performed in parallel.
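The branch at step S103 and the parallel execution of steps S200 and S300 can be sketched with two worker threads. This is a simplified illustration, not the disclosed control flow: the callables `is_question`, `ask_server`, `estimate`, and `speak` are hypothetical stand-ins for the recognition, acquisition, estimation, and output functions, and the sketch serializes the two outputs so that the waiting notice is always spoken before the answer.

```python
from concurrent.futures import ThreadPoolExecutor


def handle_utterance(text, is_question, ask_server, estimate, speak):
    """Process one recognized utterance; return True if it was handled as a question."""
    if not is_question(text):  # step S103; NO
        return False           # the caller would run other processing (S400)

    with ThreadPoolExecutor(max_workers=2) as pool:
        # Question processing (S200) and estimation processing (S300) start together.
        answer_future = pool.submit(ask_server, text)
        estimate_future = pool.submit(estimate, text)

        # Tell the user the question was accepted and how long to wait...
        speak(f"Please wait about {estimate_future.result()} seconds.")
        # ...then output the answer once the server has replied.
        speak(answer_future.result())
    return True


spoken = []
handled = handle_utterance(
    "Tell me the weather in Hawaii next week",
    is_question=lambda t: True,
    ask_server=lambda t: "Rainy on Tuesday and Friday, fine on the other days.",
    estimate=lambda t: 3,
    speak=spoken.append,
)
```

Because the request to the server is submitted before the notice is spoken, the notice does not add to the time the user waits for the answer.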

FIG. 6 is a flowchart showing the flow of the operation of the smart speaker 1. FIG. 6 shows the flow of operation of the above-described question processing.

The control unit 11 transmits information indicating the content of the question of the user to the server device 2 and requests an answer to the question (step S201). Then, the control unit 11 determines whether or not the requested response has been acquired from the server device 2 (step S202).

While it is determined that an answer has not been acquired (step S202; NO), the control unit 11 continues this determination. If it is determined that an answer has been obtained (step S202; YES), the control unit 11 causes the output unit 15 to output the obtained answer (step S203), and stores the history of the question and the answer (step S204).

FIG. 7 is a flowchart showing the flow of the operation of the smart speaker 1. FIG. 7 shows the flow of operation of the above-described estimation process.

The control unit 11 determines whether or not a predetermined time has elapsed after receiving the question (step S301). When it is determined that the predetermined time has elapsed (step S301; YES), the control unit 11 causes the output unit 15 to output, as artificial voice, a predetermined sentence (fixed-form sentence) such as "I received your question. Please wait for a while" (step S305), and ends the process. As a result, if the estimation of the required time is not completed within the predetermined time, the smart speaker 1 notifies the user U only that the question has been accepted, without notifying a specific required time.

On the other hand, when it is determined that the predetermined time has not elapsed (step S301; NO), the control unit 11 estimates the required time until the response (step S302). Then, the control unit 11 determines whether the estimation is completed (step S303).

When it is determined that the estimation is not completed (step S303; NO), the control unit 11 returns the process to step S301. On the other hand, when it is determined that the estimation is completed (step S303; YES), the control unit 11 causes the output unit 15 to output a sentence indicating the estimated required time (step S304), and ends the process. That is, when the required time is estimated within the predetermined time, the smart speaker 1 notifies the user U of the required time.
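The loop of steps S301 to S305 amounts to waiting for the estimate with a deadline. A minimal sketch follows, assuming the estimation runs in a worker thread and that a timed wait on its result is an acceptable substitute for the polling loop in FIG. 7; the function names, the wording of the notices, and the 0.5-second default deadline are illustrative assumptions.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from concurrent.futures import TimeoutError as FutureTimeout


def announce_wait(estimate, speak, deadline_s=0.5):
    """Speak the estimated wait, or a fixed-form notice if estimation is too slow."""
    pool = ThreadPoolExecutor(max_workers=1)
    future = pool.submit(estimate)                       # step S302
    try:
        seconds = future.result(timeout=deadline_s)      # steps S301 and S303
        speak(f"Please wait about {seconds} seconds.")   # step S304
    except FutureTimeout:
        # Predetermined time elapsed first: notify acceptance only (step S305).
        speak("I received your question. Please wait for a while.")
    finally:
        pool.shutdown(wait=False)  # do not block on a still-running estimate


def slow_estimate():
    """Hypothetical estimator that takes longer than the deadline."""
    time.sleep(0.3)
    return 3


spoken = []
announce_wait(lambda: 3, spoken.append, deadline_s=1.0)
announce_wait(slow_estimate, spoken.append, deadline_s=0.05)
```

Either way the user hears exactly one notice, which matches the flowchart: step S304 and step S305 are mutually exclusive exits.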

For example, when the user U says "Tell me the weather in Hawaii next week" toward the input unit 14 of the smart speaker 1, the control unit 11 of the smart speaker 1 receives input of this voice signal, performs voice recognition, and determines whether it is an utterance meaning a question. In the smart speaker system 9, it may be prescribed that the user U utters the hotword before a question, or the control unit 11 of the smart speaker 1 may make the above determination from the result of language analysis of the voice uttered by the user U.

When the control unit 11 determines that the input voice is an utterance meaning a question, the control unit 11 sends the contents of the question to the server device 2 to request an answer, and estimates the time required to acquire the answer. Then, when the estimation is completed within the predetermined time, the estimated required time is output as artificial voice.

For example, if the smart speaker 1 estimates that the required time until the above-mentioned question is answered is "3 seconds", it outputs from the output unit 15 an artificial voice such as "Next week's weather in Hawaii. Please wait for about 3 seconds." After that, when the information indicating the answer is acquired from the server device 2, the smart speaker 1 converts the information into voice and outputs from the output unit 15 an artificial voice such as "The weather in Hawaii next week is rainy on Tuesday and Friday and fine on the other days."

By the above operation, when the estimation of the required time is completed within the predetermined time, the user U knows that the question has been accepted and how long to wait for the answer. In addition, even if the estimation of the required time is not completed within the predetermined time, the user U knows that the question he or she asked has been accepted, and can therefore decide whether to wait for the answer or to ask the question again.

<Modification>
The above is the description of the embodiment, but the contents of this embodiment can be modified as follows. Also, the following modifications may be combined.

<Modification 1>
In the embodiment described above, the control unit 11 of the smart speaker 1 functions as the estimation unit 116 that estimates the time required for the output unit 15 to output the information indicating the answer to the received question of the user U. However, when it is not necessary to notify the user U of the required time, the control unit 11 need not function as the estimation unit 116.

<Modification 2>
In the above-described embodiment, the control unit 11 causes the output unit 15 to output artificial voice indicating predetermined content when a predetermined time has elapsed after the voice that is an utterance meaning a question is input and before the time required to answer the question is estimated. Multiple values may be defined for this predetermined time. For example, when the smart speaker 1 is used by a plurality of people, the predetermined time may be set for each user U. The predetermined time may also be defined for each time slot in which a question is accepted.

Further, the control unit 11 need not monitor whether or not a predetermined time has elapsed. In this case, the control unit 11 may wait for the estimation process to complete, that is, until the required time is estimated. The control unit 11 may stop this estimation process when, for example, it receives input of a new voice from the user U that means withdrawal of the question.

<Modification 3>
In the embodiment described above, the control unit 11 of the smart speaker 1 functions as the history storage unit 115 that, when acquiring the answer corresponding to the question, stores the question time, the type, the answer time, and the communication path in the history DB 121 in association with each other. However, the items stored in the history DB 121 are not limited to these. In addition, the control unit 11 need not function as the history storage unit 115. That is, the smart speaker 1 need not store the history of processing from the question to the answer.

<Modification 4>
In the embodiment described above, the control unit 11 of the smart speaker 1 functions as the determination unit 112 that determines whether the voice indicated by the voice signal recognized by the recognition unit 111 is an utterance meaning a question by the user U. The determination unit 112 may additionally determine whether the above-described voice is an utterance requesting a change to a question indicated by a voice that the user U input in the past.

FIG. 8 is a flowchart showing the flow of the operation of the smart speaker 1 in the modification. Descriptions of processes shown in FIG. 8 that are common to FIG. 5 will be omitted.

When it is determined that the input voice is not an utterance meaning a question by the user U (step S103; NO), the control unit 11 determines whether the voice is an utterance requesting a change to a question indicated by a voice input in the past (step S111).

When it is determined that the above-described voice is not an utterance for requesting a change in question (step S111; NO), the control unit 11 advances the process to step S400 described above.

On the other hand, when it is determined that the above-described voice is an utterance requesting a change to a question (step S111; YES), the control unit 11 cancels the question processing already started for the question (step S112) and stops the estimation processing (step S113). Steps S112 and S113 may be processed in parallel.

When the above-described two processes are canceled, the control unit 11 changes the content of the question according to the input voice, and transmits the content to the server device 2 (step S114). Then, the control unit 11 executes a question process of requesting and acquiring an answer to the changed question (step S200). If the question processing that has already started to execute is before acquiring information indicating an answer, the control unit 11 cancels the acquisition. Moreover, if the question process which has already started execution is after acquiring the information which shows an answer, the control part 11 stops outputting the acquired information toward the user U.

In addition, the control unit 11 executes estimation processing in which it estimates the required time until the answer to the changed question is output and outputs information on the required time to the user U as artificial voice (step S300). The newly started estimation processing notifies the user U at least that the change to the question has been accepted and, if the estimation is completed within the predetermined time, also notifies the user U of the time required to obtain an answer to the changed question.

Therefore, when the control unit 11 recognizes that the input voice is an utterance requesting a change to a question indicated by a voice input in the past, it functions as the output control unit 114 that stops the acquisition of the information indicating the answer, or stops the output of the acquired information toward the user, and causes the output unit 15 to output artificial voice indicating that the change has been accepted.
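The cancellation behavior of Modification 4 can be sketched with a per-question cancellation flag. This is a hypothetical illustration: the class and method names are invented, the wording of the notices is assumed, and the sketch covers only the case in which an answer that arrives for a superseded question is suppressed rather than spoken.

```python
import threading


class PendingQuestion:
    """A question in flight, with a flag set when the user replaces it."""

    def __init__(self, text):
        self.text = text
        self.cancelled = threading.Event()


class SpeakerSketch:
    """Hypothetical stand-in for the control unit's change handling."""

    def __init__(self, speak):
        self.speak = speak
        self.pending = None

    def ask(self, text):
        """Accept a new question and acknowledge it."""
        self.pending = PendingQuestion(text)
        self.speak("Your question has been accepted.")
        return self.pending

    def change(self, new_text):
        """Handle an utterance requesting a change to the earlier question."""
        if self.pending is not None:
            self.pending.cancelled.set()          # steps S112 and S113
        self.pending = PendingQuestion(new_text)  # step S114
        self.speak("The correction has been accepted.")
        return self.pending

    def deliver(self, question, answer):
        """Output an answer unless its question was superseded."""
        if question.cancelled.is_set():
            return False  # acquired information is not output to the user
        self.speak(answer)
        return True


spoken = []
speaker = SpeakerSketch(spoken.append)
first = speaker.ask("Tell me the weather in Hawaii next week")
second = speaker.change("Tell me the weather in London next week")
hawaii_delivered = speaker.deliver(first, "Hawaii: rainy on Tuesday and Friday.")
london_delivered = speaker.deliver(second, "London: cloudy all week.")
```

A `threading.Event` is used for the flag so that a worker thread fetching the answer could also check it before or after contacting the server, which corresponds to stopping either the acquisition or the output.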

For example, suppose that after the user U utters a voice (first voice) saying "Tell me the weather in Hawaii next week" to the input unit 14 of the smart speaker 1, the user U utters a voice (second voice) saying "I made a mistake. Tell me the weather in London next week." The control unit 11 of the smart speaker 1 then receives the input voice signal, recognizes the second voice, and determines whether it is an utterance requesting a change to the question.

In the smart speaker system 9, when the user U utters the hot word "mistaken" after a question, the control unit 11 of the smart speaker 1 may determine that the voice following it is "an utterance requesting a change to the question". Alternatively, the control unit 11 of the smart speaker 1 may make this determination from the result of linguistic analysis of the voice uttered by the user U following the question.
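The hot-word variant of this determination might be sketched as follows. This is a minimal illustration, assuming that speech recognition has already produced text; the function `parse_follow_up`, the tuple `CHANGE_HOT_WORDS`, and the exact hot-word strings are hypothetical and are not defined by this specification. The linguistic-analysis variant is not shown.

```python
# Hot words assumed for illustration only; an actual product would define its own.
CHANGE_HOT_WORDS = ("i made a mistake", "mistaken")


def parse_follow_up(utterance, has_pending_question):
    """Return the replacement question, or None if this is not a change request.

    A follow-up counts as a change request only when a question is still
    pending and the utterance begins with one of the hot words.
    """
    if not has_pending_question:
        return None
    text = utterance.strip()
    lowered = text.lower()
    for hot_word in CHANGE_HOT_WORDS:
        if lowered.startswith(hot_word):
            # Everything after the hot word is treated as the changed question.
            rest = text[len(hot_word):].lstrip(" .,!")
            return rest or None
    return None
```

For the example above, `parse_follow_up("I made a mistake. Tell me the weather in London next week.", True)` yields the changed question text, while an utterance without a hot word, or one made when no question is pending, yields `None`.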

When the control unit 11 determines that the input voice is an utterance requesting a change to the question, it sends the changed content to the server device 2 and requests an answer. As a result, the process of requesting an answer to the question in the first voice is discontinued.

In addition, the control unit 11 cancels the process of estimating the time required to acquire the answer to the question before the change, and newly starts a process of estimating the time required to acquire the answer to the changed question. When the estimation is completed within a predetermined time, the estimated required time is output as artificial voice.

For example, if the smart speaker 1 estimates that the time required before answering the changed question is "3 seconds", the artificial voice "The correction has been accepted. It is the weather in London next week. Another 3 seconds or so. Please wait." is output from the output unit 15. After that, when the information indicating the answer is acquired from the server device 2, the smart speaker 1 converts the information into voice and outputs it from the output unit 15.
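How the announcement might depend on whether the estimation finishes within the predetermined time can be sketched as below. This is an assumption-laden illustration: the function `acknowledge_change`, the parameter `timeout_s`, the default limit of one second, and the two stub estimators are hypothetical, and the fallback wording is only one possible way of informing the user that the change was received.

```python
import asyncio


async def acknowledge_change(question_summary, estimate, timeout_s=1.0):
    """Build the artificial-voice text announced after a change request.

    `estimate` is an assumed coroutine function returning the required time in
    seconds; `timeout_s` stands in for the predetermined time.
    """
    prefix = f"The correction has been accepted. {question_summary}"
    try:
        seconds = await asyncio.wait_for(estimate(), timeout=timeout_s)
    except asyncio.TimeoutError:
        # Estimation did not finish in time: only confirm that the change was received.
        return f"{prefix} Please wait."
    return f"{prefix} Another {seconds} seconds or so. Please wait."


async def fast_estimate():
    # Stub estimator that completes immediately.
    return 3


async def slow_estimate():
    # Stub estimator that exceeds a short predetermined time.
    await asyncio.sleep(1)
    return 3
```

With `fast_estimate` the text matches the example announcement above; with `slow_estimate` and a short `timeout_s` the required time is simply left out.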

By the above operation, when the user U makes an utterance to change the content of a previously asked question, the control unit 11 of the smart speaker 1 receives the voice indicating the utterance, cancels the request for an answer to the server device 2, stops the estimation of the required time, and newly starts the request for an answer and the estimation of the required time for the changed question. Consequently, the user U is not informed of the answer to the question before the change or of the required time estimated for it.

<Modification 5>
The process performed by the control unit 11 of the smart speaker 1 can also be regarded as a control method for the smart speaker 1. That is, the present invention may be provided as a control method for a smart speaker comprising: a step in which recognition means recognizes the voice of the user input by input means; a step in which, when the voice is recognized as an utterance meaning a question by the user, control means causes output means to output artificial voice indicating that the question has been accepted; a step in which acquisition means acquires information indicating an answer to the question; and a step in which the control means causes the output means to output the acquired information to the user after the artificial voice has been output.
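The ordering of the steps in this control method can be shown with a short synchronous sketch. The function `handle_utterance` and its callback parameters `is_question`, `fetch_answer`, and `speak` are hypothetical stand-ins for the recognition, acquisition, and output means, and the acknowledgement wording is an assumption.

```python
def handle_utterance(text, is_question, fetch_answer, speak):
    """Acknowledge a question first, then acquire and output the answer.

    Returns True when the utterance was handled as a question.
    """
    if not is_question(text):
        return False
    # The acknowledgement is output before the answer is acquired and output.
    speak("Your question has been received.")
    answer = fetch_answer(text)
    speak(answer)
    return True
```

Here the acknowledgement is always spoken before `fetch_answer` is called, which is the property the method is built around; a real device would likely run the acquisition concurrently rather than in sequence.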

<Modification 6>
The program executed by the control unit 11 of the smart speaker 1 can be provided stored in a computer-readable recording medium such as a magnetic recording medium (for example, a magnetic tape or a magnetic disk), an optical recording medium (for example, an optical disc), a magneto-optical recording medium, or a semiconductor memory. The program may also be downloaded via a communication line such as the Internet. In addition, various devices other than a CPU, for example a dedicated processor, may be used as the control means exemplified by the control unit 11 described above.

Claims

1. A smart speaker comprising:
input means for inputting a user's voice;
output means for outputting artificial voice to the user;
acquisition means for acquiring information; and
control means for causing, when the voice input by the input means is recognized as an utterance meaning a question by the user, the output means to output the artificial voice indicating that the question has been accepted, before information indicating an answer to the question is acquired and output to the user.

2. The smart speaker according to claim 1, further comprising estimation means for estimating the time required for the output means to output the information indicating the answer,
wherein the control means causes the output means to output the artificial voice indicating the required time estimated by the estimation means.

3. The smart speaker according to claim 2, wherein, when a predetermined time has elapsed after the voice as the utterance meaning the question is input and before the estimation means has estimated the required time, the control means causes the output means to output the artificial voice indicating predetermined content.

4. The smart speaker according to claim 2 or 3, further comprising history storage means for storing in storage means, in association with each other, the input of the voice as the utterance meaning the question and a history of the process of acquiring the information indicating the answer to the question,
wherein the estimation means estimates the required time using the history.

5. The smart speaker according to any one of claims 1 to 4, wherein, when the control means recognizes that the voice input by the input means is an utterance requesting a change to the question indicated by a previously input voice, the control means stops the acquisition of the information indicating the answer, or stops the output of the acquired information to the user, and causes the output means to output the artificial voice indicating that the change has been accepted.

6. A control method for a smart speaker, comprising:
a step in which recognition means recognizes a user's voice input by input means;
a step in which, when the voice is recognized as an utterance meaning a question by the user, control means causes output means to output artificial voice indicating that the question has been accepted;
a step in which acquisition means acquires information indicating an answer to the question; and
a step in which the control means causes the output means to output the acquired information to the user after the artificial voice has been output.

7. A program for causing a computer to execute:
a step of recognizing a user's voice input by input means;
a step of causing output means to output artificial voice indicating that a question has been accepted, when the voice is recognized as an utterance meaning a question by the user;
a step of acquiring information indicating an answer to the question; and
a step of causing the output means to output the acquired information to the user after the artificial voice has been output.