WO2020100532A1 - Information processing device, information processing method, and information processing program - Google Patents

Information processing device, information processing method, and information processing program

Info

Publication number
WO2020100532A1
Authority
WO
WIPO (PCT)
Prior art keywords
question
image
correct answer
information
robot apparatus
Prior art date
Application number
PCT/JP2019/041218
Other languages
French (fr)
Japanese (ja)
Inventor
アンドリュー シン (Andrew Shin)
Original Assignee
ソニー株式会社 (Sony Corporation)
Priority date
Filing date
Publication date
Application filed by ソニー株式会社 (Sony Corporation)
Publication of WO2020100532A1


Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90: Details of database functions independent of the retrieved data types
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06T: IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00: Image analysis

Definitions

  • The present disclosure relates to an information processing device, an information processing method, and an information processing program.
  • Information processing using machine learning is used in various technical fields, and with the development of deep learning, household agents and robots have become able to learn and identify many types of objects. For example, systems have been implemented that answer questions about images.
  • In such systems, a question about an image is answered using information about the image and the question.
  • However, the conventional technology is not always able to make an appropriate response to a question about an image.
  • In the conventional technology, the number of identifiable answers (classes) is limited, and answers outside that set cannot be identified.
  • It is therefore difficult to respond appropriately to a question regarding an image unless the answer falls within the range of correct answers prepared in advance.
  • the present disclosure proposes an information processing device, an information processing method, and an information processing program capable of appropriately responding to a question regarding an image.
  • An information processing device according to the present disclosure includes an acquisition unit that acquires an image, a question related to the image, and a correct answer corresponding to the question, and a registration unit that registers the combination of the image, the question, and the correct answer acquired by the acquisition unit as support information used to determine a response to query information including one image and one question related to the one image.
  • A diagram illustrating an example of the support information storage unit according to the first embodiment of the present disclosure.
  • A diagram illustrating an example of the model information storage unit according to the first embodiment of the present disclosure.
  • A diagram illustrating an example of the mode information storage unit according to the first embodiment of the present disclosure.
  • A diagram illustrating an example of identification according to the present disclosure.
  • A block diagram showing an example of a configuration from input to output according to the present disclosure.
  • A flowchart showing a procedure of information processing according to the first embodiment of the present disclosure.
  • A flowchart showing a procedure of correct answer registration processing through a dialog with a user according to the first embodiment of the present disclosure.
  • A diagram illustrating an example of information processing according to a second embodiment of the present disclosure.
  • A diagram illustrating an example of the mode information storage unit according to the second embodiment of the present disclosure.
  • A flowchart illustrating a procedure of correct answer registration processing through a dialog with a user according to the second embodiment of the present disclosure.
  • A diagram showing a configuration example of an information processing system according to a modification of the present disclosure.
  • A diagram showing a configuration example of an information processing device according to a modification of the present disclosure.
  • A diagram showing an example of camera viewpoint adjustment processing according to the present disclosure.
  • A block diagram showing an example of a configuration from input to output relating to the adjustment of the camera viewpoint of the present disclosure.
  • A flowchart illustrating a procedure of the camera viewpoint adjustment processing of the present disclosure.
  • A hardware configuration diagram showing an example of a computer that implements the functions of the information processing device.
  • 1. First Embodiment
       1-1. Outline of information processing according to the first embodiment of the present disclosure
       1-2. Configuration of the robot device according to the first embodiment
       1-3. Information processing procedure according to the first embodiment
    2. Second Embodiment
       2-1. Overview of information processing according to the second embodiment of the present disclosure
       2-2. Configuration of the robot apparatus according to the second embodiment
       2-3. Information processing procedure according to the second embodiment
    3. Other Embodiments
       3-1. Other configuration examples
       3-2. Adjustment of the camera viewpoint
    4. Hardware configuration
  • FIG. 1 is a diagram illustrating an example of information processing according to the first embodiment of the present disclosure.
  • the information processing according to the first embodiment of the present disclosure is realized by the robot device 100 shown in FIG.
  • the robot device 100 is an information processing device that executes information processing according to the first embodiment.
  • The robot apparatus 100 is an information processing apparatus that registers a combination of an image, a question associated with the image, and a correct answer corresponding to the question through a dialog with a user.
  • The robot apparatus 100 detects an image with a camera (corresponding to the sensor unit 16 in FIG. 3) and a user's utterance (question) with a microphone (corresponding to the input unit 12 in FIG. 3).
  • The robot apparatus 100 outputs a response corresponding to the detected image and the user's question through a speaker (corresponding to the output unit 13 in FIG. 3).
  • the robot device 100 may be any device as long as it can realize the processing in the first embodiment.
  • The robot device 100 may be a robot that interacts with a human (user), such as an entertainment robot or a household robot.
  • FIG. 1 shows a case where the robot device 100 as an agent registers a combination of an image, a question, and a correct answer (hereinafter also referred to as “support information”) through a dialogue with the user U1.
  • the robot apparatus 100 acquires an image (step S11).
  • the robot apparatus 100 detects the image IM1 by capturing images of two ice creams (also simply referred to as “ices”) with a camera.
  • an image is input to the robot apparatus 100 through a camera attached to the robot apparatus 100.
  • the image (visual information) acquired through the camera may have various forms, but for example, the camera may detect RGB information as image information.
  • the robot device 100 may acquire the image IM1 detected by the external device from the external device.
  • the user U1 designates the mode of the robot apparatus 100 as the question registration mode in order to register the question in the robot apparatus 100 (step S12).
  • the user U1 makes an input to the robot apparatus 100, which indicates that the question registration mode is designated.
  • the user U1 inputs the command MD1 that specifies the question registration mode registered in the robot device 100 in advance.
  • the user U1 inputs the question registration mode to the robot apparatus 100 by speaking “question” indicating that a question is to be asked.
  • the command MD1 is not limited to “question”, but may be set appropriately such as “attention”, “listen”, “hey XXX (robot name)”.
  • a microphone provided in the robot device 100 receives an input by detecting a user's utterance. As a result, the robot apparatus 100 receives an input designating the question registration mode.
  • the robot apparatus 100 converts voice information into text (character information) using various voice recognition techniques.
  • the robot device 100 may be able to acquire information from a voice recognition server that provides a voice recognition service.
  • the robot device 100 may acquire the character information obtained by converting the voice information from the voice recognition server by transmitting the voice information to the voice recognition server.
  • The robot device 100 is assumed to have a voice recognition function; it recognizes the user's utterance and identifies (estimates) the user who made the utterance by appropriately using various conventional techniques, and detailed description of these processes is omitted as appropriate.
  • The robot apparatus 100 may store, in the storage unit 120 (see FIG. 3), association information that associates a mode ID with character information (a keyword) for shifting to the mode corresponding to that mode ID. The robot apparatus 100 may then compare the character string corresponding to the user's utterance with the character information in the association information and shift to the mode of the matching character information.
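  • As a minimal sketch of such association information and mode lookup (the mode ID "MO2" follows the example of FIG. 6, while the keywords, the second mode entry, and the function name are hypothetical and only for illustration):

```python
# Hypothetical association information: mode ID -> keywords that trigger the mode.
# "MO2" (question registration mode) follows FIG. 6; the other entry is illustrative.
ASSOCIATION_INFO = {
    "MO2": {"mode": "question registration mode", "keywords": ["question", "attention", "listen"]},
    "MO3": {"mode": "correct answer registration mode", "keywords": ["no", "incorrect answer", "wrong"]},
}

def find_mode(utterance_text: str) -> str | None:
    """Return the mode ID whose keyword appears in the recognized utterance, if any."""
    text = utterance_text.lower()
    for mode_id, entry in ASSOCIATION_INFO.items():
        if any(keyword in text for keyword in entry["keywords"]):
            return mode_id
    return None  # no keyword matched; the current mode is kept

print(find_mode("question"))  # -> "MO2"
```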
  • the mode designation is not limited to voice, and various modes may be used.
  • The shift to the question registration mode may also be performed by the user operating a button (which may be implemented in hardware or software) provided on the robot apparatus 100 itself.
  • the robot apparatus 100 changes the mode according to the user's input (step S13).
  • The robot apparatus 100 changes the mode to the question registration mode (see FIG. 6) corresponding to the command MD1 "question" of the user U1.
  • the user U1 inputs a question to the robot device 100 (step S14).
  • the user U1 inputs the question QS1 “What color is the right?”
  • The robot device 100 may acquire the image IM1 and the question QS1 at any timing as long as it can respond to them.
  • the robot apparatus 100 may acquire an image after inputting a question.
  • the robot apparatus 100 may perform step S11 after step S14.
  • a microphone provided in the robot device 100 receives an input by detecting a user's utterance.
  • The robot apparatus 100 uses various techniques as appropriate to determine whether the question registration is completed. For example, after the robot apparatus 100 enters the question registration mode and the voice input of the question from the user U1 starts, if no voice is input for an interval equal to or longer than a certain threshold, the robot apparatus 100 ends the question registration mode. Then, the robot apparatus 100 converts the voice question input in the question registration mode into character information and performs natural language processing based on the character information. For example, the robot apparatus 100 may determine (estimate) the meaning or content of the question by analyzing the character information, appropriately using a natural language processing technique such as morphological analysis.
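  • A minimal sketch of the silence-based end-of-input determination described above (the threshold value and the interface for obtaining the time of the last detected voice activity are assumptions for illustration):

```python
import time

SILENCE_THRESHOLD_SEC = 2.0  # assumed threshold; the actual value is implementation dependent

def input_finished(last_voice_activity_time: float, now: float | None = None) -> bool:
    """Return True when no voice has been detected for at least the threshold interval."""
    if now is None:
        now = time.time()
    return (now - last_voice_activity_time) >= SILENCE_THRESHOLD_SEC

# Example: the last speech frame was detected 3 seconds ago, so the question input
# is treated as complete and the question registration mode can be ended.
print(input_finished(last_voice_activity_time=time.time() - 3.0))  # -> True
```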
  • The robot apparatus 100 accepts the question QS1 "What color is the right?".
  • The robot apparatus 100 is not limited to closed questions that are answered with "yes", "no", or a choice such as "A, B, or C"; it also accepts various types of questions such as open questions.
  • the color of the ice cream package on the right side in the image IM1 shown in FIG. 1 is light purple.
  • the robot device 100 responds to the question QS1 using various techniques.
  • the robot apparatus 100 identifies the correct answer based on the input image IM1 and the input question QS1.
  • the robot apparatus 100 uses the input image IM1 and the question QS1 as query information (hereinafter also simply referred to as “query”) to identify the correct answer corresponding to the query.
  • the robot device 100 outputs the result to the user via a speaker, a monitor, or the like.
  • For example, the robot apparatus 100 performs identification processing as shown in FIG. 8 to determine support information, and uses the correct answer of the support information determined by the identification processing as a response.
  • the robot apparatus 100 outputs the correct answer of the support information determined by the identification processing as a correct answer candidate (step S15).
  • The robot device 100 does not have any support information registered whose correct answer is "light purple". Therefore, the robot apparatus 100 outputs "red", which is different from the correct answer "light purple", as the correct answer candidate AA1.
  • the robot apparatus 100 may respond to the question QS1 of the user U1 by appropriately using any technique as long as it can respond to the question QS1 of the user U1.
  • the robot apparatus 100 may recognize an object and make a response using a technique related to object recognition.
  • Since the robot device 100 cannot by itself determine whether the correct answer candidate it responded with should be registered as the correct answer, the user U1 provides the robot device 100 with information regarding the correct answer. The user U1 inputs, into the robot apparatus 100, a reaction to the correct answer candidate output by the robot apparatus 100 (step S16). Because the correct answer candidate AA1 "red" output by the robot device 100 does not correspond to the color of the ice cream package on the right side of the image IM1, the user U1 inputs a negative reaction to the robot device 100.
  • Specifically, the user U1 inputs a negative command NG1, registered in advance in the robot apparatus 100, that denies the correct answer candidate. The user U1 designates the correct answer registration mode by speaking "no" to the robot apparatus 100, indicating that the correct answer candidate is incorrect. It should be noted that the negative command NG1 is not limited to "no", and may be set appropriately, for example "incorrect answer" or "wrong".
  • a microphone provided in the robot device 100 receives an input by detecting a user's utterance. Thereby, the robot apparatus 100 accepts a negative reaction of the user U1 as an input for designating the correct answer registration mode.
  • The robot apparatus 100 may also shift to the correct answer registration mode when the user operates a button provided on the robot apparatus 100 itself.
  • the robot device 100 may request the user U1 for a response when the robot device 100 cannot determine whether the reaction of the user U1 is positive or negative.
  • For example, the robot device 100 may output a voice that prompts the user U1 to give an affirmative or negative response, such as "Is the answer correct?".
  • For example, the robot apparatus 100 may determine that the reaction is negative when the user U1 responds with "no" or a similar negative expression, and affirmative when the user U1 responds with "yes" or "good".
  • The robot apparatus 100 may make this determination by comparing the user's response with negative response list information, which is a list of negative responses, and positive response list information, which is a list of positive responses, stored in the storage unit 120 (see FIG. 3).
  • the robot apparatus 100 changes the mode according to the user's input (step S17).
  • the robot apparatus 100 changes the mode to the correct answer registration mode (see FIG. 6) corresponding to the negative command NG1 of “NO” of the user U1.
  • the robot apparatus 100 enters the correct answer registration mode, and waits until the correct answer input from the user U1 is received.
  • the user U1 provides the correct answer to the robot device 100 (step S18).
  • the user U1 inputs the correct answer AS1 “light purple” to the question QS1 “what color is the right?” By speaking “light purple” to the robot apparatus 100.
  • The robot apparatus 100 accepts the input received up to that point as the "correct answer".
  • the robot apparatus 100 registers, as support information, a combination of an image, a question related to the image, and a correct answer corresponding to the question (step S19). For example, the robot apparatus 100 shifts to the support information registration mode, and registers a combination of an image, a question related to the image, and a correct answer corresponding to the question as the support information.
  • The robot apparatus 100 registers the combination of the image IM1, the question QS1, and the correct answer AS1, as shown in the additional registration information RINF1, as the support information identified by the support information ID "SP1" (support information SP1). In this way, the robot apparatus 100 registers the three elements of the input image, the input question, and the correct answer as one set. For example, the robot apparatus 100 stores the support information SP1 in the support information storage unit 141 (see FIG. 4).
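  • A minimal sketch of registering the three elements as one set of support information (the class and field names are hypothetical; the image, question, and correct answer are represented here by a file path and character strings for simplicity):

```python
from dataclasses import dataclass, field

@dataclass
class SupportInfo:
    """One set of support information: an image, a question about it, and the correct answer."""
    support_id: str
    image: str      # e.g. a file path to the captured image
    question: str   # character information obtained from the user's utterance
    answer: str     # the registered correct answer

@dataclass
class SupportInfoStorage:
    """Conceptually corresponds to the support information storage unit 141."""
    records: dict[str, SupportInfo] = field(default_factory=dict)

    def register(self, info: SupportInfo) -> None:
        self.records[info.support_id] = info

# Registering the combination from FIG. 1 as support information SP1.
storage = SupportInfoStorage()
storage.register(SupportInfo("SP1", "IM1.png", "What color is the right?", "light purple"))
```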
  • the robot apparatus 100 registers the correct answer AS1 provided by the user U1 as the correct answer corresponding to the image IM1 and the question QS1 instead of the correct answer candidate AA1 output by the robot apparatus 100 itself.
  • Before this registration, the robot apparatus 100 did not have registered any support information including the correct answer "light purple", which is the correct answer AS1. That is, the robot apparatus 100 had not acquired the concept of "light purple", a concept related to color, at the time of outputting the correct answer candidate AA1 for the image IM1 and the question QS1. Therefore, the robot apparatus 100 could not respond to the image IM1 and the question QS1 with the correct answer "light purple".
  • VQA (Visual Question Answering)
  • In VQA, the feature amount of an image and the feature amount of a question are projected onto a common space, and the correct answer is identified based on those feature amounts. That is, in VQA, since the number of identifiable answers is limited (generally within the range of 1000 to 1500), answers that are not included therein cannot be identified.
  • The robot apparatus 100 becomes able to respond "light purple" to the input image and question by using the newly added support information SP1 in responses after step S19 in FIG. 1. As described above, the robot apparatus 100 has the advantage, compared with VQA, that it can learn not only the 1000 to 1500 or so answers that appear with high frequency in a data set but also new answers.
  • As a result, the robot apparatus 100 can output the correct answer candidate "light purple" for the image IM1 and the question QS1. That is, the robot apparatus 100 acquires the concept of "light purple", which is a concept related to color, by additionally registering the support information SP1. In other words, the robot apparatus 100 can acquire a concept regarding a property of the object, namely the ice cream on the right side of the image IM1. In this way, the robot apparatus 100 can continuously acquire new concepts through dialogs with the user, and can thus acquire a concept that was unknown at the time the robot apparatus 100 was provided, for example, a concept corresponding to a new word. Therefore, the robot device 100 can enable an appropriate response to a question regarding an image.
  • As a technique related to the robot device 100, there is a learning method such as one-shot learning that enables class identification from a small number of samples.
  • In one-shot learning, however, identification is based mainly on the visual element, that is, only on the image, and it is difficult to appropriately learn a new concept, in particular a concept such as the property of an object, in the way the robot device 100 does.
  • By using the combination of an image, a question related to the image, and a correct answer corresponding to the question as support information, the robot apparatus 100 can acquire any concept corresponding to the image, and can therefore enable appropriate responses to questions regarding images.
  • Compared with one-shot learning, the robot device 100 has the advantage that it can learn, for the same object or the same image, different concepts depending on the context given by the question, rather than relying on visual similarity alone.
  • The robot apparatus 100 can identify an image and a question that have the same "answer" by adding the question to the image and considering not only the visual similarity but also the context given by the question.
  • Even for images of the same object (target object), the robot apparatus 100 can acquire attributes including various properties and relationships, such as color, size, number, and purpose, depending on the context obtained from the question. In this way, the robot apparatus 100 can acquire attributes including the property of the target object and the relationship between the target object and another target object.
  • the robot apparatus 100 can acquire various concepts including the attribute of the target object in addition to the name of the target object itself. For example, the robot apparatus 100 can acquire a concept regarding a property or a state of an object included in an image. Thus, the robot apparatus 100 can register support information corresponding to concepts such as various attributes from the same image or the same object by repeating the question and answer. As a result, the robot apparatus 100 can acquire knowledge corresponding to the concept of support information.
  • the robot apparatus 100 can acquire a concept regarding the amount, color, temperature, hardness, etc. of the object included in the image.
  • the robot apparatus 100 can also acquire a concept (relative concept) regarding the relationship between the target object included in the image and another target object. For example, when a concept based on a relationship with another object such as “large” or “small” is used as a correct answer, the robot apparatus 100 can also acquire a concept to be related. In this way, the robot apparatus 100 can acquire as many concepts as the number of questions even for images of the same object (object).
  • the robot apparatus 100 can learn a new concept regarding an image through a context obtained from a question and answer session. That is, the robot apparatus 100 can learn a new concept through images and questions.
  • an object (object) in the real world has not only a name (label) but also various attributes.
  • the target object has various attributes such as properties and relationships with other objects.
  • the attributes of an object are often affected by the situation or context.
  • any object (object) has an attribute of size, and the size is a relative concept.
  • the agent or robot recognizes a sample of an object corresponding to the object through a camera or a file.
  • the user may input the information of the object verbally or by characters, but in most cases, the information is limited to a simple label or name of the object.
  • The robot apparatus 100 can acquire, through a question and answer session with the user, not only images but also various concepts that differ depending on the context for one object, and can therefore appropriately respond to a question about an image.
  • The robot apparatus 100 can appropriately add information for learning a new concept to an entity (for example, a computer) other than a human that performs information processing.
  • the robot apparatus 100 can improve the practicality by applying the one-shot learning methodology to the setting of VQA so that a wider range of concepts can be learned.
  • FIG. 2 is a diagram showing another example of information processing according to the first embodiment of the present disclosure. Note that steps S21 to S24 in FIG. 2 are the same as steps S11 to S14 in FIG. 1, so description thereof will be omitted.
  • The robot apparatus 100 that has received the question QS1 "What color is the right?" responds to the question QS1 using various techniques.
  • the robot apparatus 100 uses the input image IM1 and the question QS1 as a query and identifies the correct answer corresponding to the query.
  • the robot apparatus 100 outputs the correct answer of the support information determined by the identification processing as a correct answer candidate (step S25).
  • the robot apparatus 100 determines the support information by the identification processing as shown in FIG. 8 and uses the support information whose correct answer is “light purple” for the response to the question QS1.
  • the robot apparatus 100 outputs the correct answer “light purple” as the correct answer candidate AA2.
  • the user U1 inputs, into the robot apparatus 100, a reaction to the correct answer that the robot apparatus 100 has responded to (step S26).
  • the user U1 inputs a positive reaction to the robot device 100 because the correct answer candidate AA2 “light purple” output by the robot device 100 corresponds to the color of the ice cream package on the right side in the image IM1.
  • The user U1 inputs an affirmative command OK1, registered in advance in the robot apparatus 100, that affirms the correct answer candidate. Specifically, the user U1 inputs an affirmative reaction by speaking "correct answer" to the robot apparatus 100, indicating that the correct answer candidate is correct.
  • the affirmative command OK1 is not limited to “correct answer”, but may be set appropriately such as “matched” or “yes”. For example, when the robot device 100 receives the affirmative command OK1, it shifts to the support information registration mode.
  • the robot apparatus 100 registers, as support information, the combination of the image, the question related to the image, and the correct answer corresponding to the question (step S27).
  • For example, the robot apparatus 100 registers the combination of the image IM1, the question QS1, and the correct answer candidate AA2, which is the correct answer, as shown in the additional registration information RINF2, as the support information identified by the support information ID "SP1" (support information SP1).
  • the robot apparatus 100 registers the three elements of the input image, the input question, and the correct answer as one set.
  • the robot apparatus 100 stores the support information SP1 in the support information storage unit 141 (see FIG. 4).
  • the robot apparatus 100 registers the correct answer candidate AA2 output by the robot apparatus 100 itself as the correct answer corresponding to the image IM1 and the question QS1. Thereby, the robot apparatus 100 can enable a more appropriate response to the question regarding the image.
  • FIG. 3 is a diagram showing a configuration example of the robot apparatus 100 according to the first embodiment of the present disclosure.
  • the robot apparatus 100 includes a communication unit 11, an input unit 12, an output unit 13, a storage unit 14, a control unit 15, a sensor unit 16, and a drive unit 17.
  • the communication unit 11 is realized by, for example, a NIC (Network Interface Card) or a communication circuit.
  • the communication unit 11 is connected to a network N (Internet or the like) by wire or wirelessly, and transmits / receives information to / from other devices or the like via the network N.
  • the user inputs various operations to the input unit 12.
  • the input unit 12 receives an input from the user.
  • the input unit 12 receives a response to the correct candidate by the user.
  • The input unit 12 accepts a correct answer, different from the correct answer candidate, input by the user.
  • the input unit 12 has a function of detecting voice.
  • the input unit 12 has a microphone that detects voice.
  • the input unit 12 receives a user's utterance as an input.
  • the input unit 12 may receive various operations from the user via buttons or a touch panel provided on the robot apparatus 100.
  • the output unit 13 outputs various information.
  • the output unit 13 has a function of outputting voice.
  • the output unit 13 has a speaker that outputs sound.
  • the output unit 13 outputs correct answer candidates corresponding to the question.
  • the output unit 13 outputs the question.
  • the output unit 13 outputs a question when the user is detected by the sensor unit 16.
  • The output unit 13 outputs a response using the support information determined by the decision unit 156 (the decision support information).
  • the output unit 13 outputs a voice requesting a correct answer from the user.
  • the output unit 13 outputs the correct answer included in the decision support information.
  • The output unit 13 may output various information by displaying it on a display unit such as a display provided in the robot apparatus 100.
  • the storage unit 14 is realized by, for example, a semiconductor memory element such as a RAM (Random Access Memory) or a flash memory (Flash Memory), or a storage device such as a hard disk or an optical disk.
  • the storage unit 14 includes a support information storage unit 141, a model information storage unit 142, and a mode information storage unit 143.
  • the support information storage unit 141 stores various information regarding support.
  • FIG. 4 is a diagram illustrating an example of the support information storage unit according to the first embodiment of the present disclosure.
  • FIG. 4 shows an example of the support information storage unit 141 according to the first embodiment.
  • the support information storage unit 141 has items such as “support information ID”, “image”, “question”, and “correct answer”.
  • “Support information ID” indicates identification information for identifying support information.
  • "Image” indicates an image registered as support information.
  • Although FIG. 4 shows an example in which conceptual information such as "IM11" and "IM12" is stored in "image", in reality, image information, moving image information, or a file path name indicating the storage location thereof is stored.
  • “Question” indicates a question registered as support information.
  • Although FIG. 4 shows an example in which conceptual information such as "QS11" or "QS12" is stored in "question", in reality, character information or voice information indicating the question, or a file path name indicating the storage location thereof, is stored.
  • “Correct answer” indicates a correct answer registered as support information.
  • Although FIG. 4 shows an example in which conceptual information such as "AS11" or "AS12" is stored in "correct answer", in reality, character information or voice information indicating the correct answer, or a file path name indicating the storage location thereof, is stored.
  • the support information (support information SP11) identified by the support information ID “SP11” is a combination of the image IM11, the question QS11, and the correct answer AS11. That is, it is indicated that the support information SP11 includes information on the image IM11, information on the question QS11, and information on the correct answer AS11.
  • the model information storage unit 142 stores information about the model.
  • the model information storage unit 142 stores the model information (model data) learned (generated) by the learning process.
  • FIG. 5 is a diagram illustrating an example of the model information storage unit according to the first embodiment of the present disclosure.
  • FIG. 5 shows an example of the model information storage unit 142 according to the first embodiment.
  • the model information storage unit 142 includes items such as “model ID” and “model data”.
  • Model ID indicates identification information for identifying the model.
  • The model identified by the model ID "M1" corresponds to the model M1 that identifies (determines) the support information corresponding to the query, as illustrated in FIG. 7.
  • Model data indicates model data.
  • Although FIG. 5 shows an example in which conceptual information such as "MDT1" is stored in "model data", in reality, various information that constitutes the model, such as information about the networks included in the model and functions, is included.
  • the mode information storage unit 143 stores information about the mode of the robot device 100.
  • FIG. 6 is a diagram illustrating an example of the mode information storage unit according to the first embodiment of the present disclosure.
  • FIG. 6 shows an example of the mode information storage unit 143 according to the first embodiment.
  • the mode information storage unit 143 has items such as “mode ID”, “mode”, and “flag”.
  • Mode ID indicates information for identifying the mode.
  • “Mode” indicates the content of the mode identified by the mode ID.
  • The "flag" is a flag indicating which mode is selected from the settable modes, that is, the mode in the current state.
  • The mode whose "flag" value is "1" is the selected mode. That is, in FIG. 6, the mode "normal" identified by the mode ID "MO1" is selected, and the mode of the robot apparatus 100 in the current state is the normal mode.
  • the mode identified by the mode ID “MO2” (mode MO2) is the question registration mode. Further, the mode MO2 has a flag of "0", which indicates that the mode is not the selected mode.
  • The control unit 15 is realized by, for example, a CPU (Central Processing Unit) or an MPU (Micro Processing Unit) executing a program stored in the robot apparatus 100 (for example, an information processing program according to the present disclosure) using a RAM (Random Access Memory) or the like as a work area.
  • the control unit 15 is a controller and may be realized by an integrated circuit such as an ASIC (Application Specific Integrated Circuit) or an FPGA (Field Programmable Gate Array).
  • The control unit 15 includes an acquisition unit 151, a determination unit 152, a generation unit 153, a registration unit 154, a learning unit 155, and a decision unit 156, and realizes or executes the functions and actions of the information processing described below.
  • the internal configuration of the control unit 15 is not limited to the configuration shown in FIG. 3, and may be another configuration as long as it is a configuration for performing information processing described later.
  • the acquisition unit 151 acquires various information.
  • the acquisition unit 151 acquires various types of information from an external information processing device.
  • the acquisition unit 151 acquires various information from the storage unit 14.
  • the acquisition unit 151 acquires the input information received by the input unit 12.
  • the acquisition unit 151 acquires the sensor information detected by the sensor unit 16.
  • the acquisition unit 151 acquires an image, a question related to the image, and a correct answer corresponding to the question.
  • the determination unit 152 makes various determinations.
  • the determination unit 152 determines various information based on the information acquired by the acquisition unit 151.
  • the determination unit 152 determines various information based on the information stored in the storage unit 14.
  • the determination unit 152 appropriately uses various techniques to determine whether the question registration by the user is completed.
  • the determination unit 152 determines (estimates) the meaning and content of the question by analyzing the question (character information) using various techniques related to natural language processing.
  • the determination unit 152 determines whether the input by the user is a question. The determination unit 152 determines whether the input by the user is a question by analyzing the character information obtained by converting the voice information of the user. The determination unit 152 determines whether or not there is a response from the user. The determination unit 152 determines whether or not there is a response from the user in response to the acceptance of the input by the user. When the utterance by the user is detected by the microphone, the determination unit 152 determines that the user has reacted.
  • the determination unit 152 determines whether the user's reaction is positive or negative. For example, the determination unit 152 determines whether the user's reaction is affirmative or negative by analyzing the character information in which the user's reaction (voice information) is converted.
  • the generation unit 153 performs various generations.
  • the generation unit 153 generates various information based on the information acquired by the acquisition unit 151.
  • the generation unit 153 generates various information based on the information stored in the storage unit 14.
  • The generation unit 153 generates an episode whose query information is the combination of the input image detected by the sensor unit 16 and the question input by the user. For example, the generation unit 153 generates an episode including query information that is a combination of the input image and the input question, and the support information stored in the support information storage unit 141. For example, the generation unit 153 generates an episode including the support information group stored in the support information storage unit 141 as information for determining the response to the query information.
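  • Building on the hypothetical SupportInfoStorage sketched earlier, episode generation could be illustrated as follows (the Episode structure and the function name are assumptions for illustration):

```python
from dataclasses import dataclass

@dataclass
class Episode:
    """Query information plus the support information group used to decide the response."""
    query_image: str
    query_question: str
    support_set: list  # list of SupportInfo records from the earlier sketch

def generate_episode(query_image: str, query_question: str, storage) -> Episode:
    """Combine the current query with all registered support information into one episode."""
    return Episode(query_image, query_question, list(storage.records.values()))
```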
  • the registration unit 154 performs various registrations.
  • the registration unit 154 registers various information based on the information acquired by the acquisition unit 151.
  • the registration unit 154 registers the information acquired by the acquisition unit 151 in the storage unit 14.
  • the registration unit 154 functions as an image registration unit that registers an image.
  • the registration unit 154 functions as a question registration unit that registers a question.
  • the registration unit 154 functions as a correct answer registration unit that registers a correct answer.
  • The registration unit 154 registers the combination of the image, the question, and the correct answer acquired by the acquisition unit 151 as support information used for determining a response to query information including one image and one question related to the one image.
  • the registration unit 154 registers the three elements of the input image, the input question, and the correct answer as one set. In the example of FIG. 1, the registration unit 154 registers the combination of the image IM1, the question QS1 and the correct answer AS1 as the support information SP1. In the example of FIG. 2, the registration unit 154 registers the combination of the image IM1, the question QS1, and the correct answer candidate AA2, which is the correct answer, as the support information SP1.
  • the learning unit 155 performs various kinds of learning.
  • the learning unit 155 learns various information based on the information acquired by the acquisition unit 151.
  • the learning unit 155 learns various information based on the information stored in the storage unit 14.
  • the learning unit 155 learns (generates) a model.
  • the learning unit 155 learns (generates) a model based on the information acquired by the acquisition unit 151.
  • the learning unit 155 learns (generates) a model based on the information stored in the storage unit 14.
  • The learning unit 155 learns the model using various machine learning techniques. For example, the learning unit 155 learns a model having a network structure as shown in FIG. 8. The learning unit 155 learns a model that identifies support information corresponding to query information including an image and a question. For example, the learning unit 155 learns the model M1 that identifies the support information corresponding to the query information including the image and the question. The learning unit 155 may generate a model by performing a learning process using episodes as learning sets. The learning unit 155 may generate a model by performing a learning process using each of the episodes EP1 to EP3 shown in FIG. 7 as a learning set.
  • the learning unit 155 may generate a model by performing a learning process based on various learning methods.
  • the learning unit 155 may generate a model by performing a learning process based on a method related to one-shot learning.
  • the learning unit 155 may generate the model M1 by performing a learning process based on a method related to one-shot learning. Note that the above is an example, and the learning unit 155 may generate a model by any learning method as long as it can generate a model that identifies support information corresponding to query information including an image and a question.
  • the decision unit 156 makes various decisions.
  • The decision unit 156 determines various information based on the information acquired by the acquisition unit 151.
  • The decision unit 156 determines various information based on the information stored in the storage unit 14.
  • The decision unit 156 may make various estimations.
  • For example, the decision unit 156 may estimate the surrounding space as shown in FIG.
  • The decision unit 156 determines one correct answer corresponding to the one question of the query information based on the query information and the support information. The decision unit 156 determines the one correct answer based on the one image and the one question included in the query information and the image and the question included in the support information.
  • The decision unit 156 identifies the support information including the correct answer corresponding to the query information, using an episode including the query information and the support information.
  • The decision unit 156 identifies support information including a correct answer corresponding to the query information, and determines the identified support information as the decision support information including the correct answer corresponding to the query information.
  • The decision unit 156 uses the input image IM1 and the question QS1 as query information and identifies the correct answer corresponding to the query information.
  • The decision unit 156 determines the mode.
  • The decision unit 156 changes the mode based on the determined mode.
  • The decision unit 156 compares the character string corresponding to the user's utterance with the character information in the association information and shifts to the mode of the matching character information.
  • The decision unit 156 changes the mode based on the input by the user. In the example of FIG. 1, the decision unit 156 changes the mode to the question registration mode corresponding to the command MD1 "question" of the user U1.
  • The decision unit 156 changes the mode to the correct answer registration mode corresponding to the negative command NG1 "no" of the user U1. For example, when the decision unit 156 receives the affirmative command OK1, it shifts to the support information registration mode.
  • the sensor unit 16 detects predetermined information.
  • the sensor unit 16 has a function as an image capturing unit that captures an image.
  • the sensor unit 16 has a function of an image sensor and detects image information.
  • the sensor unit 16 functions as an image input unit that receives an image as an input.
  • the sensor unit 16 is not limited to the above, and may have various sensors.
  • the sensor unit 16 includes a position sensor, an acceleration sensor, a gyro sensor, a temperature sensor, a humidity sensor, an illuminance sensor, a pressure sensor, a proximity sensor, a sensor for acquiring biological information such as odor, sweat, heartbeat, pulse and brain wave. It may have various sensors. Further, the sensor for detecting the above various information in the sensor unit 16 may be a common sensor or may be realized by different sensors.
  • the drive unit 17 has a function of driving the physical configuration of the robot device 100.
  • The drive unit 17 has a function of driving the joints of the robot apparatus 100, such as the neck, hands, and feet.
  • the drive unit 17 is, for example, an actuator.
  • the driving unit 17 may have any configuration as long as the robot apparatus 100 can realize a desired operation.
  • the drive unit 17 may have any configuration as long as it can drive the joints of the robot apparatus 100, move the positions, and the like.
  • the drive unit 17 drives the tracks and tires.
  • the drive unit 17 drives the joint of the neck of the robot apparatus 100 to change the viewpoint of the camera provided on the head of the robot apparatus 100.
  • The drive unit 17 drives the joint of the neck of the robot apparatus 100 so as to capture an image in the direction determined by the decision unit 156, thereby changing the viewpoint of the camera provided on the head of the robot apparatus 100. Further, the drive unit 17 may change only the orientation of the camera or the imaging range. The drive unit 17 may thus change the viewpoint of the camera.
  • FIG. 7 is a diagram showing an example of identification according to the present disclosure.
  • In FIG. 7, three episodes EP1 to EP3 are illustrated, but the episodes EP1 to EP3 are examples for explaining the identification processing and the learning processing, and the robot device 100 may use various other episodes.
  • FIG. 8 is a diagram showing an example of a learning network structure according to the present disclosure.
  • the robot apparatus 100 performs the identification process on the episode EP1 including the query QINF1 and the support information SP11 to SP13.
  • the query QINF1 includes an image IM15 showing a balloon and a question QS15 “What is the color of the balloon?”.
  • the support information SP11 includes an image IM11 showing two ice creams, a question QS11 “what color is the ice cream on the right side?”, And a correct answer AS11 “light purple”.
  • the support information SP12 includes an image IM12 showing a balloon, a question QS12 "What is this?", And a correct answer AS12 "balloon”.
  • the support information SP13 includes an image IM13 showing the sky and three balloons, a question QS13 "What color is the sky?”, And a correct answer AS13 "blue”.
  • the robot apparatus 100 identifies the support information corresponding to the query QINF1 among the support information SP11 to SP13. For example, the robot apparatus 100 identifies the support information corresponding to the query QINF1 using the model having the network structure shown in FIG.
  • The robot apparatus 100 projects the feature amount of the image and the feature amount of the question included in the query or the support information onto a common space, and identifies the support information corresponding to the query by comparing distances in that space.
  • the processing group PS1 in FIG. 8 corresponds to the above identification processing.
  • the robot apparatus 100 performs processing corresponding to the partial processing PT1, the partial processing PT2, and the processing group PS1 including distance comparison and identification.
  • the robot apparatus 100 learns a model (identification model) that performs the processing group PS1 in FIG.
  • the identification model is the model M1.
  • the robot device 100 performs a query-related process by the partial process PT1.
  • the robot apparatus 100 extracts the feature amount (for example, vector) of the image (input image) included in the query.
  • the robot apparatus 100 extracts the feature amount of the input image (hereinafter also referred to as “image feature amount”) by inputting the input image into a network for extracting the image feature amount (hereinafter also referred to as “image feature extraction network”).
  • The robot apparatus 100 inputs the input image into the image feature extraction network and causes the image feature extraction network to output the image feature amount of the input image.
  • the robot apparatus 100 causes the image feature extraction network to output a vector indicating the image feature amount.
  • the robot apparatus 100 extracts the image feature amount of the image IM15 by inputting the image IM15 to the image feature extraction network.
  • the robot apparatus 100 also extracts the feature amount of the question (input question) included in the query.
  • The robot apparatus 100 extracts the feature amount of the input question (hereinafter also referred to as "question feature amount") by inputting the input question into a network for extracting the question feature amount (hereinafter also referred to as "question feature extraction network").
  • The robot apparatus 100 inputs the input question into the question feature extraction network and causes the question feature extraction network to output the question feature amount of the input question.
  • the robot apparatus 100 causes the question feature extraction network to output a vector indicating the question feature amount.
  • the robot apparatus 100 extracts the question feature amount of the question QS15 by inputting the question QS15 to the question feature extraction network.
  • the robot apparatus 100 projects the image feature amount extracted from the image included in the query and the question feature amount extracted from the question included in the query onto a common space (for example, an N-dimensional space).
  • the robot apparatus 100 inputs the image feature amount and the question feature amount into a network that projects the image feature amount and the question feature amount onto a common space (hereinafter also referred to as “projection network”).
  • The robot apparatus 100 integrates the image feature amount and the question feature amount, and projects them onto the common space.
  • For example, the robot apparatus 100 causes the projection network to output a feature amount obtained by integrating the image feature amount and the question feature amount (hereinafter also referred to as "integrated feature amount") projected in the common space.
  • For example, the projection network may output an integrated feature amount (vector) obtained by simply concatenating the image feature amount (vector) and the question feature amount (vector).
  • The robot apparatus 100 inputs the image feature amount of the image IM15 and the question feature amount of the question QS15 to the projection network, and causes the projection network to output the integrated feature amount of the image IM15 and the question QS15 (hereinafter also referred to as "integrated feature amount FT15").
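  • A minimal sketch of this projection step, assuming the image feature amount and the question feature amount have already been extracted as vectors (NumPy is used for illustration; the actual image and question feature extraction networks are not shown, and a learned projection could replace the simple concatenation):

```python
import numpy as np

def integrated_feature(image_feature: np.ndarray, question_feature: np.ndarray) -> np.ndarray:
    """Project the image feature and the question feature onto a common space.
    Here the projection simply concatenates the two vectors, which is one form the
    projection network may take according to the description above."""
    return np.concatenate([image_feature, question_feature])

# e.g. a 512-dimensional image feature and a 256-dimensional question feature
# give a 768-dimensional integrated feature such as FT15.
ft15 = integrated_feature(np.random.rand(512), np.random.rand(256))
print(ft15.shape)  # -> (768,)
```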
  • the robot apparatus 100 performs processing related to support information by the partial processing PT2. For example, the robot apparatus 100 performs the partial process PT2 for each piece of support information. In the example of episode EP1 in FIG. 7, the robot apparatus 100 performs the partial process PT2 for each of the support information SP11 to SP13.
  • the robot device 100 extracts the feature amount (for example, vector) of the image (input image) included in the support information.
  • the robot apparatus 100 extracts the feature amount (image feature amount) of the input image by inputting the input image to a network (image feature extraction network) for extracting the image feature amount.
  • The robot apparatus 100 inputs the input image into the image feature extraction network and causes the image feature extraction network to output the image feature amount of the input image.
  • the robot apparatus 100 causes the image feature extraction network to output a vector indicating the image feature amount.
  • the robot apparatus 100 extracts the image feature amount of the image IM11 by inputting the image IM11 of the support information SP11 to the image feature extraction network.
  • the robot apparatus 100 extracts the feature amount of the question (input question) included in the support information.
  • The robot apparatus 100 extracts the feature amount (question feature amount) of the input question by inputting the input question into a network (question feature extraction network) for extracting the question feature amount.
  • The robot apparatus 100 inputs the input question into the question feature extraction network and causes the question feature extraction network to output the question feature amount of the input question.
  • the robot apparatus 100 causes the question feature extraction network to output a vector indicating the question feature amount.
  • the robot apparatus 100 extracts the question feature quantity of the question QS11 by inputting the question QS11 of the support information SP11 to the question feature extraction network.
  • the robot apparatus 100 projects the image feature amount extracted from the image included in the support information and the question feature amount extracted from the question included in the support information onto a common space (for example, an N-dimensional space).
  • the robot apparatus 100 inputs the image feature amount and the question feature amount to a network (projection network) that projects the image feature amount and the question feature amount on a common space.
  • the robot apparatus 100 integrates the image feature amount and the question feature amount, and projects them in the common space.
  • For example, the robot apparatus 100 causes the projection network to output an integrated feature amount obtained by integrating the image feature amount and the question feature amount projected in the common space.
  • The robot apparatus 100 inputs the image feature amount of the image IM11 of the support information SP11 and the question feature amount of the question QS11 to the projection network, and causes the projection network to output the integrated feature amount of the image IM11 and the question QS11 (hereinafter also referred to as "integrated feature amount FT11").
  • Similarly, the robot apparatus 100 inputs the image feature amount of the image IM12 of the support information SP12 and the question feature amount of the question QS12 to the projection network, and causes it to output the integrated feature amount of the image IM12 and the question QS12 (hereinafter also referred to as "integrated feature amount FT12").
  • The robot apparatus 100 also inputs the image feature amount of the image IM13 of the support information SP13 and the question feature amount of the question QS13 to the projection network, and causes it to output the integrated feature amount of the image IM13 and the question QS13 (hereinafter also referred to as "integrated feature amount FT13").
  • Note that a common network (the image feature extraction network, the question feature extraction network, and the projection network) is used for the partial processes PT1 and PT2.
  • the robot apparatus 100 compares the distance between the query and each piece of support information based on the information projected in the common space.
  • the robot apparatus 100 compares the distance between the query QINF1 and each of the support information SP11 to SP13 based on the information projected in the common space.
  • the robot apparatus 100 compares the distance between the integrated feature quantity FT15 of the query QINF1 and the integrated feature quantities FT11 to FT13 of each of the support information SP11 to SP13.
  • the robot apparatus 100 identifies the support information corresponding to the query based on the result of comparison of the distance between the query and each support information. For example, the robot apparatus 100 identifies support information that approximates the query as support information that corresponds to the query. For example, the robot apparatus 100 identifies the support information having the shortest distance from the query as the support information corresponding to the query. In the example of the episode EP1 in FIG. 7, the robot apparatus 100 identifies the support information of the integrated feature amount that is the closest to the integrated feature amount FT15 of the query QINF1 as the support information corresponding to the query QINF1. The robot apparatus 100 identifies the support information SP11 of the integrated feature quantity FT11 closest to the integrated feature quantity FT15 of the query QINF1 as the support information corresponding to the query QINF1.
  • the robot apparatus 100 identifies the support information SP11 among the support information SP11 to SP13 as the support information corresponding to the query QINF1. Then, the robot apparatus 100 may determine the correct answer corresponding to the query QINF1 as the correct answer AS11 of the support information SP11. Specifically, the robot apparatus 100 may determine the correct answer corresponding to the query QINF1 to be the correct answer AS15 which is the same “light purple” as the correct answer AS11. Then, the robot apparatus 100 may register, as support information, a combination of the image IM15 and the question QS15 included in the query QINF1 and the correct answer AS15.
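As one way to read the identification step described above, the following sketch projects the query and each piece of support information into the common space and adopts the correct answer of the nearest support information; the Euclidean distance is an assumption, since the text only speaks of comparing distances:

```python
# Illustrative sketch of the identification step: compare the integrated feature
# amount of the query with that of each piece of support information and adopt
# the correct answer of the nearest one.
import torch

def identify(query, supports, image_net, question_net, projection_net):
    """query: (image_tensor, token_tensor); supports: list of (image, tokens, answer)."""
    q_feat = projection_net(image_net(query[0]), question_net(query[1]))   # e.g. FT15
    best_answer, best_dist = None, float("inf")
    for img, tokens, answer in supports:                                   # e.g. SP11..SP13
        s_feat = projection_net(image_net(img), question_net(tokens))      # e.g. FT11..FT13
        dist = torch.dist(q_feat, s_feat).item()
        if dist < best_dist:
            best_dist, best_answer = dist, answer
    return best_answer   # correct answer candidate for the query
```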
  • the robot device 100 may perform the learning process based on the identification result when the correct answer AS15 of the query QINF1 has been acquired.
  • the robot apparatus 100 may learn an identification model including the image feature extraction network, the question feature extraction network, and the projection network described above.
  • for example, when the identification model identifies support information other than the support information SP11 as the support information corresponding to the query QINF1, the robot apparatus 100 may train the identification model so that the support information corresponding to the query QINF1 is identified as the support information SP11.
  • for such learning, for example, an arbitrary learning method such as back propagation or stochastic gradient descent can be adopted.
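The text does not fix a concrete loss; one common choice consistent with "identify the nearest support information" is a matching-network-style objective, sketched below under that assumption (softmax over negative distances, cross-entropy against the index of the correct support information such as SP11):

```python
# Minimal training sketch (an assumption, not the patent's specified loss):
# closer support items get larger logits; the model is trained so the correct
# support information becomes the nearest one.
import torch
import torch.nn.functional as F

def episode_loss(query_feat, support_feats, correct_index):
    # query_feat: (D,), support_feats: (S, D), correct_index: int
    dists = torch.cdist(query_feat.unsqueeze(0), support_feats).squeeze(0)  # (S,)
    logits = -dists                       # shorter distance -> larger logit
    target = torch.tensor(correct_index)
    return F.cross_entropy(logits.unsqueeze(0), target.unsqueeze(0))

# loss.backward() followed by an optimizer step realizes the learning by
# back propagation and stochastic gradient descent mentioned above.
```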
  • the robot device 100 performs the identification process on the episode EP2 including the query QINF2 and the support information SP21 to SP23.
  • the query QINF2 includes an image IM25 showing two balloons and a question QS25 asking "How many balloons?".
  • the support information SP21 includes an image IM21 showing two ice creams, a question QS21 "How many ice creams?", and a correct answer AS21 "two".
  • the support information SP22 includes an image IM22 showing a balloon, a question QS22 "What color is this?", and a correct answer AS22 "blue".
  • the support information SP23 includes an image IM23 showing the sky and three balloons, a question QS23 "How many balloons?", and a correct answer AS23 "3".
  • the robot apparatus 100 identifies the support information corresponding to the query QINF2 among the support information SP21 to SP23. For example, the robot apparatus 100 identifies the support information corresponding to the query QINF2 using the model having the network structure as shown in FIG. The robot apparatus 100 identifies the support information SP21 of the integrated feature quantity closest to the integrated feature quantity of the query QINF2 as the support information corresponding to the query QINF2.
  • the robot apparatus 100 identifies the support information SP21 among the support information SP21 to SP23 as the support information corresponding to the query QINF2. Then, the robot apparatus 100 may determine the correct answer corresponding to the query QINF2 as the correct answer AS21 of the support information SP21. Specifically, the robot apparatus 100 may determine the correct answer corresponding to the query QINF2 to be the correct answer AS25, which is the same "two" as the correct answer AS21. Then, the robot apparatus 100 may register the combination of the image IM25 and the question QS25 included in the query QINF2 and the correct answer AS25 as support information. Thereby, the robot apparatus 100 can acquire the concept of numbers.
  • the robot apparatus 100 may perform the learning process based on the identification result when the correct answer AS25 of the query QINF2 has been acquired. For example, when the identification model shown in FIG. 8 identifies the support information SP23 as the support information corresponding to the query QINF2, the robot apparatus 100 may train the identification model so that the support information corresponding to the query QINF2 is identified as the support information SP21.
  • the robot apparatus 100 performs the identification process on the episode EP3 including the query QINF3 and the support information SP31 to SP33.
  • the query QINF3 includes an image IM35 showing ice and a question QS35 "What does it feel like to touch?"
  • the support information SP31 includes an image IM31 showing two ice creams, a question QS31 "What does it feel like to touch?", and a correct answer AS31 "cold".
  • the support information SP32 includes an image IM32 showing a balloon, a question QS32 "What shape is this?", and a correct answer AS32 "round".
  • the support information SP33 includes an image IM33 showing the sky and three balloons, a question QS33 "What kind of atmosphere?", and a correct answer AS33 "fluffy".
  • the robot apparatus 100 identifies the support information corresponding to the query QINF3 among the support information SP31 to SP33. For example, the robot apparatus 100 identifies the support information corresponding to the query QINF3 using the model having the network structure as shown in FIG. The robot apparatus 100 identifies the support information SP31 of the integrated feature quantity closest to the integrated feature quantity of the query QINF3 as the support information corresponding to the query QINF3.
  • the robot apparatus 100 identifies the support information SP31 among the support information SP31 to SP33 as the support information corresponding to the query QINF3. Then, the robot apparatus 100 may determine the correct answer corresponding to the query QINF3 as the correct answer AS31 of the support information SP31. Specifically, the robot apparatus 100 may determine the correct answer corresponding to the query QINF3 to be the correct answer AS35 that is the same "cold" as the correct answer AS31. Then, the robot apparatus 100 may register the combination of the image IM35 and the question QS35 included in the query QINF3 and the correct answer AS35 as support information. Thereby, the robot apparatus 100 can acquire the concept regarding the impression of the user.
  • the robot apparatus 100 can acquire the concept regarding the temperature of the object.
  • the robot device 100 can acquire various concepts such as hardness as well as temperature as long as the concept is related to the impression of the user.
  • the robot apparatus 100 can acquire the concept of hardness by registering support information including correct answers such as “hard” and “soft”.
  • the robot apparatus 100 may perform the learning process based on the identification result when the correct answer AS35 of the query QINF3 has been acquired. For example, when the identification model shown in FIG. 8 identifies the support information SP33 as the support information corresponding to the query QINF3, the robot apparatus 100 may train the identification model so that the support information corresponding to the query QINF3 is identified as the support information SP31.
  • the robot apparatus 100 may select one piece of support information from the support information group and perform learning using the selected support information as query information.
  • FIG. 9 is a block diagram showing an example of a configuration from input to output according to the present disclosure.
  • the robot apparatus 100 extracts the feature amount (image feature amount) of the image detected by the camera from the query. Further, the robot apparatus 100 extracts the feature amount (question feature amount) of the voice of the question input through the microphone from the query. Then, the robot apparatus 100 projects the image feature amount and the question feature amount onto the common space. Thereby, the robot device 100 completes the preparation of the information corresponding to the query.
  • the robot device 100 generates an episode.
  • the robot apparatus 100 generates an episode by adding the support information.
  • the robot apparatus 100 uses the support information stored in the support information storage unit 141 to generate an episode.
  • the robot apparatus 100 may generate an episode by using a part of the support information stored in the support information storage unit 141, or may generate an episode by using all the support information stored in the support information storage unit 141.
  • the robot apparatus 100 performs identification processing based on the episode. For example, the robot apparatus 100 identifies the support information corresponding to the query from the support information of the episode. The robot apparatus 100 determines the identified support information as the support information used to determine the response to the query information.
  • the robot apparatus 100 outputs the correct answer of the determined support information as a correct answer candidate corresponding to the query by the speaker.
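A rough end-to-end sketch of this input-to-output flow is given below; the `support_store`, `identify_fn`, and `speaker` interfaces are placeholders for the robot's actual components, which the text does not detail:

```python
# Sketch of the FIG. 9 pipeline: query features -> episode generation ->
# identification -> speaker output. All interfaces are illustrative placeholders.
def answer_query(camera_image, question_tokens, support_store, identify_fn, speaker):
    # 1. Generate an episode by adding (part of) the registered support information.
    episode = support_store.sample_episode()       # list of (image, tokens, answer)
    # 2. Identify the support information whose integrated feature amount is
    #    closest to that of the query in the common space.
    answer = identify_fn((camera_image, question_tokens), episode)
    # 3. Output the correct answer of the determined support information by the speaker.
    speaker.say(answer)
    return answer
```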
  • FIG. 10 is a flowchart showing a procedure of information processing according to the first embodiment of the present disclosure.
  • the robot apparatus 100 acquires an image (step S101).
  • the robot device 100 acquires an image captured by a camera.
  • the robot apparatus 100 acquires a question related to the image (step S102). For example, the robot apparatus 100 acquires the user's question input by the microphone.
  • the robot apparatus 100 acquires the correct answer corresponding to the question (step S103). For example, the robot apparatus 100 outputs the correct answer candidate corresponding to the question, and acquires the correct answer according to the user's reaction to the correct answer candidate. For example, the robot apparatus 100 acquires the correct answer candidate as the correct answer when the user's reaction to the correct answer candidate is positive. For example, the robot apparatus 100 acquires the correct answer provided by the user when the user's reaction to the correct answer candidate is negative.
  • the robot apparatus 100 registers the combination of the acquired image, question, and correct answer as support information (step S104). For example, if the user's reaction to the correct answer candidate is affirmative, the robot apparatus 100 registers the combination of the image, the question, and the correct answer candidate that is the correct answer as support information. For example, if the user's reaction to the correct answer candidate is negative, the robot apparatus 100 registers the combination of the image, the question, and the correct answer provided by the user as support information. For example, the robot apparatus 100 stores the acquired combination of the image, the question, and the correct answer in the support information storage unit 141 in association with the unallocated support information ID.
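A minimal sketch of this registration procedure (steps S101 to S104) might look as follows, assuming a simple in-memory support information store; the ID scheme and helper names are illustrative:

```python
# Sketch of the registration procedure in FIG. 10 (S101-S104), with a
# dict-based support information store and an assumed ID scheme.
import itertools

class SupportStore:
    def __init__(self):
        self._items = {}
        self._ids = (f"SP{i}" for i in itertools.count(1))

    def register(self, image, question, answer):
        sid = next(self._ids)                 # unallocated support information ID
        self._items[sid] = {"image": image, "question": question, "answer": answer}
        return sid

def register_from_interaction(store, image, question, candidate, user_reaction, user_answer=None):
    # Positive reaction: the output correct answer candidate is the correct answer (S103).
    # Negative reaction: the correct answer provided by the user is used instead.
    answer = candidate if user_reaction == "positive" else user_answer
    return store.register(image, question, answer)   # S104
```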
  • FIG. 11 is a flowchart showing a procedure of correct answer registration processing by a dialog with a user according to the first embodiment of the present disclosure.
  • the robot apparatus 100 receives an input (step S201).
  • the robot apparatus 100 receives a voice uttered by the user as an input.
  • the process of the interaction part with the user will be mainly described. Therefore, in the example of FIG. 11, although not shown, it is assumed that the robot apparatus 100 has acquired the image (input image) before step S201.
  • the robot apparatus 100 captures and acquires an input image with a camera.
  • the robot apparatus 100 determines whether the input is a question (step S202). For example, the robot apparatus 100 determines whether the input is a question by analyzing the character information obtained by converting the voice information of the user. When determining that the input is not a question (step S202; No), the robot apparatus 100 returns to step S201 and repeats the processing.
  • when the robot device 100 determines that the input is a question (step S202; Yes), it generates an episode (step S203). For example, the robot device 100 generates an episode in which the input image and question that have been input are used as query information. For example, the robot apparatus 100 generates an episode including query information that is a combination of the input image and the question that have been input and the support information stored in the support information storage unit 141.
  • the robot apparatus 100 performs identification (step S204). For example, the robot apparatus 100 uses the episode to identify the support information including the correct answer corresponding to the query information. For example, the robot apparatus 100 determines the support information used to determine the response to the query information by identifying the support information including the correct answer corresponding to the query information. For example, the robot apparatus 100 determines the support information used for determining the response to the query information, and selects the support information (determined support information) as the information used for determining the response to the query information.
  • the robot device 100 outputs a response (step S205).
  • the robot apparatus 100 outputs a response using the decision support information.
  • the robot apparatus 100 outputs the correct answer included in the decision support information.
  • the robot apparatus 100 determines whether or not there is a user reaction (step S206). For example, the robot apparatus 100 determines whether there is a reaction of the user based on whether or not the input by the user is accepted. For example, the robot apparatus 100 determines that there is a reaction of the user when the utterance by the user is detected by the microphone. When it is not determined that the user has reacted (step S206; No), the robot apparatus 100 returns to step S206 and repeats the processing.
  • the robot apparatus 100 determines whether the user's reaction is affirmative (step S207). For example, the robot device 100 determines whether the user's reaction is positive or negative by analyzing the character information in which the user's reaction (voice information) is converted.
  • when the robot apparatus 100 determines that the user's reaction is affirmative (step S207; Yes), the output response is registered as a correct answer (step S208).
  • the output response (correct answer candidate) is set as a correct answer, and is registered as support information combined with the input image and the question included in the query information. That is, the robot apparatus 100 registers, as the support information, the combination of the input image, the question, and the correct answer that is the output response included in the query information.
  • when the robot device 100 determines that the user's reaction is negative (step S207; No), it requests the user for the correct answer (step S209). For example, when the robot device 100 determines that the reaction of the user is negative, the robot device 100 outputs a voice requesting the correct answer from the user, such as "Please tell me the correct answer". If the robot device 100 determines that the user's reaction is negative, the robot device 100 may wait until the user's next reaction is detected.
  • the robot apparatus 100 acquires the correct answer (step S210). For example, the robot apparatus 100 acquires the input by the user as the correct answer. For example, the robot apparatus 100 acquires the character information obtained by converting the user's utterance (voice information) as the correct answer.
  • the robot apparatus 100 registers the acquired correct answer (step S211).
  • the robot apparatus 100 registers support information that is a combination of the acquired correct answer, the input image and the question included in the query information. That is, the robot apparatus 100 registers the combination of the input image, the question, and the acquired correct answer included in the query information as support information.
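The dialog-driven registration flow of FIG. 11 (steps S201 to S211) could be sketched as the loop below; the speech I/O and the `is_question` / `is_positive` analyses stand in for the recognition components, which are not detailed in the text:

```python
# Conversational loop corresponding to FIG. 11 (S201-S211). The robot and store
# interfaces are illustrative placeholders.
def dialog_loop(robot, store):
    image = robot.capture_image()                      # input image acquired before S201
    while True:
        utterance = robot.listen()                     # S201: receive input
        if not robot.is_question(utterance):           # S202
            continue
        episode = store.sample_episode()               # S203: generate episode
        answer = robot.identify((image, utterance), episode)   # S204: identification
        robot.say(answer)                              # S205: output response
        reaction = robot.listen()                      # S206: wait for user reaction
        if robot.is_positive(reaction):                # S207
            store.register(image, utterance, answer)   # S208: register output response
        else:
            robot.say("Please tell me the correct answer.")     # S209
            correct = robot.listen()                   # S210: acquire correct answer
            store.register(image, utterance, correct)  # S211: register correct answer
```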
  • FIG. 12 shows a case where the robot apparatus 100A, which is an agent, asks a question to the user U1 and registers a combination of images, a question, and a correct answer (support information) through a dialogue with the user U1.
  • the robot device 100A acquires an image (step S31).
  • the robot apparatus 100A detects the image IM2 by capturing images of two ice creams with the camera. For example, an image is input to the robot device 100A through a camera attached to the robot device 100A.
  • the robot device 100A detects the presence or absence of a user (step S32).
  • the robot apparatus 100A detects whether a person exists around the robot apparatus 100A in order to ask the user a question. For example, the robot apparatus 100A recognizes (determines) whether or not there is a user who is a partner for the question and answer session. For example, the robot device 100A detects whether or not there is a user around by using a camera.
  • the robot device 100A detects the presence / absence of a user based on the image captured by the camera.
  • the robot device 100A may individually include a camera that captures an image used for a query and a camera that detects the presence or absence of a user.
  • the robot apparatus 100A is not limited to the camera, and may detect or recognize the user by various sensors as long as the presence or absence of the user can be detected.
  • the robot device 100A determines that the user is present in the surroundings because the user U1 is included in the image captured by the camera. In this case, for example, the robot apparatus 100A changes the mode to the question mode (see FIG. 14) because the user U1 is detected.
  • the robot device 100A generates a question (step S33). For example, the robot device 100A generates a question based on the image captured by the camera. In the example of FIG. 12, the robot device 100A generates a question related to the image IM2 based on the acquired image IM2. For example, the robot device 100A may estimate a target object included in the image IM2 and generate a question using a technique related to object recognition. The robot device 100A may generate a question according to the estimated target object. For example, the robot device 100A may generate a question based on the information stored in the question information storage unit 144 (see FIG. 13). In the example of FIG. 12, the robot apparatus 100A estimates that the target object included in the image IM2 is ice cream and generates the question QS2 “What if I touch it?”.
  • the robot device 100A may generate the question based on the question candidate information in which the name of the object stored in the question information storage unit 144 and the question candidates are associated with each other. For example, the robot device 100A may generate the question based on question candidate information in which the name of the object "ice cream" stored in the question information storage unit 144 is associated with questions such as "What if I touch it?" and "Is the price expensive?". For example, the robot device 100A may determine the question to be output based on a predetermined criterion among the question candidates associated with the name "ice cream" of the object stored in the question information storage unit 144. For example, the robot device 100A may determine a question candidate selected at random from among the question candidates as the question to be output.
  • the robot device 100A may count the number of times each question candidate has been output, and determine a question candidate with a small output count as the question to be output; a minimal sketch of such question selection follows this item. Note that the above is an example, and the robot apparatus 100A may generate the question by any technique as long as it can ask the user U1 a question.
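A minimal sketch of such question selection, assuming the question candidate information is a mapping from object names to candidate questions and that an output count is kept per candidate (both assumptions; the storage format is not specified):

```python
# Sketch of question generation from question candidate information.
# QUESTION_CANDIDATES mirrors the "ice cream" example in the text; the
# selection policies are the two mentioned above (random, least-output-count).
import random

QUESTION_CANDIDATES = {
    "ice cream": ["What if I touch it?", "Is the price expensive?", "What is the taste?"],
}
output_counts = {}   # (object_name, question) -> number of times output

def generate_question(object_name, policy="least_used"):
    candidates = QUESTION_CANDIDATES.get(object_name, [])
    if not candidates:
        return None
    if policy == "random":
        question = random.choice(candidates)
    else:  # pick the candidate output the fewest times so far
        question = min(candidates, key=lambda q: output_counts.get((object_name, q), 0))
    output_counts[(object_name, question)] = output_counts.get((object_name, question), 0) + 1
    return question
```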
  • the robot device 100A outputs the generated question (step S34).
  • the robot apparatus 100A outputs the question QS2, "What if I touch it?"
  • the user U1 provides the robot apparatus 100A with the correct answer to the question QS2 (step S35).
  • the user U1 inputs the correct answer to the question output by the robot apparatus 100A into the robot apparatus 100A.
  • the user U1 inputs the correct answer AS2 “cold” to the robot apparatus 100A for the question QS2 “What if I touch it?” Output by the robot apparatus 100A.
  • the robot apparatus 100A registers, as support information, a combination of the image, the question related to the image, and the correct answer corresponding to the question (step S36). For example, the robot apparatus 100A transitions to the support information registration mode, and registers a combination of an image, a question related to the image, and a correct answer corresponding to the question as support information.
  • the robot apparatus 100A registers the combination of the image IM2, the question QS2, and the correct answer AS2, as shown in the additional registration information RINF21, as the support information (support information SP2) identified by the support information ID "SP2". In this way, the robot apparatus 100A registers the three elements of the input image, the input question, and the correct answer as one set. For example, the robot device 100A stores the support information SP2 in the support information storage unit 141 (see FIG. 13).
  • the robot apparatus 100A itself outputs the question QS2 regarding the image IM2, and registers the correct answer AS2 provided by the user U1 as the correct answer corresponding to the image IM2 and the question QS2. In this way, the robot apparatus 100A asks a question to the user and asks for a response. As a result, the robot apparatus 100A can spontaneously acquire a new concept by outputting a question by itself without waiting for an input from the user. Therefore, the robot device 100A can enable an appropriate response to the question regarding the image.
  • FIG. 13 is a diagram illustrating a configuration example of a robot device 100A according to the second embodiment of the present disclosure.
  • the robot device 100A includes a communication unit 11, an input unit 12, an output unit 13, a storage unit 14A, a control unit 15A, a sensor unit 16A, and a drive unit 17.
  • the input unit 12 receives the input of the correct answer corresponding to the question from the user.
  • the output unit 13 outputs the question.
  • the output unit 13 outputs a question when the user is detected by the sensor unit 16A.
  • the output unit 13 outputs the question generated by the generation unit 153A.
  • the storage unit 14A is realized by, for example, a semiconductor memory device such as a RAM or a flash memory, or a storage device such as a hard disk or an optical disk.
  • the storage unit 14A includes a support information storage unit 141, a model information storage unit 142, a mode information storage unit 143A, and a question information storage unit 144.
  • the mode information storage unit 143A stores information regarding the mode of the robot device 100A.
  • FIG. 14 is a diagram illustrating an example of the mode information storage unit according to the second embodiment of the present disclosure.
  • FIG. 14 shows an example of the mode information storage unit 143A according to the second embodiment.
  • the mode information storage unit 143A has items such as “mode ID”, “mode”, and “flag”.
  • the mode identified by the mode ID "MO21" is the question mode.
  • the mode MO21 indicates that the robot apparatus 100A itself outputs a question.
  • the question information storage unit 144 stores various information regarding the question.
  • the question information storage unit 144 stores information used by the robot apparatus 100A itself to output a question.
  • the question information storage unit 144 stores question candidate information in which the name of the object and the question candidates are associated with each other. For example, the question information storage unit 144 stores question candidate information in which the name of the object "ice cream" is associated with questions such as "What if I touch it?", "Is the price expensive?", and "What is the taste?".
  • the control unit 15A is realized by, for example, a CPU, an MPU, or the like executing a program (for example, an information processing program according to the present disclosure) stored inside the robot apparatus 100A using a RAM or the like as a work area.
  • the control unit 15A is a controller, and may be realized by an integrated circuit such as an ASIC or FPGA.
  • the control unit 15A includes an acquisition unit 151, a determination unit 152, a generation unit 153A, a registration unit 154, a learning unit 155, and a determination unit 156, and realizes or executes the functions and actions of the information processing described below.
  • the internal configuration of the control unit 15A is not limited to the configuration shown in FIG. 13, and may be any other configuration as long as it is a configuration for performing information processing described later.
  • the acquisition unit 151 acquires the question output by the output unit 13.
  • the determination unit 152 recognizes (determines) whether or not a user who is a partner of the question and answer is in the vicinity.
  • the generation unit 153A generates a question. For example, the generation unit 153A generates a question related to the image based on the acquired image.
  • the generation unit 153A generates a question by estimating the target object included in the image by using a technique related to object recognition.
  • the generation unit 153A generates a question based on the information stored in the question information storage unit 144. In the example of FIG. 12, the generation unit 153A estimates that the target object included in the image IM2 is ice cream, and generates the question QS2 “What if I touch it?”.
  • the registration unit 154 registers the combination including the correct answer input by the user as support information.
  • the sensor unit 16A has a function as a detection unit that detects the presence or absence of a user.
  • the sensor unit 16A detects the presence or absence of a user. For example, the sensor unit 16A detects whether or not there is a user around by using the camera.
  • the sensor unit 16A detects the presence or absence of the user based on the image captured by the camera.
  • FIG. 15 is a flowchart showing the procedure of a correct answer registration process by a dialog with a user according to the second embodiment of the present disclosure.
  • the robot apparatus 100 detects a person (step S301). For example, the robot apparatus 100 detects the presence of the user by appropriately using various sensors such as a camera. For example, the robot apparatus 100 analyzes the image captured by the camera, and determines whether the user exists based on the analysis result. When the robot device 100 does not detect a person (step S301; No), the process returns to step S301 and repeats the processing.
  • when the robot device 100 detects a person (step S301; Yes), it outputs a question (step S302).
  • the robot apparatus 100 has already acquired an image (input image) to be a query before step S302.
  • the robot apparatus 100 captures and acquires an input image with a camera.
  • the robot apparatus 100 generates and outputs a question based on the input image that is input.
  • the robot apparatus 100 generates and outputs a question based on the question information stored in the question information storage unit 144 (see FIG. 13).
  • the robot apparatus 100 determines whether or not there is a user reaction (step S303). For example, the robot apparatus 100 determines whether there is a reaction of the user based on whether or not the input by the user is accepted. For example, the robot apparatus 100 determines that there is a reaction of the user when the utterance by the user is detected by the microphone. When it is not determined that the user has reacted (step S303; No), the robot apparatus 100 returns to step S303 and repeats the processing.
  • the robot apparatus 100 registers the input response as the correct answer (step S304).
  • the robot apparatus 100 registers support information that is a combination of an input image, an output question, and a correct answer input by the user.
  • FIG. 16 is a diagram showing a configuration example of an information processing system according to a modification of the present disclosure.
  • FIG. 17 is a diagram illustrating a configuration example of the information processing device according to the modification of the present disclosure.
  • the information processing system 1 includes a robot device 10 and an information processing device 100B.
  • the robot device 10 and the information processing device 100B are communicably connected to each other via a network N in a wired or wireless manner.
  • the information processing system 1 illustrated in FIG. 16 may include a plurality of robot devices 10 and a plurality of information processing devices 100B.
  • the information processing apparatus 100B may communicate with the robot apparatus 10 via the network N, and may perform model learning or instruct the robot apparatus 10 to respond, based on the information collected by the robot apparatus 10.
  • the robot device 10 detects an image with a camera and a user's utterance (question) with a microphone. Then, the robot apparatus 10 outputs, through the speaker, the response corresponding to the detected image and the user's question.
  • the robot device 10 may be any device as long as it can send and receive information to and from the information processing device 100B. For example, the robot device 10 may be a robot, such as an entertainment robot or a household robot, that interacts with a human (user). For example, the robot device 10 transmits the captured image and information collected by a dialogue with the user to the information processing device 100B.
  • the information processing apparatus 100B is an information processing apparatus that registers an image, a question related to the image, and a correct answer corresponding to the question as support information.
  • the information processing apparatus 100B may transmit information to the robot apparatus 10 and remotely control the robot apparatus 10 to realize the dialogue with the user by the robot apparatus 10. Then, the information processing apparatus 100B acquires various information by receiving the various information acquired by the robot apparatus 10 from the robot apparatus 10. In this way, the information processing apparatus 100B registers the support information based on the information collected through the dialogue with the user of the robot apparatus 10.
  • the information processing device 100B has a communication unit 11B, a storage unit 14B, and a control unit 15.
  • the communication unit 11B is connected to a network N (Internet or the like) by wire or wirelessly, and transmits / receives information to / from the robot apparatus 10 via the network N.
  • the storage unit 14B includes a support information storage unit 141 and a model information storage unit 142.
  • the information processing apparatus 100B does not have a sensor unit, a driving unit, or the like, and may not have a configuration for realizing the function as the robot device.
  • the information processing apparatus 100B may include an input unit (for example, a keyboard and a mouse) that receives various operations from an administrator who manages the information processing apparatus 100B, and a display unit (for example, a liquid crystal display) for displaying various information.
  • the robot devices 100 and 100A and the information processing device 100B may adjust the camera viewpoint in order to make the question and answer more efficient. This point will be described with reference to FIGS. 18 to 20. Although the case where the robot device 100 adjusts the camera viewpoint will be described below as an example, any device may be used as long as it can adjust the camera viewpoint.
  • for example, the information processing apparatus 100B used as a wearable terminal worn by the user may perform the camera viewpoint adjustment described later.
  • the information processing apparatus 100B may output a voice designating the moving direction or the direction to the user by using an output unit such as a speaker.
  • the wearable terminal worn by the user and the information processing apparatus 100B may be separate bodies.
  • the information processing apparatus 100B may receive information from the wearable terminal worn by the user, and may instruct the wearable terminal to adjust the camera viewpoint based on the received information.
  • the information processing apparatus 100B may transmit voice information designating a moving direction and a direction to the user to the wearable terminal and cause the wearable terminal to output the voice.
  • the information processing system 1 may include the information processing device 100B and the wearable terminal, and may not include the robot device 10.
  • the information processing apparatus 100B may instruct the user, by output through a speaker, a display, or the like, to take a camera position (camera viewpoint) that makes the question and answer more efficient.
  • FIG. 18 is a diagram illustrating an example of a camera viewpoint adjustment process according to the present disclosure.
  • the robot apparatus 100 estimates the peripheral space from the image captured by the camera, drives the drive unit 17, and takes a camera position more appropriate for communication.
  • User U1 inputs a question to robot device 100 (step S51).
  • the user U1 inputs the question QS51 “How many ice creams are there?” To the robot apparatus 100 by speaking “How many ice creams are there?”.
  • the robot apparatus 100 also acquires an image (step S52).
  • the robot device 100 detects the image IM50.
  • the robot apparatus 100 images the image IM50 including only a part (upper part) of the two ice creams before adjusting the camera viewpoint.
  • the robot apparatus 100 estimates the surrounding space (step S53).
  • the robot apparatus 100 estimates the peripheral space in the range included in the image IM50 using various techniques. For example, the robot apparatus 100 estimates the surrounding space in the vertical and horizontal directions of the range included in the image IM50. For example, the robot apparatus 100 estimates the surrounding space by appropriately using various conventional techniques.
  • the robot apparatus 100 uses various conventional techniques as appropriate to generate an image (hereinafter also referred to as an “estimation image”) corresponding to the space in order to estimate what is in the space above, below, left, and right.
  • the robot apparatus 100 uses the model for restoring the image (restoring model) to generate the estimation image.
  • the robot apparatus 100 generates an estimation image using a restoration model (network) that is learned to restore the entire object from an image showing a part of the object.
  • the robot apparatus 100 acquires the restoration model from the external device that has generated the restoration model.
  • the robot device 100 may train the restoration model (network) by intentionally cutting out a part of an image in which an object is clearly captured and learning to restore the cut-out part.
  • the robot apparatus 100 specifies a range in which an object is captured in the image IM50 by appropriately using various conventional techniques such as a technique related to object recognition.
  • the robot apparatus 100 specifies the range (lower range) in which the object is seen in the lower direction of the image IM50, and cuts out the lower range (lower image) of the image IM50.
  • the robot apparatus 100 uses the lower image of the image IM50 and the restoration model to generate a lower restored image ES51 that is an estimation image.
  • the robot apparatus 100 similarly processes the upward, rightward, and leftward directions of the image IM50 to generate an upper restored image, a right restored image, and a left restored image.
  • the robot apparatus 100 may generate the estimation image by any method as long as the estimation image can be generated.
  • the robot device 100 may generate the estimation images of the spaces above, below, to the left and to the right of the range included in the image IM50 based on the method disclosed in Non-Patent Document 3 described above.
  • the robot device 100 may generate an estimation image of the space above, below, to the left, and to the right of the range included in the image IM50 by using a technology related to a generative adversarial network (GAN).
  • the robot device 100 may generate estimation images of the spaces above, below, to the left, and right of the range included in the image IM50 based on the method disclosed in Non-Patent Document 4 described above. For example, the robot device 100 may generate an image for estimation of the space above, below, to the left, and to the right of the range included in the image IM50 by using a technology such as PixelRNN (Recurrent Neural Network).
  • the robot apparatus 100 uses the lower restored image ES51, the upper restored image, the right restored image, and the left restored image to estimate the surrounding space in the left, right, up, and down directions.
  • the robot apparatus 100 estimates the surrounding space using the information of the question QS51.
  • the robot apparatus 100 projects the estimation image and the given question in the common space, and sets the direction having the highest confidence in the answer identification result as the target of the viewpoint policy. In this way, the robot apparatus 100 can adjust the camera viewpoint in the direction in consideration of the suitability for answering the question by adding the question information.
  • the robot apparatus 100 projects the lower restored image ES51, the upper restored image, the right restored image, the left restored image, and the question QS51 into the common space, and determines, as the direction in which the camera viewpoint is directed, the direction corresponding to the estimation image having the largest confidence (reliability; hereinafter "score") of the identification result of the answer.
  • for example, the robot apparatus 100 may use the identification model to compare the integrated feature amount obtained by integrating the image feature amount of the lower restored image ES51 and the question feature amount of the question QS51 with the integrated feature amount of each piece of support information, and may calculate the score based on the comparison.
  • for example, the robot apparatus 100 may calculate the score based on the shortest distance between the integrated feature amount of the lower restored image ES51 and the question QS51 and the integrated feature amounts of the pieces of support information. For example, the score may have a larger value as the distance is shorter, and may be a value output by the identification model.
  • the robot apparatus 100 may set the direction in which the confidence of the object identification in the estimation image is highest as the target of the viewpoint policy. For example, the robot apparatus 100 may use only the lower restored image ES51, the upper restored image, the right restored image, and the left restored image to set the direction in which the score of the object identification of the image IM50 is the highest as the target of the viewpoint policy.
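One way to realize this scoring, sketched under the assumption that the score is derived from distances to the support set in the common space (here the maximum of a softmax over negative distances; the text only requires that a shorter distance give a larger score):

```python
# Illustrative sketch of scoring candidate camera directions: for each direction,
# project the restored estimation image and the question into the common space
# and score it by the identification confidence against the support set.
import torch

def direction_score(restored_image, question_tokens, support_feats,
                    image_net, question_net, projection_net):
    # restored_image: (1, 3, H, W); question_tokens: (1, T); support_feats: (S, D)
    feat = projection_net(image_net(restored_image), question_net(question_tokens))  # (1, D)
    dists = torch.cdist(feat, support_feats).squeeze(0)   # distance to each support item
    probs = torch.softmax(-dists, dim=0)                  # confidence over support items
    return probs.max().item()                             # identification confidence ("score")

def choose_direction(restored_images, question_tokens, support_feats, nets):
    # restored_images: e.g. {"down": ES51_tensor, "up": ..., "left": ..., "right": ...}
    scores = {d: direction_score(img, question_tokens, support_feats, *nets)
              for d, img in restored_images.items()}
    best = max(scores, key=scores.get)                    # direction to point the camera
    return best, scores
```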
  • the robot apparatus 100 determines the viewpoint policy (step S54).
  • the robot apparatus 100 determines to adjust the camera viewpoint in the downward direction corresponding to the lower restored image ES51.
  • the robot apparatus 100 generates the viewpoint adjustment information IS51, which is an instruction to change the camera direction to "down".
  • the robot apparatus 100 performs an operation of adjusting the camera viewpoint (step S55).
  • the robot apparatus 100 drives an actuator or the like so that the camera faces downward based on the viewpoint adjustment information IS51 that instructs the camera to face downward.
  • the robot apparatus 100 drives an actuator or the like so that the head faces downward.
  • the robot apparatus 100 captures the image IM51 including the two ice creams.
  • the robot apparatus 100 outputs the correct answer of the determined support information based on the image IM51 and the question QS51 (step S56).
  • the robot apparatus 100 outputs the correct answer AS51 "two".
  • the robot apparatus 100 realizes the adjustment of the camera viewpoint by inputting an image, estimating the surrounding space, determining the viewpoint policy, and performing an operation based on the determination.
  • even if an appropriate image cannot be acquired at first, the robot apparatus 100 can acquire an appropriate image by adjusting the camera viewpoint according to the input image. Thereby, the robot apparatus 100 can make an appropriate response to the question about the image.
  • FIG. 19 is a block diagram showing an example of the configuration from the input to the output related to the adjustment of the camera viewpoint of the present disclosure.
  • the robot apparatus 100 extracts the feature amount (image feature amount) of the image detected by the camera from the query. Further, the robot apparatus 100 extracts the feature amount (question feature amount) of the voice of the question input through the microphone from the query. Then, the robot apparatus 100 estimates the peripheral space based on the extracted image feature amount. Thereby, the robot apparatus 100 generates an estimation image based on the extracted image feature amount. For example, the robot apparatus 100 estimates the peripheral space in the downward direction based on the extracted image feature amount, and generates the estimation image in the downward direction. Then, the robot apparatus 100 projects the feature amount of the estimation image and the question feature amount into the common space.
  • the robot device 100 generates an episode.
  • the robot apparatus 100 generates an episode by adding the support information.
  • the robot apparatus 100 uses the support information stored in the support information storage unit 141 to generate an episode.
  • the robot apparatus 100 examines the identification confidence based on the episode. For example, the robot apparatus 100 determines the direction having the largest score among the examined directions as a candidate for the moving direction. Then, if there is an unexamined direction, the robot apparatus 100 estimates the surrounding space for the unexamined direction, and repeats the examination of the identification confidence until the unexamined direction disappears. For example, the robot apparatus 100 estimates the peripheral space in the upward direction, the rightward direction, and the leftward direction in the same manner as the estimated downward direction, and repeats the examination of the identification confidence.
  • the robot apparatus 100 determines the camera movement policy based on the direction having the highest score among all the examined directions. For example, the robot apparatus 100 determines a moving direction candidate direction as a moving direction after examining all directions. The robot apparatus 100 determines, as the moving direction, the direction having the highest score among all the examined directions.
  • the robot apparatus 100 drives the actuator to adjust the camera viewpoint in the determined moving direction.
  • FIG. 20 is a flowchart showing the procedure of the camera viewpoint adjustment process of the present disclosure.
  • the robot device 100 receives an input (step S501).
  • the robot device 100 receives an input of an image or a question.
  • the robot apparatus 100 estimates the peripheral space in the specific direction (step S502). For example, the robot apparatus 100 estimates a peripheral space in an unexamined direction among the directions to be estimated. Then, the robot apparatus 100 examines the identification confidence in the specific direction (step S503). For example, the robot apparatus 100 considers whether the score in the specific direction is the maximum, and if the score in the specific direction is the maximum among the examined directions, the specific direction is determined as a candidate for the moving direction.
  • the robot apparatus 100 determines whether or not all directions have been considered (step S504). If all directions have not been considered (step S504; No), the robot apparatus 100 returns to step S502 and repeats the process until there are no unexamined directions.
  • when the robot device 100 has considered all the directions (step S504; Yes), it performs the process of step S505.
  • the robot apparatus 100 determines a moving direction candidate as a moving direction after examining all the directions. For example, the robot apparatus 100 may determine not to adjust the camera viewpoint when the scores in all directions are less than the predetermined threshold.
  • the robot apparatus 100 determines whether it operates (step S505). When it is determined that the robot device 100 does not operate (step S505; No), the robot device 100 performs identification without changing the camera viewpoint (step S508). For example, the robot apparatus 100 performs the identification without changing the camera viewpoint when the scores in all directions are less than the predetermined threshold value.
  • when the robot device 100 determines that it operates (step S505; Yes), it determines an operation policy (step S506). For example, the robot apparatus 100 determines a moving direction candidate as the moving direction. The robot apparatus 100 determines to point the camera viewpoint in the direction having the maximum score.
  • the robot apparatus 100 operates according to the determination (step S507).
  • the robot apparatus 100 drives the actuator and adjusts the camera viewpoint in the determined moving direction.
  • the robot apparatus 100 performs identification (step S508). For example, the robot apparatus 100 determines the support information used for the response based on the query including the image and the question and the support information.
  • the robot apparatus 100 outputs based on the identification result (step S509). For example, the robot apparatus 100 outputs the correct answer of the determined support information based on the query including the image and the question.
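The overall procedure of FIG. 20 (steps S501 to S509) could be sketched as follows; the actuator and sensing calls are placeholders, and the threshold for deciding whether to operate is an assumption:

```python
# Sketch of the FIG. 20 procedure: examine every direction, move the camera
# only if the best score clears a threshold, then identify and output.
def viewpoint_adjust_and_answer(robot, image, question, directions, threshold=0.5):
    best_dir, best_score = None, float("-inf")
    for d in directions:                                   # S502-S504: consider all directions
        restored = robot.estimate_surrounding(image, d)    # estimation image for direction d
        score = robot.identification_confidence(restored, question)   # S503
        if score > best_score:
            best_dir, best_score = d, score
    if best_score >= threshold:                            # S505: decide whether to operate
        robot.turn_camera(best_dir)                        # S506-S507: act on the policy
        image = robot.capture_image()
    answer = robot.identify_answer(image, question)        # S508: identification
    robot.say(answer)                                      # S509: output
    return answer
```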
  • each component of each device shown in the drawings is functionally conceptual, and does not necessarily have to be physically configured as shown. That is, the specific form of distribution and integration of each device is not limited to that shown in the drawings, and all or part of each device may be functionally or physically distributed or integrated in arbitrary units according to various loads or usage conditions.
  • FIG. 21 is a hardware configuration diagram showing an example of a computer 1000 that realizes the functions of an information processing apparatus such as the robot apparatuses 100 and 100A and the information processing apparatus 100B.
  • the computer 1000 has a CPU 1100, a RAM 1200, a ROM (Read Only Memory) 1300, an HDD (Hard Disk Drive) 1400, a communication interface 1500, and an input / output interface 1600.
  • Each unit of the computer 1000 is connected by a bus 1050.
  • the CPU 1100 operates based on a program stored in the ROM 1300 or the HDD 1400, and controls each part. For example, the CPU 1100 expands a program stored in the ROM 1300 or the HDD 1400 into the RAM 1200 and executes processing corresponding to various programs.
  • the ROM 1300 stores a boot program such as a BIOS (Basic Input Output System) executed by the CPU 1100 when the computer 1000 starts up, a program dependent on the hardware of the computer 1000, and the like.
  • the HDD 1400 is a computer-readable recording medium that non-temporarily records a program executed by the CPU 1100, data used by the program, and the like. Specifically, the HDD 1400 is a recording medium that records the information processing program according to the present disclosure, which is an example of the program data 1450.
  • the communication interface 1500 is an interface for connecting the computer 1000 to an external network 1550 (for example, the Internet).
  • the CPU 1100 receives data from another device or transmits the data generated by the CPU 1100 to another device via the communication interface 1500.
  • the input / output interface 1600 is an interface for connecting the input / output device 1650 and the computer 1000.
  • the CPU 1100 receives data from an input device such as a keyboard or a mouse via the input / output interface 1600.
  • the CPU 1100 also transmits data to an output device such as a display, a speaker, a printer, etc. via the input / output interface 1600.
  • the input / output interface 1600 may function as a media interface for reading a program or the like recorded in a predetermined recording medium (medium).
  • Examples of media include optical recording media such as DVD (Digital Versatile Disc) and PD (Phase change rewritable Disk), magneto-optical recording media such as MO (Magneto-Optical disk), tape media, magnetic recording media, and semiconductor memory.
  • the present technology may also be configured as below.
  • (1) An information processing apparatus including: an acquisition unit that acquires an image, a question related to the image, and a correct answer corresponding to the question; and a registration unit that registers the combination of the image, the question, and the correct answer acquired by the acquisition unit as support information used to determine a response to query information including one image and one question related to the one image.
  • (2) The information processing apparatus according to (1), wherein the acquisition unit acquires the image, the question about a concept corresponding to the image, and the correct answer about the concept, and the registration unit registers the combination of the image, the question regarding the concept, and the correct answer regarding the concept as the support information.
  • (3) The information processing apparatus according to (2), wherein the acquisition unit acquires the image, the question about an object contained in the image, and the correct answer about the object, and the registration unit registers the combination of the image, the question regarding the object, and the correct answer regarding the object as the support information.
  • (4) The information processing apparatus according to (3), wherein the acquisition unit acquires the image, the question regarding a property or state of an object included in the image, and the correct answer regarding the property or state, and the registration unit registers the combination of the image, the question regarding the property or state, and the correct answer regarding the property or state as the support information.
  • (5) The information processing apparatus according to (4), wherein the acquisition unit acquires the image, the question regarding the user's impression of the image, and the correct answer regarding the impression, and the registration unit registers the combination of the image, the question regarding the impression, and the correct answer regarding the impression as the support information.
  • (6) The information processing apparatus according to (5), wherein the acquisition unit acquires the image, the question regarding the amount, color, temperature, or hardness of an object contained in the image, and the correct answer regarding the amount, color, temperature, or hardness, and the registration unit registers the combination of the image, the question regarding the amount, color, temperature, or hardness, and the correct answer regarding the amount, color, temperature, or hardness as the support information.
  • (7) The information processing apparatus according to any one of (1) to (6), further including an input unit that accepts user input, wherein the acquisition unit acquires the question input by the user.
  • (8) The information processing apparatus according to (7), further including an output unit that outputs a correct answer candidate corresponding to the question, wherein the input unit accepts a reaction to the correct answer candidate by the user, and the registration unit registers the combination including the correct answer determined according to the reaction to the correct answer candidate as the support information.
  • (9) The information processing apparatus according to (8), wherein, when the reaction to the correct answer candidate is affirmative, the registration unit registers the combination including the correct answer candidate as the correct answer as the support information.
  • (10) The information processing apparatus according to (8), wherein, when the reaction to the correct answer candidate is negative, the input unit accepts another correct answer candidate different from the correct answer candidate from the user, and the registration unit registers the combination including the other correct answer candidate input by the user as the correct answer as the support information.
  • (11) The information processing apparatus according to any one of (1) to (6), further including an output unit that outputs the question, wherein the acquisition unit acquires the question output by the output unit.
  • (12) The information processing apparatus according to (11), further including an input unit that receives an input of the correct answer corresponding to the question by the user, wherein the registration unit registers the combination including the correct answer input by the user as the support information.
  • (13) The information processing apparatus according to (11) or (12), further including a detection unit that detects the presence or absence of a user, wherein the output unit outputs the question when the user is detected by the detection unit.
  • (14) The information processing apparatus according to any one of (1) to (13), further including an imaging unit that captures the image, wherein the acquisition unit acquires the image detected by the imaging unit.
  • (15) The information processing apparatus according to any one of (1) to (14), further including a determination unit that determines one correct answer corresponding to the one question of the query information based on the query information and the support information.
  • (16) The information processing apparatus according to (15), wherein the determination unit determines the one correct answer based on the one image and the one question included in the query information, and the image and the question included in the support information.
  • (17) The information processing apparatus according to (16), wherein the determination unit determines the one correct answer based on a comparison between the one image and the one question included in the query information and the image and the question included in the support information.
  • (18) The information processing apparatus according to any one of (15) to (17), wherein the acquisition unit acquires the query information, and the determination unit determines, based on the query information acquired by the acquisition unit and a plurality of pieces of support information, one of the plurality of pieces of support information to be used for the one correct answer.
  • (19) An information processing method including: acquiring an image, a question related to the image, and a correct answer corresponding to the question; and registering the acquired combination of the image, the question, and the correct answer as support information used to determine a response to query information including one image and one question related to the one image.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Databases & Information Systems (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Data Mining & Analysis (AREA)
  • General Engineering & Computer Science (AREA)
  • User Interface Of Digital Computer (AREA)

Abstract

An information processing device according to the present disclosure comprises: an acquisition unit (151) for acquiring an image, a question pertaining to the image, and a correct answer corresponding to the question; and a recording unit (154) for recording a combination of the image, question, and correct answer acquired by the acquisition unit (151) as support information used to determine a response to query information including one image and one question pertaining to the one image.

Description

Information processing apparatus, information processing method, and information processing program
 The present disclosure relates to an information processing device, an information processing method, and an information processing program.
 Information processing using machine learning is used in various technical fields, and with the development of deep learning, household agents and robots are able to learn and identify many types of objects. For example, systems have been implemented that answer questions about images.
 According to the conventional technology, a question about an image is answered using information about the image and the question.
 However, the conventional technology is not always able to make an appropriate response to a question about an image. For example, in the conventional technology, the number of identifiable answers (classes) is limited, and answers outside that set cannot be identified. As described above, in the related art, it is difficult to make an appropriate response to a question regarding an image unless the answer is within the range of correct answers prepared in advance.
 Therefore, the present disclosure proposes an information processing device, an information processing method, and an information processing program capable of appropriately responding to a question regarding an image.
 In order to solve the above problems, an information processing device according to an aspect of the present disclosure includes: an acquisition unit that acquires an image, a question related to the image, and a correct answer corresponding to the question; and a registration unit that registers the combination of the image, the question, and the correct answer acquired by the acquisition unit as support information used to determine a response to query information including one image and one question related to the one image.
A diagram illustrating an example of information processing according to the first embodiment of the present disclosure.
A diagram illustrating another example of information processing according to the first embodiment of the present disclosure.
A diagram illustrating a configuration example of the robot apparatus according to the first embodiment of the present disclosure.
A diagram illustrating an example of the support information storage unit according to the first embodiment of the present disclosure.
A diagram illustrating an example of the model information storage unit according to the first embodiment of the present disclosure.
A diagram illustrating an example of the mode information storage unit according to the first embodiment of the present disclosure.
A diagram illustrating an example of identification according to the present disclosure.
A diagram illustrating an example of a network structure for learning according to the present disclosure.
A block diagram illustrating an example of a configuration from input to output according to the present disclosure.
A flowchart illustrating a procedure of information processing according to the first embodiment of the present disclosure.
A flowchart illustrating a procedure of correct answer registration processing through a dialog with a user according to the first embodiment of the present disclosure.
A diagram illustrating an example of information processing according to the second embodiment of the present disclosure.
A diagram illustrating a configuration example of the robot apparatus according to the second embodiment of the present disclosure.
A diagram illustrating an example of the mode information storage unit according to the second embodiment of the present disclosure.
A flowchart illustrating a procedure of correct answer registration processing through a dialog with a user according to the second embodiment of the present disclosure.
A diagram illustrating a configuration example of an information processing system according to a modification of the present disclosure.
A diagram illustrating a configuration example of an information processing apparatus according to a modification of the present disclosure.
A diagram illustrating an example of camera viewpoint adjustment processing according to the present disclosure.
A block diagram illustrating an example of a configuration from input to output relating to camera viewpoint adjustment according to the present disclosure.
A flowchart illustrating a procedure of camera viewpoint adjustment processing according to the present disclosure.
A hardware configuration diagram illustrating an example of a computer that realizes the functions of the robot apparatus and the information processing apparatus.
 Hereinafter, embodiments of the present disclosure will be described in detail with reference to the drawings. The information processing apparatus, the information processing method, and the information processing program according to the present application are not limited to these embodiments. In each of the following embodiments, the same parts are denoted by the same reference numerals, and redundant description will be omitted.
 The present disclosure will be described in the following order.
  1. First Embodiment
   1-1. Overview of information processing according to the first embodiment of the present disclosure
   1-2. Configuration of the robot apparatus according to the first embodiment
   1-3. Information processing procedure according to the first embodiment
  2. Second Embodiment
   2-1. Overview of information processing according to the second embodiment of the present disclosure
   2-2. Configuration of the robot apparatus according to the second embodiment
   2-3. Information processing procedure according to the second embodiment
  3. Other Embodiments
   3-1. Other configuration examples
   3-2. Adjustment of the camera viewpoint
  4. Hardware Configuration
(1. First embodiment)
[1-1. Overview of information processing according to the first embodiment of the present disclosure]
 FIG. 1 is a diagram illustrating an example of information processing according to the first embodiment of the present disclosure. The information processing according to the first embodiment of the present disclosure is realized by the robot apparatus 100 shown in FIG. 1.
 The robot apparatus 100 is an information processing apparatus that executes the information processing according to the first embodiment. The robot apparatus 100 is an information processing apparatus that registers a combination of an image, a question related to the image, and a correct answer corresponding to the question through a dialog with a user. In the first embodiment, the robot apparatus 100 detects an image with a camera (corresponding to the sensor unit 16 in FIG. 3) and detects the user's utterance (question) with a microphone (corresponding to the input unit 12 in FIG. 3). The robot apparatus 100 then outputs, from a speaker (corresponding to the output unit 13 in FIG. 3), a response corresponding to the detected image and the user's question. The robot apparatus 100 may be any device capable of realizing the processing of the first embodiment; for example, it may be a robot that interacts with a human (user), such as what is called an entertainment robot or a household robot.
 FIG. 1 illustrates a case where the robot apparatus 100, acting as an agent, registers a combination of an image, a question, and a correct answer (hereinafter also referred to as "support information") through a dialog with the user U1. First, the robot apparatus 100 acquires an image (step S11). In the example of FIG. 1, the robot apparatus 100 detects the image IM1 by capturing two ice creams (also simply referred to as "ices") with a camera. For example, an image is input to the robot apparatus 100 through a camera attached to the robot apparatus 100. The image (visual information) acquired through the camera may take various forms; for example, the camera may detect RGB information as image information. The robot apparatus 100 may also acquire an image IM1 detected by an external device from that external device.
 Then, the user U1 designates the question registration mode as the mode of the robot apparatus 100 in order to register a question in the robot apparatus 100 (step S12). The user U1 makes an input to the robot apparatus 100 indicating that the question registration mode is designated. In the example of FIG. 1, the user U1 inputs the command MD1, registered in advance in the robot apparatus 100, that designates the question registration mode. Specifically, the user U1 designates the question registration mode by uttering "question", which indicates that a question is to be asked. The command MD1 is not limited to "question" and may be set as appropriate, such as "attention", "listen", or "hey XXX (robot name)". A microphone provided in the robot apparatus 100 receives the input by detecting the user's utterance. As a result, the robot apparatus 100 receives the input designating the question registration mode.
 For example, the robot apparatus 100 converts voice information into text (character information) using various voice recognition techniques. The robot apparatus 100 may also be able to acquire information from a voice recognition server that provides a voice recognition service. In this case, the robot apparatus 100 may transmit the voice information to the voice recognition server and acquire, from the server, the character information converted from the voice information. In the example of FIG. 1, the robot apparatus 100 is assumed to have a voice recognition function and to recognize the user's utterance and estimate the user who uttered it by appropriately using various conventional techniques; detailed description thereof is omitted as appropriate.
 For example, the robot apparatus 100 may store, in the storage unit 120 (see FIG. 3), association information in which a mode ID is associated with character information (keywords) for shifting to the mode corresponding to that mode ID. The robot apparatus 100 may then compare a character string corresponding to the user's utterance with the character information in the association information and shift to the mode of the matching character information. The mode designation is not limited to voice and may take various forms. For example, it may be performed by the user operating a button (which may be implemented in hardware or software) provided on the robot apparatus 100 itself for shifting to the question registration mode.
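 A minimal sketch of such keyword-to-mode association is shown below. The mode IDs, trigger keywords, and function names are illustrative assumptions and are not taken from the disclosure.

```python
# Hypothetical sketch: associating trigger keywords with mode IDs and
# switching modes based on a recognized utterance.
MODE_KEYWORDS = {
    "MO2": ["question", "attention", "listen"],  # question registration mode
    "MO3": ["wrong", "incorrect", "mistake"],    # correct answer registration mode
}

def select_mode(utterance_text: str, current_mode: str = "MO1") -> str:
    """Return the mode ID whose keyword appears in the utterance; otherwise keep the current mode."""
    text = utterance_text.lower()
    for mode_id, keywords in MODE_KEYWORDS.items():
        if any(keyword in text for keyword in keywords):
            return mode_id
    return current_mode
```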
 Then, the robot apparatus 100 changes its mode in response to the user's input (step S13). In the example of FIG. 1, the robot apparatus 100 changes the mode to the question registration mode (see FIG. 6) corresponding to the command MD1 "question" of the user U1.
 Then, the user U1 inputs a question to the robot apparatus 100 (step S14). In the example of FIG. 1, the user U1 inputs the question QS1 "What color is the right one?" to the robot apparatus 100 by uttering "What color is the right one?". The robot apparatus 100 may acquire the image IM1 and the question QS1 at any timing as long as it can acquire them and respond; for example, the robot apparatus 100 may acquire the image after the question is input, that is, step S11 may be performed after step S14.
 A microphone provided in the robot apparatus 100 receives the input by detecting the user's utterance. The robot apparatus 100 determines whether the registration of the question is completed by appropriately using various techniques. For example, after entering the question registration mode and after the voice question input from the user U1 has started, the robot apparatus 100 ends the question registration mode if no voice is input for an interval equal to or longer than a certain threshold. The robot apparatus 100 then converts the voice question input in the question registration mode into character information, and natural language processing is performed based on that character information. For example, the robot apparatus 100 may determine (estimate) the meaning and content of the question by analyzing the character information using natural language processing techniques such as morphological analysis as appropriate.
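 The interval-based end-of-input rule described above could be sketched as follows. The threshold value and the `listen_once` helper are assumptions for illustration; the disclosure only states that input ends after "a certain threshold" of silence.

```python
import time

SILENCE_THRESHOLD_SEC = 2.0  # assumed value; the disclosure only refers to "a certain threshold"

def collect_question(listen_once) -> str:
    """Accumulate recognized speech fragments until no input arrives for the threshold interval.

    `listen_once` is a hypothetical callable that listens for a short window and
    returns a recognized text fragment, or None if nothing was heard.
    """
    fragments = []
    last_input_time = time.monotonic()
    while time.monotonic() - last_input_time < SILENCE_THRESHOLD_SEC:
        fragment = listen_once()
        if fragment:
            fragments.append(fragment)
            last_input_time = time.monotonic()
    return " ".join(fragments)
```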
 Through the processing described above, the robot apparatus 100 accepts the question QS1 "What color is the right one?". In this way, the robot apparatus 100 accepts questions in various formats, not only questions answered by choosing among alternatives such as "yes", "no", or "A, B, or C" (closed questions), but also questions that can be answered in a free format (open questions). The color of the package of the ice cream on the right side in the image IM1 shown in FIG. 1 is assumed to be light purple.
 The robot apparatus 100 then responds to the question QS1 using various techniques. The robot apparatus 100 identifies the correct answer based on the input image IM1 and question QS1. The robot apparatus 100 uses the input image IM1 and question QS1 as query information (hereinafter also simply referred to as a "query") and identifies the correct answer corresponding to the query. The robot apparatus 100 then outputs the result to the user through a speaker, a monitor, or the like. In the example of FIG. 1, the robot apparatus 100 performs this using the support information determined by the identification processing as shown in FIG. 8. In this case, the robot apparatus 100 uses the correct answer of the support information determined by the identification processing as the response.
 The robot apparatus 100 outputs the correct answer of the support information determined by the identification processing as a correct answer candidate (step S15). Here, in the example of FIG. 1, it is assumed that no support information whose correct answer is "light purple" has been registered in the robot apparatus 100. Therefore, the robot apparatus 100 outputs "red", which differs from the correct answer "light purple", as the correct answer candidate AA1.
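 One way to realize this kind of identification processing is to embed the query (image, question) pair and each registered support entry into a common feature space and return the correct answer of the most similar entry. The sketch below assumes pre-computed embedding vectors and is not the specific network of the disclosure.

```python
import numpy as np

def answer_candidate(query_image_vec, query_question_vec, support_set):
    """Return the correct answer of the support entry most similar to the query.

    `support_set` is a list of dicts holding pre-computed "image_vec" and
    "question_vec" embeddings together with the registered "answer"; the
    embedding functions themselves are assumed to exist elsewhere.
    """
    query = np.concatenate([query_image_vec, query_question_vec])
    best_answer, best_score = None, -np.inf
    for entry in support_set:
        key = np.concatenate([entry["image_vec"], entry["question_vec"]])
        score = float(np.dot(query, key) / (np.linalg.norm(query) * np.linalg.norm(key) + 1e-8))
        if score > best_score:
            best_answer, best_score = entry["answer"], score
    return best_answer
```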
 Note that the robot apparatus 100 may respond to the question QS1 of the user U1 using any appropriate technique as long as it can respond to the question. For example, the robot apparatus 100 may recognize an object and respond using techniques related to object recognition.
 Since the robot apparatus 100 cannot identify (determine) by itself whether the correct answer candidate it responded with should be registered as the correct answer, the user U1 provides the robot apparatus 100 with information regarding the correct answer. The user U1 inputs, to the robot apparatus 100, a reaction to the correct answer candidate with which the robot apparatus 100 responded (step S16). Because the correct answer candidate AA1 "red" output by the robot apparatus 100 does not correspond to the color of the package of the ice cream on the right side in the image IM1, the user U1 inputs a negative reaction to the robot apparatus 100.
 In the example of FIG. 1, the user U1 inputs a negative command NG1, registered in advance in the robot apparatus 100, that denies the correct answer candidate. Specifically, the user U1 designates the correct answer registration mode by uttering "wrong", which indicates that the correct answer candidate is not correct. The negative command NG1 is not limited to "wrong" and may be set as appropriate, such as "incorrect" or "mistake". A microphone provided in the robot apparatus 100 receives the input by detecting the user's utterance. The robot apparatus 100 thereby accepts the negative reaction of the user U1 as an input designating the correct answer registration mode. The robot apparatus 100 may also shift to the correct answer registration mode when the user operates a button provided on the robot apparatus 100 itself for shifting to the correct answer registration mode. In addition, when the robot apparatus 100 cannot determine whether the reaction of the user U1 is affirmative or negative, it may request a response from the user U1. In this case, the robot apparatus 100 may output a voice prompting the user U1 to choose affirmative or negative, for example, "Was that answer correct?". The robot apparatus 100 may then determine that the reaction is negative when the user U1 responds with "no", "wrong", or the like, and affirmative when the user U1 responds with "yes", "that's fine", or the like. The robot apparatus 100 may make this determination by comparing the user's response with negative list information, which is a list of negative reactions stored in the storage unit 120 (see FIG. 3), and affirmative list information, which is a list of affirmative reactions.
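 The comparison against the affirmative and negative reaction lists could look like the following sketch; the word lists and the fallback behavior are illustrative assumptions.

```python
# Hypothetical reaction lists corresponding to the affirmative/negative list information.
POSITIVE_REACTIONS = {"yes", "correct", "that's right", "good"}
NEGATIVE_REACTIONS = {"no", "wrong", "incorrect", "mistake"}

def classify_reaction(reaction_text: str) -> str:
    """Classify a recognized reaction as 'positive', 'negative', or 'unknown'."""
    text = reaction_text.strip().lower()
    if any(word in text for word in NEGATIVE_REACTIONS):
        return "negative"
    if any(word in text for word in POSITIVE_REACTIONS):
        return "positive"
    return "unknown"  # the apparatus may then ask the user to confirm
```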
 Then, the robot apparatus 100 changes the mode in response to the user's input (step S17). In the example of FIG. 1, the robot apparatus 100 changes the mode to the correct answer registration mode (see FIG. 6) corresponding to the negative command NG1 "wrong" of the user U1. In this way, when the negative command NG1 is input, the agent enters the correct answer registration mode and waits until an input of the correct answer from the user U1 is received.
 Then, the user U1 provides the correct answer to the robot apparatus 100 (step S18). In the example of FIG. 1, the user U1 inputs "light purple", which is the correct answer AS1 to the question QS1 "What color is the right one?", by uttering "light purple". When the correct answer is provided in this way, as in the question registration, the robot apparatus 100 accepts the input up to that point as the "correct answer" if there is no further input for an interval equal to or longer than the threshold after the user's utterance starts.
 Then, the robot apparatus 100 registers the combination of the image, the question related to the image, and the correct answer corresponding to the question as support information (step S19). For example, the robot apparatus 100 shifts to the support information registration mode and registers the combination of the image, the question related to the image, and the correct answer corresponding to the question as support information. In the example of FIG. 1, the robot apparatus 100 registers the combination of the image IM1, the question QS1, and the correct answer AS1, as shown in the additional registration information RINF1, as the support information identified by the support information ID "SP1" (support information SP1). In this way, the robot apparatus 100 registers the three elements of the input image, the input question, and the correct answer as one set. For example, the robot apparatus 100 stores the support information SP1 in the support information storage unit 141 (see FIG. 4).
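 The overall flow from step S15 to step S19, deciding which answer ends up in the registered set, might be sketched as follows. The function names are assumptions, and `classify_reaction` refers to the hedged reaction classifier sketched earlier.

```python
def register_from_dialog(image, question, candidate_answer, user_reaction,
                         ask_user_for_answer, register):
    """Register (image, question, correct answer) as one support-information set.

    If the user's reaction to the candidate is positive, the candidate itself is
    registered as the correct answer; if it is negative, the answer newly provided
    by the user is registered instead. `ask_user_for_answer` and `register` are
    hypothetical callables standing in for the dialog and the storage unit.
    """
    if classify_reaction(user_reaction) == "positive":
        correct_answer = candidate_answer           # e.g. FIG. 2, step S27
    else:
        correct_answer = ask_user_for_answer()      # e.g. "light purple" in FIG. 1
    register(image, question, correct_answer)       # the three elements as one set
```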
 As described above, the robot apparatus 100 registers, as the correct answer corresponding to the image IM1 and the question QS1, the correct answer AS1 provided by the user U1 rather than the correct answer candidate AA1 output by the robot apparatus 100 itself. In the example of FIG. 1, at the time when the correct answer candidate AA1 was output for the image IM1 and the question QS1, no support information including the correct answer "light purple", which is the correct answer AS1, had been registered in the robot apparatus 100. That is, at the time of outputting the correct answer candidate AA1 for the image IM1 and the question QS1, the robot apparatus 100 had not yet acquired the color-related concept of "light purple". Therefore, the robot apparatus 100 could not respond to the image IM1 and the question QS1 with the appropriate correct answer "light purple".
 Such a situation cannot be resolved by processing that defines a fixed set of identifiable answers (classes). For example, such a situation can also occur in VQA (Visual Question Answering), in which the feature amount of the image and the feature amount of the question are projected into a common space and the correct answer is identified from a limited pool of answers based on those features. That is, in VQA, the number of identifiable answers is limited (generally within the range of 1000 to 1500), so an answer that is not included in that set cannot be identified.
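 For contrast, a conventional fixed-pool VQA head of the kind described above can be sketched as follows: image and question features are fused and classified over a closed answer vocabulary, so any answer outside that vocabulary can never be produced. The feature dimensions and fusion scheme are illustrative assumptions.

```python
import torch
import torch.nn as nn

class FixedPoolVQAHead(nn.Module):
    """Toy VQA classifier over a closed answer vocabulary (e.g. roughly 1000-1500 answers)."""

    def __init__(self, image_dim=2048, question_dim=1024, hidden_dim=512, num_answers=1000):
        super().__init__()
        self.image_proj = nn.Linear(image_dim, hidden_dim)        # project image features
        self.question_proj = nn.Linear(question_dim, hidden_dim)  # project question features
        self.classifier = nn.Linear(hidden_dim, num_answers)      # fixed answer pool

    def forward(self, image_feat, question_feat):
        fused = torch.tanh(self.image_proj(image_feat)) * torch.tanh(self.question_proj(question_feat))
        return self.classifier(fused)  # logits only over the predefined answers
```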
 On the other hand, by using the newly registered support information SP1 in the responses after step S19 in FIG. 1, the robot apparatus 100 becomes able to respond "light purple" to the input image and question. In this way, compared with VQA, the robot apparatus 100 has the advantage that it can also learn new answers, not only the 1000 to 1500 or so answers that typically appear with high frequency in the data set.
 More specifically, after step S19 in FIG. 1, the robot apparatus 100 can output the correct candidate "light purple" for the image IM1 and the question QS1. That is, by additionally registering the support information SP1, the robot apparatus 100 acquires the color-related concept of "light purple". In other words, the robot apparatus 100 can acquire a concept related to the property of the object, namely the ice cream on the right side of the image IM1. By continuing to acquire new concepts one after another through dialogs with the user in this way, the robot apparatus 100 can also acquire concepts that were unknown at the time the robot apparatus 100 was provided, for example concepts corresponding to new words. Therefore, the robot apparatus 100 can make an appropriate response to a question regarding an image.
 There are also learning models such as one-shot learning that, like the robot apparatus 100, can identify classes from a small number of samples. However, one-shot learning centers on identification based only on visual elements, that is, on images alone, and it is difficult for it to appropriately learn new concepts, particularly concepts such as the properties of an object, as the robot apparatus 100 does. In contrast, the robot apparatus 100 can acquire any concept corresponding to an image by using the combination of an image, a question related to the image, and a correct answer corresponding to the question as support information, and can therefore make appropriate responses to questions regarding images. Thus, compared with one-shot learning, the robot apparatus 100 has the advantage that it can learn different concepts even for the same object or the same image depending on the context given by the question, rather than relying on visual similarity alone.
 More specifically, in one-shot learning, learning is performed so that an image having the same label as the query image (the image to be identified) can be identified from among the identifiable candidate images. In the robot apparatus 100, on the other hand, a question is added to the image, and the apparatus learns to identify the image and question having the same "answer", taking into account not only the visual similarity but also the context given by the question. As a result, even for images of the same object (target object), the robot apparatus 100 can acquire attributes including various properties and relationships such as color, size, number, and purpose, depending on the context of the question. In this way, the robot apparatus 100 can acquire attributes including the properties of the target object and the relationships between the target object and other objects. That is, the robot apparatus 100 can acquire various concepts including the attributes of the target object in addition to the name of the target object itself. For example, the robot apparatus 100 can acquire concepts regarding the property or state of an object included in an image. By repeating questions and answers in this way, the robot apparatus 100 can register support information corresponding to concepts such as various attributes from the same image or the same object. As a result, the robot apparatus 100 can acquire knowledge corresponding to the concepts of the support information.
 Further, since the user can designate the correct answer, the robot apparatus 100 can acquire concepts regarding the user's impression of the object included in the image. The robot apparatus 100 can also acquire concepts regarding the amount, color, temperature, hardness, and the like of the object included in the image. The robot apparatus 100 can also acquire concepts (relative concepts) regarding the relationship between the object included in the image and other objects. For example, when a concept based on a relationship with another object, such as "large" or "small", is used as the correct answer, the robot apparatus 100 can also acquire the relational concept. In this way, even for images of the same object (target object), the robot apparatus 100 can acquire as many concepts as there are questions. The robot apparatus 100 can also learn new concepts regarding an image through the context obtained from questions and answers. That is, the robot apparatus 100 can learn new concepts through images and questions.
 It has also become possible to learn new objects from a small number of samples through the one-shot learning described above. However, as described above, an object (target object) in the real world has not only a name (label) but also various attributes. For example, the target object has various attributes such as properties and relationships with other objects. Furthermore, the attributes of an object are often affected by the situation or context. For example, every object (target object) has the attribute of size, and size is a relative concept. Such a relative concept can also be said to be a concept based on the relationship with other objects. Therefore, with existing models it was difficult to do anything other than learning to identify a single object. In addition, when a user directly makes an agent or robot learn a new object, it is assumed that the agent or robot is made to recognize samples of that object through a camera or a file. At that time, the user may also input information about the object verbally or in text, but in most cases that information is limited to a simple label or name of the object.
 In contrast, the robot apparatus 100 can acquire, through questions and answers with the user and not only from images, various concepts that differ depending on the context even for a single object, and can therefore make appropriate responses to questions regarding images. The robot apparatus 100 makes it possible to appropriately add information for making an entity other than a human that performs information processing (for example, a computer) learn new concepts. For example, the robot apparatus 100 can enhance its practicality by applying the methodology of one-shot learning to the VQA setting so that a wider range of concepts can be learned.
 The robot apparatus 100 may also register a correct answer candidate as the correct answer when the correct answer candidate with which the robot apparatus 100 itself responded is correct. This point will be described with reference to FIG. 2. FIG. 2 is a diagram showing another example of the information processing according to the first embodiment of the present disclosure. Steps S21 to S24 in FIG. 2 are the same as steps S11 to S14 in FIG. 1, and their description will be omitted.
 In the example of FIG. 2, the robot apparatus 100 that has received the question QS1 "What color is the right one?" responds to the question QS1 using various techniques. The robot apparatus 100 uses the input image IM1 and question QS1 as a query and identifies the correct answer corresponding to the query. The robot apparatus 100 outputs the correct answer of the support information determined by the identification processing as a correct answer candidate (step S25). Here, in the example of FIG. 2, it is assumed that support information whose correct answer is "light purple" has already been registered in the robot apparatus 100. Therefore, by determining the support information through the identification processing as shown in FIG. 8, the robot apparatus 100 uses the support information whose correct answer is "light purple" for the response to the question QS1. As a result, the robot apparatus 100 outputs the correct answer "light purple" as the correct answer candidate AA2.
 Then, the user U1 inputs, to the robot apparatus 100, a reaction to the correct answer candidate with which the robot apparatus 100 responded (step S26). Because the correct answer candidate AA2 "light purple" output by the robot apparatus 100 corresponds to the color of the package of the ice cream on the right side in the image IM1, the user U1 inputs a positive reaction to the robot apparatus 100.
 In the example of FIG. 2, the user U1 inputs an affirmative command OK1, registered in advance in the robot apparatus 100, that affirms the correct answer candidate. Specifically, the user U1 utters "correct", which indicates that the correct answer candidate is correct, thereby making an input that designates registration of the correct answer candidate with which the robot apparatus 100 responded. The affirmative command OK1 is not limited to "correct" and may be set as appropriate, such as "that's right" or "yes". For example, when the robot apparatus 100 receives the affirmative command OK1, it shifts to the support information registration mode.
 Then, the robot apparatus 100 registers the combination of the image, the question related to the image, and the correct answer corresponding to the question as support information (step S27). In the example of FIG. 2, the robot apparatus 100 registers the combination of the image IM1, the question QS1, and the correct answer candidate AA2, which is the correct answer, as shown in the additional registration information RINF2, as the support information identified by the support information ID "SP1" (support information SP1). In this way, the robot apparatus 100 registers the three elements of the input image, the input question, and the correct answer as one set. For example, the robot apparatus 100 stores the support information SP1 in the support information storage unit 141 (see FIG. 4).
 As described above, the robot apparatus 100 registers the correct answer candidate AA2 output by the robot apparatus 100 itself as the correct answer corresponding to the image IM1 and the question QS1. As a result, the robot apparatus 100 can make a more appropriate response to questions regarding images.
[1-2. Configuration of the robot apparatus according to the first embodiment]
 Next, the configuration of the robot apparatus 100, which is an example of the information processing apparatus that executes the information processing according to the first embodiment, will be described. FIG. 3 is a diagram showing a configuration example of the robot apparatus 100 according to the first embodiment of the present disclosure.
 As shown in FIG. 3, the robot apparatus 100 includes a communication unit 11, an input unit 12, an output unit 13, a storage unit 14, a control unit 15, a sensor unit 16, and a drive unit 17.
 The communication unit 11 is realized by, for example, a NIC (Network Interface Card) or a communication circuit. The communication unit 11 is connected to a network N (the Internet or the like) by wire or wirelessly and transmits and receives information to and from other devices and the like via the network N.
 The input unit 12 receives various operations input by the user. The input unit 12 accepts input by the user. The input unit 12 accepts the user's reaction to the correct answer candidate. When the user's reaction to the correct answer candidate is negative, the input unit 12 accepts another correct answer candidate, different from the correct answer candidate, input by the user. The input unit 12 has a function of detecting voice; for example, the input unit 12 has a microphone that detects voice. The input unit 12 accepts an utterance by the user as an input. The input unit 12 may also accept various operations from the user via a button or a touch panel provided on the robot apparatus 100.
 The output unit 13 outputs various kinds of information. The output unit 13 has a function of outputting voice; for example, the output unit 13 has a speaker that outputs voice. The output unit 13 outputs a correct answer candidate corresponding to the question. The output unit 13 outputs the question. The output unit 13 outputs the question when the user is detected by the sensor unit 16. The output unit 13 outputs a response using the support information (determined support information) determined by the determination unit 156. The output unit 13 performs voice output requesting the correct answer from the user. For example, the output unit 13 outputs the correct answer included in the determined support information. The output unit 13 may also output various kinds of information by displaying them on a display unit such as a display provided on the robot apparatus 100.
 The storage unit 14 is realized by, for example, a semiconductor memory element such as a RAM (Random Access Memory) or a flash memory, or a storage device such as a hard disk or an optical disk. The storage unit 14 includes a support information storage unit 141, a model information storage unit 142, and a mode information storage unit 143.
 The support information storage unit 141 stores various kinds of information regarding support. FIG. 4 is a diagram illustrating an example of the support information storage unit according to the first embodiment of the present disclosure. FIG. 4 shows an example of the support information storage unit 141 according to the first embodiment. In the example of FIG. 4, the support information storage unit 141 has items such as "support information ID", "image", "question", and "correct answer".
 "Support information ID" indicates identification information for identifying the support information. "Image" indicates an image registered as support information. Although FIG. 4 shows an example in which conceptual information such as "IM11" and "IM12" is stored in "image", in practice, image information or moving image information, or a file path name indicating its storage location, is stored. "Question" indicates a question registered as support information. Although FIG. 4 shows an example in which conceptual information such as "QS11" and "QS12" is stored in "question", in practice, character information or voice information indicating the question, or a file path name indicating its storage location, is stored. "Correct answer" indicates a correct answer registered as support information. Although FIG. 4 shows an example in which conceptual information such as "AS11" and "AS12" is stored in "correct answer", in practice, character information or voice information indicating the correct answer, or a file path name indicating its storage location, is stored.
 In the example of FIG. 4, the support information identified by the support information ID "SP11" (support information SP11) is a combination of the image IM11, the question QS11, and the correct answer AS11. That is, the support information SP11 includes the information of the image IM11, the information of the question QS11, and the information of the correct answer AS11.
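 The table of FIG. 4 could be held in memory as a simple mapping from the support information ID to the registered triple; this layout, including the file paths, is an assumption for illustration.

```python
# Hypothetical in-memory layout mirroring the items of FIG. 4.
support_information_storage = {
    "SP11": {"image": "images/IM11.png", "question": "QS11", "answer": "AS11"},
    "SP12": {"image": "images/IM12.png", "question": "QS12", "answer": "AS12"},
}

def lookup(support_id: str) -> dict:
    """Return the registered image/question/answer combination for a support information ID."""
    return support_information_storage[support_id]
```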
 The model information storage unit 142 stores information regarding models. For example, the model information storage unit 142 stores model information (model data) learned (generated) by the learning processing. FIG. 5 is a diagram illustrating an example of the model information storage unit according to the first embodiment of the present disclosure. FIG. 5 shows an example of the model information storage unit 142 according to the first embodiment. In the example shown in FIG. 5, the model information storage unit 142 includes items such as "model ID" and "model data".
 "Model ID" indicates identification information for identifying the model. For example, the model identified by the model ID "M1" corresponds to the model M1 that identifies (determines) the support information corresponding to the query, as shown in FIG. 8. "Model data" indicates the data of the model. Although FIG. 5 shows an example in which conceptual information such as "MDT1" is stored in "model data", in practice, it includes various kinds of information constituting the model, such as information regarding the networks included in the model and functions.
 The mode information storage unit 143 stores information regarding the modes of the robot apparatus 100. FIG. 6 is a diagram illustrating an example of the mode information storage unit according to the first embodiment of the present disclosure. FIG. 6 shows an example of the mode information storage unit 143 according to the first embodiment. As shown in FIG. 6, the mode information storage unit 143 has items such as "mode ID", "mode", and "flag".
 "Mode ID" indicates information for identifying a mode. "Mode" indicates the content of the mode identified by the mode ID. "Flag" is a flag indicating which of the settable modes is selected, that is, the mode of the current state. In FIG. 6, it is assumed that the operation mode whose "flag" value is "1" is selected. That is, in FIG. 6, the mode "normal" identified by the mode ID "MO1" is selected, which indicates that the mode of the robot apparatus 100 in the current state is the normal mode. In the example of FIG. 6, the mode identified by the mode ID "MO2" (mode MO2) is the question registration mode. The flag of the mode MO2 is "0", which indicates that it is not the selected mode.
 Returning to FIG. 3, the description will be continued. The control unit 15 is realized by, for example, a CPU (Central Processing Unit), an MPU (Micro Processing Unit), or the like executing a program stored inside the robot apparatus 100 (for example, an information processing program according to the present disclosure) using a RAM (Random Access Memory) or the like as a work area. The control unit 15 is a controller, and may also be realized by an integrated circuit such as an ASIC (Application Specific Integrated Circuit) or an FPGA (Field Programmable Gate Array).
 As illustrated in FIG. 3, the control unit 15 includes an acquisition unit 151, a determination unit 152, a generation unit 153, a registration unit 154, a learning unit 155, and a determination unit 156, and realizes or executes the functions and operations of the information processing described below. The internal configuration of the control unit 15 is not limited to the configuration shown in FIG. 3, and may be another configuration as long as it performs the information processing described later.
 The acquisition unit 151 acquires various kinds of information. The acquisition unit 151 acquires various kinds of information from an external information processing apparatus. The acquisition unit 151 acquires various kinds of information from the storage unit 14. The acquisition unit 151 acquires the input information accepted by the input unit 12. The acquisition unit 151 acquires the sensor information detected by the sensor unit 16. The acquisition unit 151 acquires an image, a question related to the image, and a correct answer corresponding to the question.
 The determination unit 152 makes various determinations. The determination unit 152 determines various kinds of information based on the information acquired by the acquisition unit 151. The determination unit 152 determines various kinds of information based on the information stored in the storage unit 14.
 The determination unit 152 determines whether the registration of the question by the user is completed by appropriately using various techniques. The determination unit 152 determines (estimates) the meaning and content of the question by analyzing the question (character information) using various techniques related to natural language processing.
 The determination unit 152 determines whether the input by the user is a question. The determination unit 152 determines whether the input by the user is a question by analyzing the character information converted from the user's voice information. The determination unit 152 determines whether or not there has been a reaction from the user. The determination unit 152 determines whether there has been a reaction from the user in response to the acceptance of the input by the user. When an utterance by the user is detected by the microphone, the determination unit 152 determines that there has been a reaction from the user.
 The determination unit 152 determines whether the user's reaction is affirmative or negative. For example, the determination unit 152 determines whether the user's reaction is affirmative or negative by analyzing the character information converted from the user's reaction (voice information).
 The generation unit 153 performs various kinds of generation. The generation unit 153 generates various types of information based on the information acquired by the acquisition unit 151. The generation unit 153 generates various types of information based on the information stored in the storage unit 14.
 The generation unit 153 generates an episode whose query information is the input image detected by the sensor unit 16 and the question input by the user. For example, the generation unit 153 generates an episode including query information, which is the combination of the input image and the input question, and the support information stored in the support information storage unit 141. For example, the generation unit 153 generates an episode that includes the group of support information stored in the support information storage unit 141 as information for determining a response to the query information.
 The registration unit 154 performs various kinds of registration. The registration unit 154 registers various types of information based on the information acquired by the acquisition unit 151. The registration unit 154 registers the information acquired by the acquisition unit 151 in the storage unit 14. The registration unit 154 functions as an image registration unit that registers images. The registration unit 154 functions as a question registration unit that registers questions. The registration unit 154 functions as a correct answer registration unit that registers correct answers.
 The registration unit 154 registers the combination of the image, the question, and the correct answer acquired by the acquisition unit 151 as support information used for determining a response to query information including one image and one question related to that image. The registration unit 154 registers the three elements of the input image, the input question, and the correct answer as one set. In the example of FIG. 1, the registration unit 154 registers the combination of the image IM1, the question QS1, and the correct answer AS1 as the support information SP1. In the example of FIG. 2, the registration unit 154 registers the combination of the image IM1, the question QS1, and the correct answer candidate AA2, which is the correct answer, as the support information SP1.
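 The following is a minimal Python sketch of how such an image/question/correct-answer set might be held in memory; the names SupportInfo and SupportStore, and the use of NumPy arrays for images, are illustrative assumptions and do not appear in the present disclosure.

```python
from dataclasses import dataclass, field
from typing import List
import numpy as np


@dataclass
class SupportInfo:
    """One registered set: input image, input question, and correct answer."""
    support_id: str
    image: np.ndarray   # pixels captured by the camera
    question: str       # character information converted from the user's speech
    answer: str         # correct answer confirmed by the user


@dataclass
class SupportStore:
    """Stands in for the support information storage unit 141."""
    items: List[SupportInfo] = field(default_factory=list)

    def register(self, image: np.ndarray, question: str, answer: str) -> SupportInfo:
        info = SupportInfo(f"SP{len(self.items) + 1}", image, question, answer)
        self.items.append(info)
        return info
```

 For example, `store.register(image_im2, "What if I touch it?", "cold")` would add one such set in the manner of the support information SP2 described later.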
 The learning unit 155 performs various kinds of learning. The learning unit 155 learns various types of information based on the information acquired by the acquisition unit 151. The learning unit 155 learns various types of information based on the information stored in the storage unit 14. The learning unit 155 learns (generates) a model. The learning unit 155 learns (generates) a model based on the information acquired by the acquisition unit 151. The learning unit 155 learns (generates) a model based on the information stored in the storage unit 14.
 The learning unit 155 learns the model using various techniques related to machine learning. For example, the learning unit 155 learns a model having a network structure as shown in FIG. 8. The learning unit 155 learns a model that identifies the support information corresponding to query information including an image and a question. For example, the learning unit 155 learns the model M1, which identifies the support information corresponding to query information including an image and a question. The learning unit 155 may generate the model by performing learning processing with episodes as learning sets. The learning unit 155 may generate the model by performing learning processing with each of the episodes EP1 to EP3 shown in FIG. 7 as a learning set.
 The learning unit 155 may generate the model by performing learning processing based on various learning methods. The learning unit 155 may generate the model by performing learning processing based on a method related to one-shot learning. The learning unit 155 may generate the model M1 by performing learning processing based on a method related to one-shot learning. Note that the above is merely an example, and the learning unit 155 may generate the model by any learning method as long as it can generate a model that identifies the support information corresponding to query information including an image and a question.
 The decision unit 156 makes various decisions. The decision unit 156 decides various types of information based on the information acquired by the acquisition unit 151. The decision unit 156 decides various types of information based on the information stored in the storage unit 14. The decision unit 156 may also perform various estimations. The decision unit 156 may estimate the surrounding space as shown in FIG. 18.
 The decision unit 156 decides one correct answer corresponding to the one question of the query information based on the query information and the support information. The decision unit 156 decides the one correct answer based on the one image and the one question included in the query information and the images and questions included in the support information.
 The decision unit 156 identifies the support information including the correct answer corresponding to the query information by using an episode that includes the query information and the support information. The decision unit 156 identifies the support information including the correct answer corresponding to the query information, and decides the identified support information as correct answer support information including the correct answer corresponding to the query information. In the example of FIG. 1, the decision unit 156 uses the input image IM1 and the question QS1 as query information and identifies the correct answer corresponding to that query information.
 The decision unit 156 decides the mode. The decision unit 156 changes the mode based on the decided mode. The decision unit 156 compares the character string corresponding to the user's utterance with the character information of the association information and transitions to the mode of the corresponding character information. The decision unit 156 changes the mode based on the input by the user. In the example of FIG. 1, the decision unit 156 changes the mode to the question registration mode corresponding to the command MD1 "question" of the user U1. The decision unit 156 changes the mode to the correct answer registration mode corresponding to the negative command NG1 "no" of the user U1. For example, when the affirmative command OK1 is received, the decision unit 156 transitions to the support information registration mode.
 The sensor unit 16 detects predetermined information. The sensor unit 16 has a function as an image capturing unit that captures images. The sensor unit 16 has the function of an image sensor and detects image information. The sensor unit 16 functions as an image input unit that receives an image as an input. Note that the sensor unit 16 is not limited to the above and may include various sensors. The sensor unit 16 may include various sensors such as a position sensor, an acceleration sensor, a gyro sensor, a temperature sensor, a humidity sensor, an illuminance sensor, a pressure sensor, a proximity sensor, and sensors for acquiring biological information such as odor, sweat, heartbeat, pulse, and brain waves. The sensors that detect the above various types of information in the sensor unit 16 may be a common sensor or may be realized by different sensors.
 The drive unit 17 has a function of driving the physical configuration of the robot apparatus 100. The drive unit 17 has a function of driving the joints of the robot apparatus 100, such as the neck, hands, and feet. The drive unit 17 is, for example, an actuator. The drive unit 17 may have any configuration as long as the robot apparatus 100 can realize a desired operation. The drive unit 17 may have any configuration as long as it can realize driving of the joints of the robot apparatus 100, movement of its position, and the like. When the robot apparatus 100 has a moving mechanism such as caterpillar tracks or tires, the drive unit 17 drives the tracks, tires, or the like. The drive unit 17 changes the viewpoint of the camera provided on the head of the robot apparatus 100 by driving the neck joint of the robot apparatus 100. For example, the drive unit 17 changes the viewpoint of the camera provided on the head of the robot apparatus 100 by driving the neck joint of the robot apparatus 100 so that an image in the direction decided by the decision unit 156 is captured. The drive unit 17 may also change only the orientation or the imaging range of the camera. The drive unit 17 may change the viewpoint of the camera.
 Here, the identification processing and the learning processing in the robot apparatus 100 will be described with reference to FIGS. 7 and 8. FIG. 7 is a diagram showing an example of identification according to the present disclosure. In the example of FIG. 7, three episodes EP1 to EP3 are illustrated, but the episodes EP1 to EP3 are merely examples for explaining the identification processing and the learning processing, and the robot apparatus 100 may use various episodes. FIG. 8 is a diagram showing an example of a network structure for learning according to the present disclosure.
 In the example of FIG. 7, the robot apparatus 100 performs the identification processing on the episode EP1 including the query QINF1 and the support information SP11 to SP13. The query QINF1 includes an image IM15 showing a balloon and a question QS15 "What color is the balloon?". The support information SP11 includes an image IM11 showing two ice creams, a question QS11 "What color is the ice cream on the right?", and a correct answer AS11 "light purple". The support information SP12 includes an image IM12 showing a balloon, a question QS12 "What is this?", and a correct answer AS12 "balloon". The support information SP13 includes an image IM13 showing the sky and three balloons, a question QS13 "What color is the sky?", and a correct answer AS13 "blue".
 In this case, the robot apparatus 100 identifies, among the support information SP11 to SP13, the support information corresponding to the query QINF1. For example, the robot apparatus 100 identifies the support information corresponding to the query QINF1 using a model having the network structure shown in FIG. 8.
 As shown in FIG. 8, the robot apparatus 100 projects the feature amount of the image and the feature amount of the question included in the query and in each piece of support information onto a common space, and identifies the support information corresponding to the query by comparing their distances. The processing group PS1 in FIG. 8 corresponds to this identification processing. The robot apparatus 100 performs the processing corresponding to the processing group PS1, which includes the partial processing PT1, the partial processing PT2, the distance comparison, and the identification. For example, the robot apparatus 100 learns a model (identification model) that performs the processing group PS1 in FIG. 8. For example, the identification model is the model M1.
 The robot apparatus 100 performs the processing related to the query by the partial processing PT1. The robot apparatus 100 extracts the feature amount (for example, a vector) of the image (input image) included in the query. The robot apparatus 100 extracts the feature amount of the input image (hereinafter also referred to as the "image feature amount") by inputting the input image into a network that extracts image feature amounts (hereinafter also referred to as the "image feature extraction network"). For example, the robot apparatus 100 inputs the input image into the image feature extraction network and causes the image feature extraction network to output the image feature amount of the input image. For example, the robot apparatus 100 causes the image feature extraction network to output a vector indicating the image feature amount. In the example of the episode EP1 in FIG. 7, the robot apparatus 100 extracts the image feature amount of the image IM15 by inputting the image IM15 into the image feature extraction network.
 The robot apparatus 100 also extracts the feature amount of the question (input question) included in the query. The robot apparatus 100 extracts the feature amount of the input question (hereinafter also referred to as the "question feature amount") by inputting the input question into a network that extracts question feature amounts (hereinafter also referred to as the "question feature extraction network"). For example, the robot apparatus 100 inputs the input question into the question feature extraction network and causes the question feature extraction network to output the question feature amount of the input question. For example, the robot apparatus 100 causes the question feature extraction network to output a vector indicating the question feature amount. In the example of the episode EP1 in FIG. 7, the robot apparatus 100 extracts the question feature amount of the question QS15 by inputting the question QS15 into the question feature extraction network.
 Then, the robot apparatus 100 projects the image feature amount extracted from the image included in the query and the question feature amount extracted from the question included in the query onto a common space (for example, an N-dimensional space). The robot apparatus 100 inputs the image feature amount and the question feature amount into a network that projects the image feature amount and the question feature amount onto the common space (hereinafter also referred to as the "projection network"). For example, the robot apparatus 100 integrates the image feature amount and the question feature amount and projects them onto the common space. For example, by projecting onto the common space, the robot apparatus 100 causes the projection network to output a feature amount in which the image feature amount and the question feature amount are integrated (hereinafter also referred to as the "integrated feature amount"). For example, the projection network may output an integrated feature amount (vector) obtained by simply concatenating the image feature amount (vector) and the question feature amount (vector). In the example of the episode EP1 in FIG. 7, the robot apparatus 100 inputs the image feature amount of the image IM15 and the question feature amount of the question QS15 into the projection network and causes the projection network to output the integrated feature amount of the image IM15 and the question QS15 (hereinafter also referred to as the "integrated feature amount FT15").
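 The following PyTorch sketch illustrates one possible reading of this pipeline, with simple fully connected layers standing in for the image feature extraction network, the question feature extraction network, and the projection network; the layer sizes and the assumption that pre-extracted image and question vectors are supplied are illustrative and are not fixed by FIG. 8.

```python
import torch
import torch.nn as nn


class CommonSpaceEncoder(nn.Module):
    """Projects an (image feature, question feature) pair onto the common space."""

    def __init__(self, image_dim: int = 2048, question_dim: int = 300, common_dim: int = 128):
        super().__init__()
        # Image feature extraction network (a CNN backbone could be used instead).
        self.image_net = nn.Sequential(nn.Linear(image_dim, 512), nn.ReLU())
        # Question feature extraction network (an RNN or transformer could be used instead).
        self.question_net = nn.Sequential(nn.Linear(question_dim, 512), nn.ReLU())
        # Projection network: concatenates the two feature amounts and maps them
        # onto the common N-dimensional space, yielding the integrated feature amount.
        self.projection = nn.Linear(512 + 512, common_dim)

    def forward(self, image_feat: torch.Tensor, question_feat: torch.Tensor) -> torch.Tensor:
        img = self.image_net(image_feat)
        qst = self.question_net(question_feat)
        return self.projection(torch.cat([img, qst], dim=-1))
```

 Because the partial processing PT1 and the partial processing PT2 are described as sharing common networks, the same encoder instance can be applied to the query and to each piece of support information.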
 The robot apparatus 100 performs the processing related to the support information by the partial processing PT2. For example, the robot apparatus 100 performs the partial processing PT2 for each piece of support information. In the example of the episode EP1 in FIG. 7, the robot apparatus 100 performs the partial processing PT2 for each of the support information SP11 to SP13.
 The robot apparatus 100 extracts the feature amount (for example, a vector) of the image (input image) included in the support information. The robot apparatus 100 extracts the feature amount of the input image (image feature amount) by inputting the input image into the network that extracts image feature amounts (image feature extraction network). For example, the robot apparatus 100 inputs the input image into the image feature extraction network and causes the image feature extraction network to output the image feature amount of the input image. For example, the robot apparatus 100 causes the image feature extraction network to output a vector indicating the image feature amount. In the example of the episode EP1 in FIG. 7, the robot apparatus 100 extracts the image feature amount of the image IM11 by inputting the image IM11 of the support information SP11 into the image feature extraction network.
 The robot apparatus 100 also extracts the feature amount of the question (input question) included in the support information. The robot apparatus 100 extracts the feature amount of the input question (question feature amount) by inputting the input question into the network that extracts question feature amounts (question feature extraction network). For example, the robot apparatus 100 inputs the input question into the question feature extraction network and causes the question feature extraction network to output the question feature amount of the input question. For example, the robot apparatus 100 causes the question feature extraction network to output a vector indicating the question feature amount. In the example of the episode EP1 in FIG. 7, the robot apparatus 100 extracts the question feature amount of the question QS11 by inputting the question QS11 of the support information SP11 into the question feature extraction network.
 Then, the robot apparatus 100 projects the image feature amount extracted from the image included in the support information and the question feature amount extracted from the question included in the support information onto the common space (for example, an N-dimensional space). The robot apparatus 100 inputs the image feature amount and the question feature amount into the network that projects the image feature amount and the question feature amount onto the common space (projection network). For example, the robot apparatus 100 integrates the image feature amount and the question feature amount and projects them onto the common space. For example, by projecting onto the common space, the robot apparatus 100 causes the projection network to output an integrated feature amount in which the image feature amount and the question feature amount are integrated. In the example of the episode EP1 in FIG. 7, the robot apparatus 100 inputs the image feature amount of the image IM11 of the support information SP11 and the question feature amount of the question QS11 into the projection network and causes the projection network to output the integrated feature amount of the image IM11 and the question QS11 (hereinafter also referred to as the "integrated feature amount FT11").
 Similarly, the robot apparatus 100 inputs the image feature amount of the image IM12 of the support information SP12 and the question feature amount of the question QS12 into the projection network and causes the projection network to output the integrated feature amount of the image IM12 and the question QS12 (hereinafter also referred to as the "integrated feature amount FT12"). Likewise, the robot apparatus 100 inputs the image feature amount of the image IM13 of the support information SP13 and the question feature amount of the question QS13 into the projection network and causes the projection network to output the integrated feature amount of the image IM13 and the question QS13 (hereinafter also referred to as the "integrated feature amount FT13"). For example, common networks (the image feature extraction network, the question feature extraction network, and the projection network) are used for the partial processing PT1 and the partial processing PT2.
 Then, the robot apparatus 100 compares the distance between the query and each piece of support information based on the information projected onto the common space. In the example of the episode EP1 in FIG. 7, the robot apparatus 100 compares the distances between the query QINF1 and each of the support information SP11 to SP13 based on the information projected onto the common space. For example, the robot apparatus 100 compares the distances between the integrated feature amount FT15 of the query QINF1 and the integrated feature amounts FT11 to FT13 of the support information SP11 to SP13.
 Then, the robot apparatus 100 identifies the support information corresponding to the query based on the result of comparing the distances between the query and each piece of support information. For example, the robot apparatus 100 identifies the support information that is closest to the query as the support information corresponding to the query. For example, the robot apparatus 100 identifies the support information whose distance from the query is the smallest as the support information corresponding to the query. In the example of the episode EP1 in FIG. 7, the robot apparatus 100 identifies the support information whose integrated feature amount is closest to the integrated feature amount FT15 of the query QINF1 as the support information corresponding to the query QINF1. The robot apparatus 100 identifies the support information SP11, whose integrated feature amount FT11 is the closest to the integrated feature amount FT15 of the query QINF1, as the support information corresponding to the query QINF1.
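 A short sketch of this distance comparison, assuming the integrated feature amounts are already available as vectors; Euclidean distance is used here, although the disclosure does not fix a particular distance measure.

```python
from typing import List
import numpy as np


def identify_support(query_feat: np.ndarray, support_feats: List[np.ndarray]) -> int:
    """Returns the index of the support information closest to the query in the common space."""
    distances = [np.linalg.norm(query_feat - s) for s in support_feats]
    return int(np.argmin(distances))


# Episode EP1: the query integrated feature amount FT15 is compared with FT11 to FT13.
# identify_support(ft15, [ft11, ft12, ft13]) returning 0 corresponds to choosing SP11.
```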
 Thus, in the example of the episode EP1 in FIG. 7, the robot apparatus 100 identifies, among the support information SP11 to SP13, the support information SP11 as the support information corresponding to the query QINF1. The robot apparatus 100 may then decide that the correct answer corresponding to the query QINF1 is the correct answer AS11 of the support information SP11. Specifically, the robot apparatus 100 may decide that the correct answer corresponding to the query QINF1 is the correct answer AS15, which is the same "light purple" as the correct answer AS11. The robot apparatus 100 may then register the combination of the image IM15 and the question QS15 included in the query QINF1 and the correct answer AS15 as support information.
 When the correct answer AS15 of the query QINF1 has already been acquired, the robot apparatus 100 may perform learning processing based on the identification result. For example, the robot apparatus 100 may learn an identification model including the image feature extraction network, the question feature extraction network, and the projection network described above. For example, if the identification model shown in FIG. 8 identifies the support information corresponding to the query QINF1 as the support information SP12, the robot apparatus 100 may train the identification model so that the support information corresponding to the query QINF1 is identified as the support information SP11. Note that any learning method, such as backpropagation or stochastic gradient descent, can be adopted for such learning.
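 One way to realize such training is a matching-network-style episode loss: a softmax over negative distances in the common space with cross-entropy, optimized by stochastic gradient descent so that the query is pulled toward the support information holding the correct answer. The formulation below is an assumption consistent with the backpropagation/SGD remark above, not the only possible choice.

```python
import torch
import torch.nn.functional as F


def episode_loss(query_feat: torch.Tensor,
                 support_feats: torch.Tensor,
                 correct_index: int) -> torch.Tensor:
    """Cross-entropy over similarities so that the correct support becomes the nearest one.

    query_feat:    (D,)   integrated feature amount of the query
    support_feats: (K, D) integrated feature amounts of the K support items
    correct_index: index of the support information holding the correct answer
    """
    # Negative Euclidean distance serves as the similarity score.
    logits = -torch.cdist(query_feat.unsqueeze(0), support_feats).squeeze(0)
    target = torch.tensor([correct_index])
    return F.cross_entropy(logits.unsqueeze(0), target)


# Hypothetical training step with the encoder sketched earlier:
# optimizer = torch.optim.SGD(encoder.parameters(), lr=0.01)
# loss = episode_loss(ft_query, ft_supports, correct_index=0)
# loss.backward(); optimizer.step()
```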
 In the example of FIG. 7, the robot apparatus 100 also performs the identification processing on the episode EP2 including the query QINF2 and the support information SP21 to SP23. The query QINF2 includes an image IM25 showing two balloons and a question QS25 "How many balloons are there?". The support information SP21 includes an image IM21 showing two ice creams, a question QS21 "How many ice creams are there?", and a correct answer AS21 "two". The support information SP22 includes an image IM22 showing a balloon, a question QS22 "What color is this?", and a correct answer AS22 "blue". The support information SP23 includes an image IM23 showing the sky and three balloons, a question QS23 "How many balloons are there?", and a correct answer AS23 "three".
 In this case, the robot apparatus 100 identifies, among the support information SP21 to SP23, the support information corresponding to the query QINF2. For example, the robot apparatus 100 identifies the support information corresponding to the query QINF2 using the model having the network structure shown in FIG. 8. The robot apparatus 100 identifies the support information SP21, whose integrated feature amount is the closest to the integrated feature amount of the query QINF2, as the support information corresponding to the query QINF2.
 Thus, in the example of the episode EP2 in FIG. 7, the robot apparatus 100 identifies, among the support information SP21 to SP23, the support information SP21 as the support information corresponding to the query QINF2. The robot apparatus 100 may then decide that the correct answer corresponding to the query QINF2 is the correct answer AS21 of the support information SP21. Specifically, the robot apparatus 100 may decide that the correct answer corresponding to the query QINF2 is the correct answer AS25, which is the same "two" as the correct answer AS21. The robot apparatus 100 may then register the combination of the image IM25 and the question QS25 included in the query QINF2 and the correct answer AS25 as support information. In this way, the robot apparatus 100 can acquire the concept of numbers.
 When the correct answer AS25 of the query QINF2 has already been acquired, the robot apparatus 100 may perform learning processing based on the identification result. For example, if the identification model shown in FIG. 8 identifies the support information corresponding to the query QINF2 as the support information SP23, the robot apparatus 100 may train the identification model so that the support information corresponding to the query QINF2 is identified as the support information SP21.
 In the example of FIG. 7, the robot apparatus 100 also performs the identification processing on the episode EP3 including the query QINF3 and the support information SP31 to SP33. The query QINF3 includes an image IM35 showing ice and a question QS35 "What does it feel like to touch?". The support information SP31 includes an image IM31 showing two ice creams, a question QS31 "What does it feel like to touch?", and a correct answer AS31 "cold". The support information SP32 includes an image IM32 showing a balloon, a question QS32 "What shape is this?", and a correct answer AS32 "round". The support information SP33 includes an image IM33 showing the sky and three balloons, a question QS33 "What kind of atmosphere?", and a correct answer AS33 "fluffy".
 In this case, the robot apparatus 100 identifies, among the support information SP31 to SP33, the support information corresponding to the query QINF3. For example, the robot apparatus 100 identifies the support information corresponding to the query QINF3 using the model having the network structure shown in FIG. 8. The robot apparatus 100 identifies the support information SP31, whose integrated feature amount is the closest to the integrated feature amount of the query QINF3, as the support information corresponding to the query QINF3.
 Thus, in the example of the episode EP3 in FIG. 7, the robot apparatus 100 identifies, among the support information SP31 to SP33, the support information SP31 as the support information corresponding to the query QINF3. The robot apparatus 100 may then decide that the correct answer corresponding to the query QINF3 is the correct answer AS31 of the support information SP31. Specifically, the robot apparatus 100 may decide that the correct answer corresponding to the query QINF3 is the correct answer AS35, which is the same "cold" as the correct answer AS31. The robot apparatus 100 may then register the combination of the image IM35 and the question QS35 included in the query QINF3 and the correct answer AS35 as support information. In this way, the robot apparatus 100 can acquire concepts related to the user's impressions. Specifically, the robot apparatus 100 can acquire the concept of the temperature of an object. Note that the robot apparatus 100 can acquire not only temperature but also various other concepts related to the user's impressions, such as hardness. The robot apparatus 100 can acquire the concept of hardness by registering support information including correct answers such as "hard" and "soft".
 When the correct answer AS35 of the query QINF3 has already been acquired, the robot apparatus 100 may perform learning processing based on the identification result. For example, if the identification model shown in FIG. 8 identifies the support information corresponding to the query QINF3 as the support information SP33, the robot apparatus 100 may train the identification model so that the support information corresponding to the query QINF3 is identified as the support information SP31.
 When performing the learning processing, the robot apparatus 100 may select one piece of support information from the group of support information and perform learning using the selected support information as query information.
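 As a sketch of that idea, a training episode could be drawn from the registered support information group by holding one item out as the query and treating a remaining item with the same answer as the target; this leave-one-out sampling policy is an assumption added for illustration.

```python
import random


def sample_training_episode(support_group: list):
    """Holds one registered support item out as the query; the rest remain as supports.

    Returns (query_item, supports, correct_index), where correct_index points at a
    remaining support whose correct answer matches that of the held-out query, if any.
    """
    query = random.choice(support_group)
    supports = [s for s in support_group if s is not query]
    correct_index = next(
        (i for i, s in enumerate(supports) if s.answer == query.answer), None)
    return query, supports, correct_index
```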
 Here, the flow of processing from when the robot apparatus 100 receives the input of a query until it outputs a correct answer candidate will be described with reference to FIG. 9. FIG. 9 is a block diagram showing an example of the configuration from input to output according to the present disclosure.
 First, the robot apparatus 100 extracts the feature amount (image feature amount) of the image of the query detected by the camera. The robot apparatus 100 also extracts the feature amount (question feature amount) of the voice of the question of the query input through the microphone. The robot apparatus 100 then projects the image feature amount and the question feature amount onto the common space. With this, the robot apparatus 100 completes the preparation of the information corresponding to the query.
 Then, the robot apparatus 100 generates an episode. The robot apparatus 100 generates the episode by adding the support information. The robot apparatus 100 generates the episode using the support information stored in the support information storage unit 141. For example, the robot apparatus 100 may generate the episode using part of the support information stored in the support information storage unit 141, or may generate the episode using all of the support information stored in the support information storage unit 141.
 Then, the robot apparatus 100 performs the identification processing based on the episode. For example, the robot apparatus 100 identifies, among the support information of the episode, the support information corresponding to the query. The robot apparatus 100 decides the identified support information as the support information used for determining the response to the query information.
 Then, the robot apparatus 100 outputs, through the speaker, the correct answer of the decided support information as a correct answer candidate corresponding to the query.
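 Putting the block diagram of FIG. 9 together, the query-to-answer-candidate flow could be condensed as follows; `encoder`, the contents of `support_store`, and the `speak` callback are placeholders for the components described above, and the tuple layout of the stored items is an assumption.

```python
def answer_query(image_feat, question_feat, support_store, encoder, speak):
    """From an (image, question) query to a spoken correct answer candidate."""
    # 1. Project the query onto the common space (partial processing PT1).
    query_vec = encoder(image_feat, question_feat)

    # 2. Generate an episode by adding the registered support information (PT2).
    episode = [(encoder(img_feat, qst_feat), answer)
               for (img_feat, qst_feat, answer) in support_store]

    # 3. Identify the support information closest to the query in the common space.
    _, best_answer = min(
        episode, key=lambda pair: float(((query_vec - pair[0]) ** 2).sum()))

    # 4. Output the correct answer of the decided support information as the candidate.
    speak(best_answer)
    return best_answer
```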
[1-3. Information Processing Procedure According to First Embodiment]
 Next, the procedure of the information processing according to the first embodiment will be described with reference to FIGS. 10 and 11. First, the flow of the learning processing according to the first embodiment of the present disclosure will be described with reference to FIG. 10. FIG. 10 is a flowchart showing the procedure of the information processing according to the first embodiment of the present disclosure.
 As shown in FIG. 10, the robot apparatus 100 acquires an image (step S101). For example, the robot apparatus 100 acquires an image captured by the camera.
 The robot apparatus 100 acquires a question related to the image (step S102). For example, the robot apparatus 100 acquires the user's question input through the microphone.
 Then, the robot apparatus 100 acquires the correct answer corresponding to the question (step S103). For example, the robot apparatus 100 outputs a correct answer candidate corresponding to the question and acquires the correct answer according to the user's reaction to the correct answer candidate. For example, when the user's reaction to the correct answer candidate is affirmative, the robot apparatus 100 acquires the correct answer candidate as the correct answer. For example, when the user's reaction to the correct answer candidate is negative, the robot apparatus 100 acquires the correct answer provided by the user.
 Then, the robot apparatus 100 registers the combination of the acquired image, question, and correct answer as support information (step S104). For example, when the user's reaction to the correct answer candidate is affirmative, the robot apparatus 100 registers the combination of the image, the question, and the correct answer candidate, which is the correct answer, as support information. For example, when the user's reaction to the correct answer candidate is negative, the robot apparatus 100 registers the combination of the image, the question, and the correct answer provided by the user as support information. For example, the robot apparatus 100 stores the combination of the acquired image, question, and correct answer in the support information storage unit 141 in association with an unassigned support information ID.
 Next, the detailed flow of the registration processing based on a dialogue with the user according to the first embodiment of the present disclosure will be described with reference to FIG. 11. FIG. 11 is a flowchart showing the procedure of the correct answer registration processing through a dialogue with the user according to the first embodiment of the present disclosure.
 As shown in FIG. 11, the robot apparatus 100 receives an input (step S201). For example, the robot apparatus 100 receives a voice uttered by the user as an input. Note that the example of FIG. 11 mainly describes the processing of the dialogue with the user. Therefore, although not shown in FIG. 11, it is assumed that the robot apparatus 100 has already acquired an image (input image) before step S201. For example, the robot apparatus 100 captures and acquires the input image with the camera.
 Then, the robot apparatus 100 determines whether the input is a question (step S202). For example, the robot apparatus 100 determines whether the input is a question by analyzing character information obtained by converting the user's voice information. When determining that the input is not a question (step S202; No), the robot apparatus 100 returns to step S201 and repeats the processing.
 On the other hand, when determining that the input is a question (step S202; Yes), the robot apparatus 100 generates an episode (step S203). For example, the robot apparatus 100 generates an episode whose query information is the input image and the input question. For example, the robot apparatus 100 generates an episode including query information, which is the combination of the input image and the input question, and the support information stored in the support information storage unit 141.
 Then, the robot apparatus 100 performs identification (step S204). For example, the robot apparatus 100 uses the episode to identify the support information including the correct answer corresponding to the query information. For example, the robot apparatus 100 decides the support information used for determining the response to the query information by identifying the support information including the correct answer corresponding to the query information. For example, the robot apparatus 100 decides the support information used for determining the response to the query information and selects that support information (decided support information) as the information used for determining the response to the query information.
 Then, the robot apparatus 100 outputs a response (step S205). For example, the robot apparatus 100 outputs a response using the decided support information. For example, the robot apparatus 100 outputs the correct answer included in the decided support information.
 Then, the robot apparatus 100 determines whether or not there has been a reaction from the user (step S206). For example, the robot apparatus 100 determines whether there has been a reaction from the user based on whether an input by the user has been received. For example, the robot apparatus 100 determines that there has been a reaction from the user when an utterance by the user is detected by the microphone. When it is not determined that there has been a reaction from the user (step S206; No), the robot apparatus 100 returns to step S206 and repeats the processing.
 On the other hand, when it is determined that there has been a reaction from the user (step S206; Yes), the robot apparatus 100 determines whether the user's reaction is affirmative (step S207). For example, the robot apparatus 100 determines whether the user's reaction is affirmative or negative by analyzing character information obtained by converting the user's reaction (voice information).
 When determining that the user's reaction is affirmative (step S207; Yes), the robot apparatus 100 registers the output response as the correct answer (step S208). For example, when determining that the user's reaction is affirmative, the robot apparatus 100 takes the output response (correct answer candidate) as the correct answer and registers it as support information combined with the input image and the question included in the query information. That is, the robot apparatus 100 registers the combination of the input image and the question included in the query information and the correct answer, which is the output response, as support information.
 On the other hand, when determining that the user's reaction is negative (step S207; No), the robot apparatus 100 requests the correct answer from the user (step S209). For example, when determining that the user's reaction is negative, the robot apparatus 100 outputs a voice requesting the correct answer from the user, such as "Please tell me the correct answer". Note that, when determining that the user's reaction is negative, the robot apparatus 100 may wait until the user's next reaction is detected.
 Then, the robot apparatus 100 acquires the correct answer (step S210). For example, the robot apparatus 100 acquires the input by the user as the correct answer. For example, the robot apparatus 100 acquires character information obtained by converting the user's utterance (voice information) as the correct answer.
 Then, the robot apparatus 100 registers the acquired correct answer (step S211). For example, the robot apparatus 100 registers support information in which the acquired correct answer is combined with the input image and the question included in the query information. That is, the robot apparatus 100 registers the combination of the input image and the question included in the query information and the acquired correct answer as support information.
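 The dialogue flow of steps S201 to S211 could be summarized as the loop below; the helper callables (`listen`, `is_question`, `identify`, `speak`, `is_affirmative`) and the episode layout are hypothetical stand-ins for the processing described in this section.

```python
def dialogue_registration_loop(input_image, support_store, listen, is_question,
                               identify, speak, is_affirmative):
    """Steps S201 to S211: answer a question, then register the confirmed correct answer."""
    while True:
        utterance = listen()                        # S201: receive input
        if is_question(utterance):                  # S202: is the input a question?
            break

    episode = ((input_image, utterance), support_store.items)   # S203: generate episode
    decided = identify(episode)                     # S204: identify the decided support info
    speak(decided.answer)                           # S205: output the response

    reaction = listen()                             # S206: wait for the user's reaction
    if is_affirmative(reaction):                    # S207: affirmative or negative?
        answer = decided.answer                     # S208: output response is the correct answer
    else:
        speak("Please tell me the correct answer")  # S209: request the correct answer
        answer = listen()                           # S210: acquire the correct answer

    # S208 / S211: register image, question, and correct answer as one support set.
    support_store.register(input_image, utterance, answer)
    return answer
```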
(2. Second Embodiment)
[2-1. Outline of Information Processing According to Second Embodiment of the Present Disclosure]
 In the first embodiment described above, the case where the robot apparatus 100 acquires a question and a reaction to its correct answer from the user was shown, but an information processing device such as a robot apparatus may acquire questions and correct answers in various manners. For example, the robot apparatus itself may ask a question. Therefore, in the second embodiment, an example in which the robot apparatus 100A asks the user a question and acquires an answer from the user will be described. Note that descriptions of points similar to those of the robot apparatus 100 according to the first embodiment will be omitted as appropriate.
 A case where the robot apparatus 100A, which is an agent, asks the user U1 a question and registers a combination of an image, a question, and a correct answer (support information) through a dialogue with the user U1 will be described with reference to FIG. 12. First, the robot apparatus 100A acquires an image (step S31). In the example of FIG. 12, the robot apparatus 100A detects the image IM2 by capturing two ice creams with the camera. For example, the image is input to the robot apparatus 100A through the camera attached to the robot apparatus 100A.
 Then, the robot apparatus 100A detects the presence or absence of a user (step S32). The robot apparatus 100A detects whether a person is present around the robot apparatus 100A so that the robot apparatus 100A itself can ask the user a question. For example, the robot apparatus 100A recognizes (determines) whether a user who can be the partner of the question and answer is nearby. For example, the robot apparatus 100A detects through the camera whether there is a user nearby. The robot apparatus 100A detects the presence or absence of a user based on the image captured by the camera. Note that the robot apparatus 100A may separately include a camera that captures the image used for the query and a camera that detects the presence or absence of a user. The robot apparatus 100A is not limited to the camera and may detect or recognize the user with various sensors as long as the presence or absence of a user can be detected. In the example of FIG. 12, the robot apparatus 100A determines that a user is present nearby because the user U1 is included in the image captured by the camera. In this case, for example, because it has detected the user U1, the robot apparatus 100A changes the mode to the question mode (see FIG. 14).
 Then, the robot apparatus 100A generates a question (step S33). For example, the robot apparatus 100A generates a question based on the image captured by the camera. In the example of FIG. 12, the robot apparatus 100A generates a question related to the image IM2 based on the acquired image IM2. For example, the robot apparatus 100A may estimate the object included in the image IM2 using a technique related to object recognition and generate the question. The robot apparatus 100A may generate the question according to the estimated object. For example, the robot apparatus 100A may generate the question based on the information stored in the question information storage unit 144 (see FIG. 13). In the example of FIG. 12, the robot apparatus 100A estimates that the object included in the image IM2 is ice cream and generates the question QS2 "What if I touch it?".
 For example, the robot apparatus 100A may generate the question based on question candidate information, stored in the question information storage unit 144, in which names of objects are associated with question candidates. For example, the robot apparatus 100A may generate the question based on question candidate information in which the object name "ice cream" stored in the question information storage unit 144 is associated with questions such as "What if I touch it?", "Is it expensive?", and "How does it taste?". For example, the robot apparatus 100A may decide the question to be output from among the question candidates associated with the object name "ice cream" stored in the question information storage unit 144 based on a predetermined criterion. For example, the robot apparatus 100A may decide a randomly selected candidate among the question candidates as the question to be output. For example, the robot apparatus 100A may count the number of times each question candidate has been output and decide a question candidate with a small output count as the question to be output. Note that the above is merely an example, and the robot apparatus 100A may ask the user U1 a question using any technique as appropriate, as long as it can ask the user U1 a question.
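 A small sketch of picking a question from such question candidate information using the least-output-count criterion mentioned above; the contents of the candidate table and the in-memory counter are illustrative assumptions.

```python
from collections import Counter

# Illustrative question candidate information: object name -> question candidates.
QUESTION_CANDIDATES = {
    "ice cream": ["What if I touch it?", "Is it expensive?", "How does it taste?"],
}

output_counts: Counter = Counter()


def generate_question(object_name: str) -> str:
    """Picks the candidate that has been output the fewest times so far."""
    candidates = QUESTION_CANDIDATES.get(object_name, ["What is this?"])
    question = min(candidates, key=lambda q: output_counts[q])
    output_counts[question] += 1
    return question
```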
 そして、ロボット装置100Aは、生成した質問を出力する(ステップS34)。図12の例では、ロボット装置100Aは、「触ったらどう?」という質問QS2を出力する。 Then, the robot device 100A outputs the generated question (step S34). In the example of FIG. 12, the robot apparatus 100A outputs the question QS2, "What if I touch it?"
 そして、ユーザU1は、ロボット装置100Aに対して質問QS2に対する正解を提供する(ステップS35)。ユーザU1は、ロボット装置100Aが出力した質問に対する正解をロボット装置100Aに入力する。ユーザU1は、ロボット装置100Aが出力した質問QS2である「触ったらどう?」について、「冷たい」という正解AS2をロボット装置100Aに入力する。 Then, the user U1 provides the robot apparatus 100A with the correct answer to the question QS2 (step S35). The user U1 inputs, into the robot apparatus 100A, the correct answer to the question output by the robot apparatus 100A. The user U1 inputs the correct answer AS2 “cold” into the robot apparatus 100A for the question QS2 “What if I touch it?” output by the robot apparatus 100A.
 そして、ロボット装置100Aは、画像、画像に関連する質問、及び質問に対応する正解の組合せを、サポート情報として登録する(ステップS36)。例えば、ロボット装置100Aは、サポート情報登録モードに移行して、画像、画像に関連する質問、及び質問に対応する正解の組合せを、サポート情報として登録する。図12の例では、ロボット装置100Aは、追加登録用情報RINF21に示すような画像IM2、質問QS2及び正解AS2の組合せを、サポート情報ID「SP2」により識別されるサポート情報(サポート情報SP2)として登録する。このように、ロボット装置100Aは、入力画像、入力質問、正解の三つの要素を一つのセットとして登録する。例えば、ロボット装置100Aは、サポート情報SP2をサポート情報記憶部141(図13参照)に格納する。 Then, the robot apparatus 100A registers, as support information, the combination of the image, the question related to the image, and the correct answer corresponding to the question (step S36). For example, the robot apparatus 100A transitions to the support information registration mode and registers the combination of the image, the question related to the image, and the correct answer corresponding to the question as support information. In the example of FIG. 12, the robot apparatus 100A registers the combination of the image IM2, the question QS2, and the correct answer AS2 shown in the additional registration information RINF21 as the support information identified by the support information ID “SP2” (support information SP2). In this way, the robot apparatus 100A registers the three elements of the input image, the input question, and the correct answer as one set. For example, the robot apparatus 100A stores the support information SP2 in the support information storage unit 141 (see FIG. 13).
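The registration of one (input image, input question, correct answer) set as a single support record could be sketched as follows; this is an assumption-based illustration, and the SupportInfo / SupportStore names do not appear in the disclosure.

    from dataclasses import dataclass, field
    from typing import List

    @dataclass
    class SupportInfo:
        support_id: str  # e.g. "SP2"
        image: bytes     # raw image data such as IM2
        question: str    # e.g. "What if I touch it?" (QS2)
        answer: str      # e.g. "cold" (AS2)

    @dataclass
    class SupportStore:
        records: List[SupportInfo] = field(default_factory=list)

        def register(self, image: bytes, question: str, answer: str) -> SupportInfo:
            # One set of (input image, input question, correct answer) becomes one record.
            info = SupportInfo(f"SP{len(self.records) + 1}", image, question, answer)
            self.records.append(info)
            return info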
 上述したように、ロボット装置100Aは、ロボット装置100A自らが、画像IM2について質問QS2を出力し、画像IM2及び質問QS2に対応する正解として、ユーザU1により提供された正解AS2を登録する。このように、ロボット装置100Aは、自らユーザに質問を聞いて応答を求める。これにより、ロボット装置100Aは、ユーザからの入力を待つことなく、自ら質問を出力することにより、自発的に新たな概念を獲得することができる。したがって、ロボット装置100Aは、画像に関する質問に対して適切な応答を可能にすることができる。 As described above, the robot apparatus 100A itself outputs the question QS2 regarding the image IM2, and registers the correct answer AS2 provided by the user U1 as the correct answer corresponding to the image IM2 and the question QS2. In this way, the robot apparatus 100A itself asks the user a question and requests a response. As a result, the robot apparatus 100A can spontaneously acquire a new concept by outputting a question on its own without waiting for an input from the user. Therefore, the robot apparatus 100A can respond appropriately to a question regarding an image.
[2-2.第2の実施形態に係るロボット装置の構成]
 次に、第2の実施形態に係る情報処理を実行する情報処理装置の一例であるロボット装置100Aの構成について説明する。図13は、本開示の第2の実施形態に係るロボット装置100Aの構成例を示す図である。
[2-2. Configuration of Robot Device According to Second Embodiment]
Next, the configuration of the robot device 100A, which is an example of an information processing device that executes information processing according to the second embodiment, will be described. FIG. 13 is a diagram illustrating a configuration example of a robot device 100A according to the second embodiment of the present disclosure.
 図13に示すように、ロボット装置100Aは、通信部11と、入力部12と、出力部13と、記憶部14Aと、制御部15Aと、センサ部16Aと、駆動部17とを有する。 As shown in FIG. 13, the robot device 100A includes a communication unit 11, an input unit 12, an output unit 13, a storage unit 14A, a control unit 15A, a sensor unit 16A, and a drive unit 17.
 入力部12は、ユーザによる質問に対応する正解の入力を受け付ける。出力部13は、質問を出力する。出力部13は、センサ部16Aによりユーザが検知された場合、質問を出力する。出力部13は、生成部153Aにより生成された質問を出力する。 The input unit 12 receives the input of the correct answer corresponding to the question from the user. The output unit 13 outputs the question. The output unit 13 outputs a question when the user is detected by the sensor unit 16A. The output unit 13 outputs the question generated by the generation unit 153A.
 記憶部14Aは、例えば、RAM、フラッシュメモリ等の半導体メモリ素子、または、ハードディスク、光ディスク等の記憶装置によって実現される。記憶部14Aは、サポート情報記憶部141と、モデル情報記憶部142と、モード情報記憶部143Aと、質問情報記憶部144とを有する。 The storage unit 14A is realized by, for example, a semiconductor memory device such as a RAM or a flash memory, or a storage device such as a hard disk or an optical disk. The storage unit 14A includes a support information storage unit 141, a model information storage unit 142, a mode information storage unit 143A, and a question information storage unit 144.
 モード情報記憶部143Aは、ロボット装置100Aのモードに関する情報を記憶する。図14は、本開示の第2の実施形態に係るモード情報記憶部の一例を示す図である。図14に、第2の実施形態に係るモード情報記憶部143Aの一例を示す。図14に示すように、モード情報記憶部143Aは、「モードID」、「モード」、「フラグ」といった項目を有する。図14の例では、モードID「MO21」により識別されるモード(モードMO21)は、質問モードであることを示す。例えば、モードMO21は、ロボット装置100A自身が質問を出力するモードであることを示す。 The mode information storage unit 143A stores information regarding the mode of the robot apparatus 100A. FIG. 14 is a diagram illustrating an example of the mode information storage unit according to the second embodiment of the present disclosure. FIG. 14 shows an example of the mode information storage unit 143A according to the second embodiment. As shown in FIG. 14, the mode information storage unit 143A has items such as “mode ID”, “mode”, and “flag”. The example of FIG. 14 indicates that the mode identified by the mode ID “MO21” (mode MO21) is the question mode. For example, the mode MO21 indicates a mode in which the robot apparatus 100A itself outputs a question.
 質問情報記憶部144は、質問に関する各種情報を記憶する。質問情報記憶部144は、ロボット装置100A自身が質問を出力するために用いる情報が記憶される。質問情報記憶部144は、対象物の名称と質問候補とが対応付けられた質問候補情報を記憶する。例えば、質問情報記憶部144は、対象物の名称「アイスクリーム」と、「触ったらどう?」や「値段は高い?」や「味は?」等の質問とが対応付けられた質問候補情報を記憶する。 The question information storage unit 144 stores various kinds of information regarding questions. The question information storage unit 144 stores information used by the robot apparatus 100A itself to output a question. The question information storage unit 144 stores question candidate information in which the name of an object is associated with question candidates. For example, the question information storage unit 144 stores question candidate information in which the object name “ice cream” is associated with questions such as “What if I touch it?”, “Is the price expensive?”, and “What is the taste?”.
 図13に戻り、説明を続ける。制御部15Aは、例えば、CPUやMPU等によって、ロボット装置100A内部に記憶されたプログラム(例えば、本開示に係る情報処理プログラム)がRAM等を作業領域として実行されることにより実現される。また、制御部15Aは、コントローラであり、例えば、ASICやFPGA等の集積回路により実現されてもよい。 Returning to FIG. 13, the description will be continued. The control unit 15A is realized by, for example, a CPU, an MPU, or the like executing a program stored inside the robot apparatus 100A (for example, the information processing program according to the present disclosure) using a RAM or the like as a work area. The control unit 15A is a controller and may be realized by an integrated circuit such as an ASIC or an FPGA.
 図13に示すように、制御部15Aは、取得部151と、判定部152と、生成部153Aと、登録部154と、学習部155と、決定部156とを有し、以下に説明する情報処理の機能や作用を実現または実行する。なお、制御部15Aの内部構成は、図13に示した構成に限られず、後述する情報処理を行う構成であれば他の構成であってもよい。 As illustrated in FIG. 13, the control unit 15A includes an acquisition unit 151, a determination unit 152, a generation unit 153A, a registration unit 154, a learning unit 155, and a decision unit 156, and realizes or executes the functions and actions of the information processing described below. The internal configuration of the control unit 15A is not limited to the configuration shown in FIG. 13, and may be another configuration as long as it performs the information processing described later.
 取得部151は、出力部13により出力された質問を取得する。判定部152は、質疑応答の相手となるユーザが周囲にいるかどうかを認識(判定)する。生成部153Aは、質問を生成する。例えば、生成部153Aは、取得した画像に基づいて画像に関連する質問を生成する。生成部153Aは、物体認識に関する技術を用いて、画像に含まれる対象物を推定して、質問を生成する。生成部153Aは、質問情報記憶部144に記憶された情報に基づいて、質問を生成する。図12の例では、生成部153Aは、画像IM2に含まれる対象物をアイスクリームであると推定し「触ったらどう?」という質問QS2を生成する。登録部154は、ユーザにより入力された正解を含む組合せを、サポート情報として登録する。 The acquisition unit 151 acquires the question output by the output unit 13. The determination unit 152 recognizes (determines) whether or not a user who is to be the partner of the question and answer is in the vicinity. The generation unit 153A generates a question. For example, the generation unit 153A generates a question related to the image based on the acquired image. The generation unit 153A generates a question by estimating a target object included in the image using a technique related to object recognition. The generation unit 153A generates a question based on the information stored in the question information storage unit 144. In the example of FIG. 12, the generation unit 153A estimates that the target object included in the image IM2 is ice cream and generates the question QS2 “What if I touch it?”. The registration unit 154 registers the combination including the correct answer input by the user as support information.
 センサ部16Aは、ユーザの有無を検知する検知部としての機能を有する。センサ部16Aは、ユーザの有無を検知する。例えば、センサ部16Aは、カメラを通じて周りにユーザがいるかどうかを検知する。センサ部16Aは、カメラにより撮像された画像に基づいて、ユーザの有無を検知する。 The sensor unit 16A has a function as a detection unit that detects the presence or absence of a user. The sensor unit 16A detects the presence or absence of a user. For example, the sensor unit 16A detects whether or not there is a user around by using the camera. The sensor unit 16A detects the presence or absence of the user based on the image captured by the camera.
[2-3.第2の実施形態に係る情報処理の手順]
 次に、図15を用いて、第2の実施形態に係る情報処理の手順について説明する。図15を用いて、本開示の第2の実施形態に係るユーザとの対話に基づく登録処理の詳細な流れについて説明する。図15は、本開示の第2の実施形態に係るユーザとの対話による正解の登録処理の手順を示すフローチャートである。
[2-3. Information Processing Procedure According to Second Embodiment]
Next, a procedure of information processing according to the second embodiment will be described with reference to FIG. The detailed flow of the registration process based on the dialog with the user according to the second embodiment of the present disclosure will be described using FIG. 15. FIG. 15 is a flowchart showing the procedure of a correct answer registration process by a dialog with a user according to the second embodiment of the present disclosure.
 図15に示すように、ロボット装置100は、人を検知する(ステップS301)。例えば、ロボット装置100は、カメラ等の種々のセンサを適宜用いて、ユーザの存在を検知する。例えば、ロボット装置100は、カメラが撮像した画像を解析することにより、解析結果に基づいて、ユーザが存在するかを判定する。ロボット装置100は、人を検知していない場合(ステップS301;No)、ステップS301に戻って処理を繰り返す。 As shown in FIG. 15, the robot apparatus 100 detects a person (step S301). For example, the robot apparatus 100 detects the presence of the user by appropriately using various sensors such as a camera. For example, the robot apparatus 100 analyzes the image captured by the camera, and determines whether the user exists based on the analysis result. When the robot device 100 does not detect a person (step S301; No), the process returns to step S301 and repeats the processing.
 一方、ロボット装置100は、人を検知した場合(ステップS301;Yes)、質問を出力する(ステップS302)。なお、図15の例では、ユーザとの対話部分の処理を主として説明する。そのため、図15の例では、図示を省略するが、ロボット装置100は、ステップS302よりも前にクエリとなる画像(入力画像)を取得済みであるものとする。例えば、ロボット装置100は、カメラにより入力画像を撮像し、取得する。例えば、ロボット装置100は、入力された入力画像に基づいて質問を生成し、出力する。例えば、ロボット装置100は、質問情報記憶部144(図13参照)に記憶された質問情報に基づいて質問を生成し、出力する。 On the other hand, when the robot device 100 detects a person (step S301; Yes), it outputs a question (step S302). Note that in the example of FIG. 15, the processing of the dialog portion with the user will be mainly described. Therefore, in the example of FIG. 15, although not shown, it is assumed that the robot apparatus 100 has already acquired an image (input image) to be a query before step S302. For example, the robot apparatus 100 captures and acquires an input image with a camera. For example, the robot apparatus 100 generates and outputs a question based on the input image that is input. For example, the robot apparatus 100 generates and outputs a question based on the question information stored in the question information storage unit 144 (see FIG. 13).
 そして、ロボット装置100は、ユーザの反応が有ったかを否かを判定する(ステップS303)。例えば、ロボット装置100は、ユーザによる入力を受け付けたかどうかに基づいて、ユーザの反応が有ったかを判定する。例えば、ロボット装置100は、マイクによりユーザによる発話を検知した場合、ユーザの反応が有ったと判定する。ユーザの反応が有ったと判定していない場合(ステップS303;No)、ロボット装置100は、ステップS303に戻って処理を繰り返す。 Then, the robot apparatus 100 determines whether or not there is a user reaction (step S303). For example, the robot apparatus 100 determines whether there is a reaction of the user based on whether or not the input by the user is accepted. For example, the robot apparatus 100 determines that there is a reaction of the user when the utterance by the user is detected by the microphone. When it is not determined that the user has reacted (step S303; No), the robot apparatus 100 returns to step S303 and repeats the processing.
 一方、ユーザの反応が有ったと判定した場合(ステップS303;Yes)、ロボット装置100は、入力された応答を正解として登録する(ステップS304)。例えば、ロボット装置100は、入力画像と、出力した質問と、ユーザにより入力された正解とを組み合わせたサポート情報を登録する。 On the other hand, when it is determined that the user's reaction has been received (step S303; Yes), the robot apparatus 100 registers the input response as the correct answer (step S304). For example, the robot apparatus 100 registers support information that is a combination of an input image, an output question, and a correct answer input by the user.
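The flow of FIG. 15 (detect a person, output a question, wait for a reaction, register the answer) could be written, purely as a sketch over a hypothetical robot API, as follows.

    import time

    def registration_loop(robot) -> None:
        # "robot" is assumed to expose detect_person(), capture_image(),
        # generate_question(), say(), listen() and register(); these are
        # hypothetical methods used only for illustration.
        while not robot.detect_person():            # step S301
            time.sleep(0.5)
        image = robot.capture_image()               # input image acquired before step S302
        question = robot.generate_question(image)   # generate a question from the image
        robot.say(question)                         # step S302
        answer = None
        while answer is None:                       # step S303: wait for the user's reaction
            answer = robot.listen(timeout=1.0)
        robot.register(image, question, answer)     # step S304: register as support information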
(3.その他の実施形態)
 上述した各実施形態に係る処理は、上記各実施形態以外にも種々の異なる形態(変形例)にて実施されてよい。
(3. Other embodiments)
The processing according to each of the above-described embodiments may be implemented in various different modes (modifications) other than each of the above-described embodiments.
[3-1.その他の構成例]
 例えば、上述した例では、情報処理を行う情報処理装置がロボット装置100、100Aである例を示したが、情報処理装置とロボット装置とは別体であってもよい。この点について、図16及び図17を用いて説明する。図16は、本開示の変形例に係る情報処理システムの構成例を示す図である。図17は、本開示の変形例に係る情報処理装置の構成例を示す図である。
[3-1. Other configuration examples]
For example, in the above-described example, the information processing apparatus that performs information processing is the robot apparatuses 100 and 100A, but the information processing apparatus and the robot apparatus may be separate bodies. This point will be described with reference to FIGS. 16 and 17. FIG. 16 is a diagram showing a configuration example of an information processing system according to a modification of the present disclosure. FIG. 17 is a diagram illustrating a configuration example of the information processing device according to the modification of the present disclosure.
 図16に示すように、情報処理システム1は、ロボット装置10と、情報処理装置100Bとが含まれる。ロボット装置10及び情報処理装置100BはネットワークNを介して、有線又は無線により通信可能に接続される。なお、図16に示した情報処理システム1には、複数台のロボット装置10や、複数台の情報処理装置100Bが含まれてもよい。この場合、情報処理装置100Bは、ネットワークNを介してロボット装置10と通信し、ロボット装置10が収集した情報を基に、モデルの学習やロボット装置10の応答の指示を行なったりしてもよい。 As shown in FIG. 16, the information processing system 1 includes a robot device 10 and an information processing device 100B. The robot device 10 and the information processing device 100B are communicably connected via a network N in a wired or wireless manner. Note that the information processing system 1 illustrated in FIG. 16 may include a plurality of robot devices 10 and a plurality of information processing devices 100B. In this case, the information processing device 100B may communicate with the robot device 10 via the network N and, based on the information collected by the robot device 10, perform learning of the model and instruct the robot device 10 on responses.
 ロボット装置10は、カメラにより画像を検知し、マイクによりユーザの発話(質問)を検知する。そして、ロボット装置10は、検知した画像及びユーザの質問に対応する応答をスピーカーにより出力する。ロボット装置10は、情報処理装置100Bとの間で情報の送受信が可能であれば、どのような装置であってもよく、例えば、エンタテインメントロボットや家庭用ロボットと称されるような、人間(ユーザ)と対話するロボットであってもよい。例えば、ロボット装置10は、撮像した画像やユーザとの対話により収集した情報を情報処理装置100Bへ送信する。 The robot device 10 detects an image with a camera and detects the user's utterance (question) with a microphone. Then, the robot device 10 outputs, through a speaker, a response corresponding to the detected image and the user's question. The robot device 10 may be any device as long as it can transmit and receive information to and from the information processing device 100B, and may be, for example, a robot that interacts with a human (user), such as what is called an entertainment robot or a household robot. For example, the robot device 10 transmits captured images and information collected through dialogue with the user to the information processing device 100B.
 情報処理装置100Bは、画像、画像に関連する質問、及び質問に対応する正解をサポート情報として登録する情報処理装置である。例えば、情報処理装置100Bは、ロボット装置10に情報を送信し、ロボット装置10を遠隔操作することにより、ロボット装置10によるユーザとの対話を実現してもよい。そして、情報処理装置100Bは、ロボット装置10により取得された各種情報を、ロボット装置10から受信することにより、各種情報を取得する。このように、情報処理装置100Bは、ロボット装置10のユーザとの対話により収集した情報に基づいて、サポート情報を登録する。 The information processing device 100B is an information processing device that registers an image, a question related to the image, and a correct answer corresponding to the question as support information. For example, the information processing device 100B may transmit information to the robot device 10 and remotely operate the robot device 10, thereby realizing the dialogue between the robot device 10 and the user. Then, the information processing device 100B acquires various kinds of information acquired by the robot device 10 by receiving them from the robot device 10. In this way, the information processing device 100B registers support information based on the information collected through the dialogue between the robot device 10 and the user.
 図17に示すように、情報処理装置100Bは、通信部11Bと、記憶部14Bと、制御部15とを有する。通信部11Bは、ネットワークN(インターネット等)と有線又は無線で接続され、ネットワークNを介して、ロボット装置10との間で情報の送受信を行う。記憶部14Bは、サポート情報記憶部141と、モデル情報記憶部142とを有する。このように、情報処理装置100Bは、センサ部や駆動部等を有さず、ロボット装置としての機能を実現するための構成を有しなくてもよい。なお、情報処理装置100Bは、情報処理装置100Bを管理する管理者等から各種操作を受け付ける入力部(例えば、キーボードやマウス等)や、各種情報を表示するための表示部(例えば、液晶ディスプレイ等)を有してもよい。 As shown in FIG. 17, the information processing device 100B has a communication unit 11B, a storage unit 14B, and a control unit 15. The communication unit 11B is connected to a network N (the Internet or the like) in a wired or wireless manner, and transmits and receives information to and from the robot device 10 via the network N. The storage unit 14B includes a support information storage unit 141 and a model information storage unit 142. In this way, the information processing device 100B does not have a sensor unit, a drive unit, or the like, and does not need to have a configuration for realizing the functions of a robot device. Note that the information processing device 100B may have an input unit (for example, a keyboard or a mouse) that receives various operations from an administrator or the like who manages the information processing device 100B, and a display unit (for example, a liquid crystal display) for displaying various kinds of information.
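One conceivable message exchange between the robot device 10 and the information processing device 100B is sketched below; the endpoint URL, payload format, and encoding are assumptions made only for illustration and are not specified by the disclosure.

    import json
    import urllib.request

    # Hypothetical endpoint on the information processing device 100B.
    REGISTER_URL = "http://100b.example.local/support"

    def send_support_record(image_path: str, question: str, answer: str) -> int:
        # Robot device 10 side: forward one collected (image, question, answer)
        # set so that 100B can register it as support information.
        with open(image_path, "rb") as f:
            payload = {
                "image": f.read().hex(),  # naive encoding, for illustration only
                "question": question,
                "answer": answer,
            }
        req = urllib.request.Request(
            REGISTER_URL,
            data=json.dumps(payload).encode("utf-8"),
            headers={"Content-Type": "application/json"},
        )
        with urllib.request.urlopen(req) as resp:
            return resp.status  # e.g. 200 when 100B accepted the record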
[3-2.カメラ視点の調整]
 また、ロボット装置100、100Aや情報処理装置100Bは、質疑応答をより効率的にするために、カメラ視点を調整してもよい。この点について、図18~図20を用いて説明する。なお、以下ではロボット装置100を一例として、ロボット装置100がカメラ視点を調整する場合を一例として説明するが、カメラの視点を調節可能であれば、いずれの装置により行われてもよい。
[3-2. Adjustment of camera viewpoint]
In addition, the robot devices 100 and 100A and the information processing device 100B may adjust the camera viewpoint in order to make the question and answer more efficient. This point will be described with reference to FIGS. 18 to 20. Although the robot device 100 will be described below as an example and the case where the robot device 100 adjusts the camera viewpoint will be described as an example, any device may be used as long as the camera viewpoint can be adjusted.
 例えば、上述した実施形態においては、ロボット装置等のエージェントのカメラに物体(対象物)が鮮明に写っていることを前提としている。しかし、実環境では、カメラの視点が物体からずれたり、フォーカスが合わなかったりするなどの問題で、その物体に関した質問が与えられていても、答えに必要な判断自体ができない場合がある。例えば、非特許文献2に開示された視覚障害者等により撮像された画像と質問で構成されているデータセットでは、全体データの約3割が答えに必要な判断(識別)が不可能な画像になっている。非特許文献2に開示されたデータでは、約3割の画像に適切な正解を対応付けることが難しい。そのようなカメラ視点の問題は、新しい概念の学習のみならず、ユーザの認識やコミュニケーションなど、本来のエージェントの目的にも大きな支障となる。そのため、カメラに写っている画像に基づいて周辺空間を推定することによって、どのようにカメラ視点を動かせば必要な情報量が得られるか判定することが有用となる。 For example, the above-described embodiments are premised on an object (target object) being clearly captured by the camera of an agent such as a robot device. However, in a real environment, problems such as the camera viewpoint being shifted away from the object or the object being out of focus may make it impossible to perform the judgment necessary for an answer even when a question about that object is given. For example, in the data set disclosed in Non-Patent Document 2, which is composed of images captured by visually impaired persons and the like together with questions, about 30% of the entire data are images for which the judgment (identification) necessary for an answer is impossible. With the data disclosed in Non-Patent Document 2, it is difficult to associate appropriate correct answers with about 30% of the images. Such camera-viewpoint problems greatly hinder not only the learning of new concepts but also the agent's original purposes such as user recognition and communication. Therefore, it is useful to estimate the surrounding space based on the image captured by the camera and thereby determine how the camera viewpoint should be moved to obtain the necessary amount of information.
 例えば、ユーザが身に付けるウェアラブル端末としての情報処理装置100Bにより、後述するカメラ視点の調整が行われてもよい。この場合、情報処理装置100Bは、スピーカー等の出力部により、ユーザに移動方向や向きを指定する音声を出力してもよい。また、ユーザが身に付けるウェアラブル端末と情報処理装置100Bとは別体であってもよい。この場合、情報処理装置100Bは、ユーザが身に付けるウェアラブル端末から情報を受信し、受信した情報に基づいて、ウェアラブル端末に対してカメラ視点を調整する指示を行ってもよい。例えば、情報処理装置100Bは、ウェアラブル端末にユーザに移動方向や向きを指定する音声情報を送信し、ウェアラブル端末に音声を出力させてもよい。この場合、情報処理システム1には、情報処理装置100Bとウェアラブル端末とが含まれ、ロボット装置10は含まれなくてもよい。このように、情報処理装置100Bは、スピーカーやディスプレイ等による出力により、ユーザに指示することにより、質疑応答をより効率的にするためのカメラポジション(カメラ視点)をユーザに取らせてもよい。 For example, the camera viewpoint adjustment described later may be performed by the information processing device 100B serving as a wearable terminal worn by the user. In this case, the information processing device 100B may output, through an output unit such as a speaker, a voice that specifies a moving direction or an orientation to the user. Alternatively, the wearable terminal worn by the user and the information processing device 100B may be separate bodies. In this case, the information processing device 100B may receive information from the wearable terminal worn by the user and, based on the received information, instruct the wearable terminal to adjust the camera viewpoint. For example, the information processing device 100B may transmit to the wearable terminal voice information specifying a moving direction or an orientation for the user and cause the wearable terminal to output the voice. In this case, the information processing system 1 includes the information processing device 100B and the wearable terminal, and does not have to include the robot device 10. In this way, the information processing device 100B may instruct the user through output from a speaker, a display, or the like, thereby causing the user to take a camera position (camera viewpoint) that makes the question and answer more efficient.
 ここから、図18を用いて、ロボット装置100によるカメラ視点の調整を説明する。図18は、本開示のカメラ視点の調整処理の一例を示す図である。図18の例では、ロボット装置100は、質疑応答をより効率的にするために、カメラに写っている画像から周辺空間を推定し、駆動部17を駆動させ、よりコミュニケーションに適切なカメラポジションを取る。 From here, the adjustment of the camera viewpoint by the robot apparatus 100 will be described with reference to FIG. 18. FIG. 18 is a diagram illustrating an example of the camera viewpoint adjustment process of the present disclosure. In the example of FIG. 18, in order to make the question and answer more efficient, the robot apparatus 100 estimates the surrounding space from the image captured by the camera, drives the drive unit 17, and takes a camera position more appropriate for communication.
 ユーザU1は、ロボット装置100に質問を入力する(ステップS51)。図18の例では、ユーザU1は、「アイス何個ある?」と発話することにより、ロボット装置100に「アイス何個ある?」という質問QS51を入力する。 User U1 inputs a question to robot device 100 (step S51). In the example of FIG. 18, the user U1 inputs the question QS51 “How many ice creams are there?” To the robot apparatus 100 by speaking “How many ice creams are there?”.
 また、ロボット装置100は、画像を取得する(ステップS52)。図18の例では、ロボット装置100は、画像IM50を検知する。このように、ロボット装置100は、カメラ視点の調整前には、2つのアイスクリームの一部(上部)のみが含まれる画像IM50を撮像する。 The robot apparatus 100 also acquires an image (step S52). In the example of FIG. 18, the robot apparatus 100 detects the image IM50. In this way, before the camera viewpoint is adjusted, the robot apparatus 100 captures the image IM50 in which only a part (the upper part) of the two ice creams is included.
 そして、ロボット装置100は、周辺空間の推定を行う(ステップS53)。ロボット装置100は、種々の技術を用いて画像IM50に含まれる範囲の周辺空間の推定を行う。例えば、ロボット装置100は、画像IM50に含まれる範囲の上下左右方向の周辺空間の推定を行う。例えば、ロボット装置100は、種々の従来技術を適宜用いて、周辺空間の推定を行う。ロボット装置100は、種々の従来技術を適宜用いて、上下左右の空間に何があるかの推定を行うために、その空間に該当する画像(以下「推定用画像」ともいう)を生成する。 Then, the robot apparatus 100 estimates the surrounding space (step S53). The robot apparatus 100 estimates the peripheral space in the range included in the image IM50 using various techniques. For example, the robot apparatus 100 estimates the surrounding space in the vertical and horizontal directions of the range included in the image IM50. For example, the robot apparatus 100 estimates the surrounding space by appropriately using various conventional techniques. The robot apparatus 100 uses various conventional techniques as appropriate to generate an image (hereinafter also referred to as an “estimation image”) corresponding to the space in order to estimate what is in the space above, below, left, and right.
 例えば、ロボット装置100は、画像を復元するモデル(復元用モデル)を用いて、推定用画像を生成する。例えば、ロボット装置100は、物体の一部が写っている画像から物体全体を復元するように学習された復元用モデル(ネットワーク)を用いて推定用画像を生成する。例えば、ロボット装置100は、復元用モデルを生成した外部装置から復元用モデルを取得する。なお、ロボット装置100は、物体が鮮明に写っている画像の一部を意図的に切り出して、その切り出されたところを復元するような復元用モデル(ネットワーク)を学習してもよい。 For example, the robot apparatus 100 uses the model for restoring the image (restoring model) to generate the estimation image. For example, the robot apparatus 100 generates an estimation image using a restoration model (network) that is learned to restore the entire object from an image showing a part of the object. For example, the robot apparatus 100 acquires the restoration model from the external device that has generated the restoration model. The robot device 100 may learn a restoration model (network) that intentionally cuts out a part of an image in which an object is clearly captured and restores the cut out part.
 まず、ロボット装置100は、物体認識に関する技術等の種々の従来技術を適宜用いて、画像IM50において物体が写っている範囲を特定する。図18の例では、ロボット装置100は、画像IM50の下方向において物体が写っている範囲(下部範囲)を特定し、画像IM50の下部範囲(下部画像)を切り出す。そして、ロボット装置100は、画像IM50の下部画像と復元用モデルを用いて推定用画像である下部復元画像ES51を生成する。また、ロボット装置100は、画像IM50の上方向や右方向や左方向についても同様に処理し、上部復元画像や右部復元画像や左部復元画像を生成する。 First, the robot apparatus 100 specifies a range in which an object is captured in the image IM50 by appropriately using various conventional techniques such as a technique related to object recognition. In the example of FIG. 18, the robot apparatus 100 specifies the range (lower range) in which the object is seen in the lower direction of the image IM50, and cuts out the lower range (lower image) of the image IM50. Then, the robot apparatus 100 uses the lower image of the image IM50 and the restoration model to generate a lower restored image ES51 that is an estimation image. Further, the robot apparatus 100 similarly processes the upward, rightward, and leftward directions of the image IM50 to generate an upper restored image, a right restored image, and a left restored image.
 なお、ロボット装置100は、推定用画像が生成可能であれば、どのような手法により推定用画像を生成してもよい。例えば、ロボット装置100は、上述した非特許文献3に開示された手法に基づいて、画像IM50に含まれる範囲の上下左右の空間の推定用画像を生成してもよい。例えば、ロボット装置100は、敵対的生成ネットワーク(GAN:Generative Adversarial Networks)に関する技術を用いて、画像IM50に含まれる範囲の上下左右の空間の推定用画像を生成してもよい。 Note that the robot apparatus 100 may generate the estimation image by any method as long as the estimation image can be generated. For example, the robot device 100 may generate the estimation images of the spaces above, below, to the left and to the right of the range included in the image IM50 based on the method disclosed in Non-Patent Document 3 described above. For example, the robot device 100 may generate an estimation image of the space above, below, to the left, and to the right of the range included in the image IM50 by using a technology related to a hostile generation network (GAN: Generative Adversarial Networks).
 ロボット装置100は、上述した非特許文献4に開示された手法に基づいて、画像IM50に含まれる範囲の上下左右の空間の推定用画像を生成してもよい。例えば、ロボット装置100は、PixelRNN(Recurrent Neural Network)等の技術を用いて、画像IM50に含まれる範囲の上下左右の空間の推定用画像を生成してもよい。 The robot device 100 may generate estimation images of the spaces above, below, to the left, and right of the range included in the image IM50 based on the method disclosed in Non-Patent Document 4 described above. For example, the robot device 100 may generate an image for estimation of the space above, below, to the left, and to the right of the range included in the image IM50 by using a technology such as PixelRNN (Recurrent Neural Network).
 ロボット装置100は、下部復元画像ES51や上部復元画像や右部復元画像や左部復元画像を用いて、左右上下の方向で周辺空間の推定を行う。 The robot apparatus 100 uses the lower restored image ES51, the upper restored image, the right restored image, and the left restored image to estimate the surrounding space in the left, right, up, and down directions.
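A minimal sketch of the crop-and-restore step described above is shown below; restore_model stands for any learned restoration network (the restoration model, a GAN-based generator, PixelRNN, or similar) and is a placeholder rather than an API defined by this disclosure.

    import numpy as np

    def estimate_periphery(image: np.ndarray, restore_model, direction: str) -> np.ndarray:
        # Cut out the strip of the captured image nearest to the given direction and
        # let the restoration network guess what lies beyond the frame in that direction.
        h, w, _ = image.shape
        if direction == "down":
            strip = image[h // 2:, :, :]
        elif direction == "up":
            strip = image[: h // 2, :, :]
        elif direction == "left":
            strip = image[:, : w // 2, :]
        else:  # "right"
            strip = image[:, w // 2:, :]
        return restore_model(strip)  # estimation image (e.g. the lower restored image ES51)

    # estimation_images = {d: estimate_periphery(im50, restore_model, d)
    #                      for d in ("up", "down", "left", "right")}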
 図18の例では、質問QS51が取得済みであるため、ロボット装置100は、質問QS51の情報を用いて、周辺空間の推定を行う。ロボット装置100は、推定用画像と与えられた質問を共通空間に射影し、答えの識別結果が最も高いコンフィデンスを持つ方向を視点方針のターゲットとする。このように、ロボット装置100は、質問の情報を加味することにより、質問に答えるための適合性も考慮した方向へのカメラ視点の調整が可能になる。 In the example of FIG. 18, since the question QS51 has been acquired, the robot apparatus 100 estimates the surrounding space using the information of the question QS51. The robot apparatus 100 projects the estimation image and the given question in the common space, and sets the direction having the highest confidence in the answer identification result as the target of the viewpoint policy. In this way, the robot apparatus 100 can adjust the camera viewpoint in the direction in consideration of the suitability for answering the question by adding the question information.
 例えば、ロボット装置100は、下部復元画像ES51や上部復元画像や右部復元画像や左部復元画像と質問QS51を共通空間に射影し、答えの識別結果のコンフィデンス(信頼度)を示す値(以下「スコア」とする)が最も大きい推定用画像に対応する方向を、カメラ視点を向ける方向に決定する。例えば、ロボット装置100は、識別モデルを用いて、下部復元画像ES51の画像特徴量と質問QS51の質問特徴量とを統合した統合特徴量と、各サポート情報の統合特徴量とを比較した結果に基づくスコアを算出してもよい。例えば、ロボット装置100は、下部復元画像ES51の画像特徴量と質問QS51の組合せと、最も距離が近いサポート情報との間の距離に基づくスコアを算出してもよい。例えば、スコアは、距離が近い程大きな値となり、識別モデルが出力する値であってもよい。 For example, the robot apparatus 100 projects the lower restored image ES51, the upper restored image, the right restored image, the left restored image, and the question QS51 into a common space, and determines, as the direction in which to point the camera viewpoint, the direction corresponding to the estimation image whose value indicating the confidence (reliability) of the answer identification result (hereinafter referred to as the "score") is the largest. For example, the robot apparatus 100 may use the identification model to calculate a score based on the result of comparing an integrated feature amount, obtained by integrating the image feature amount of the lower restored image ES51 and the question feature amount of the question QS51, with the integrated feature amount of each piece of support information. For example, the robot apparatus 100 may calculate a score based on the distance between the combination of the image feature amount of the lower restored image ES51 and the question QS51 and the closest support information. For example, the score becomes larger as the distance becomes shorter, and may be a value output by the identification model.
 なお、質問が未取得である場合、ロボット装置100は、推定された画像の物体識別のコンフィデンスが最も高くなる方向を視点方針のターゲットとしてもよい。例えば、ロボット装置100は、下部復元画像ES51や上部復元画像や右部復元画像や左部復元画像のみを用いて、画像IM50の物体識別のスコアが最も高くなる方向を視点方針のターゲットとしてもよい。 If the question has not been acquired, the robot apparatus 100 may set, as the target of the viewpoint policy, the direction in which the object identification confidence of the estimated image is the highest. For example, the robot apparatus 100 may use only the lower restored image ES51, the upper restored image, the right restored image, and the left restored image, and set the direction in which the object identification score for the image IM50 is the highest as the target of the viewpoint policy.
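The scoring of each direction could be sketched as follows, under the assumption that images and questions are encoded into feature vectors of matching dimensions; the encoder and the distance-based score are illustrative stand-ins for the identification model and are not part of the disclosure.

    from typing import Optional
    import numpy as np

    def direction_scores(estimation_images: dict, question_feature: np.ndarray,
                         support_features: list, image_encoder) -> dict:
        # For each direction, fuse the estimation image's feature with the question
        # feature and score it by its nearest support record in the common space
        # (smaller distance -> larger score -> higher confidence).
        scores = {}
        for direction, est_image in estimation_images.items():
            query_feat = np.concatenate([image_encoder(est_image), question_feature])
            dists = [np.linalg.norm(query_feat - s) for s in support_features]
            scores[direction] = -min(dists)
        return scores

    def choose_direction(scores: dict, threshold: float) -> Optional[str]:
        # Return the best direction, or None when no direction clears the threshold
        # (in that case the camera viewpoint is left unchanged).
        best = max(scores, key=scores.get)
        return best if scores[best] >= threshold else None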
 ロボット装置100は、視点方針を決定する(ステップS54)。図18の例では、下部復元画像ES51に質問QS51に含まれる「アイス」に対応する物体が含まれるため、下部復元画像ES51、すなわち下方向に対応するスコアが最大であるものとする。そのため、ロボット装置100は、下部復元画像ES51に対応する下方向にカメラ視点を調整すると決定する。この場合、ロボット装置100は、「カメラを下に」というカメラの向きを下向きに変更することを指示する視点調整情報IS51を生成する。 The robot apparatus 100 determines the viewpoint policy (step S54). In the example of FIG. 18, since the lower restored image ES51 includes an object corresponding to the "ice cream" included in the question QS51, it is assumed that the score corresponding to the lower restored image ES51, that is, to the downward direction, is the largest. Therefore, the robot apparatus 100 determines to adjust the camera viewpoint in the downward direction corresponding to the lower restored image ES51. In this case, the robot apparatus 100 generates viewpoint adjustment information IS51, "Point the camera down," which instructs that the camera orientation be changed to face downward.
 そして、ロボット装置100は、カメラ視点を調整する動作を行う(ステップS55)。図18の例では、ロボット装置100は、カメラを下に向けることを指示する視点調整情報IS51に基づいて、カメラを下に向けるようにアクチュエータ等を駆動する。例えば、ロボット装置100は、カメラが頭部に設けられている場合、頭部を下に向けるようにアクチュエータ等を駆動する。これにより、ロボット装置100は、2つのアイスクリームが写った画像IM51を撮像する。 Then, the robot apparatus 100 performs an operation of adjusting the camera viewpoint (step S55). In the example of FIG. 18, the robot apparatus 100 drives an actuator or the like so that the camera faces downward based on the viewpoint adjustment information IS51 that instructs the camera to face downward. For example, when the camera is provided on the head, the robot apparatus 100 drives an actuator or the like so that the head faces downward. Thereby, the robot apparatus 100 captures the image IM51 including the two ice creams.
 そして、ロボット装置100は、画像IM51及び質問QS51に基づいて、決定したサポート情報の正解を出力する(ステップS56)。図18の例では、ロボット装置100は、「2個」という正解AS51を出力する。 Then, the robot apparatus 100 outputs the correct answer of the determined support information based on the image IM51 and the question QS51 (step S56). In the example of FIG. 18, the robot apparatus 100 outputs the correct answer AS51 of "two".
 上述したように、ロボット装置100は、画像の入力、周辺空間の推定、視点方針の決定、及び決定に基づいた動作により、カメラ視点の調整を実現する。このように、ロボット装置100は、入力された画像に応じて、カメラ視点を調整することにより、適切な画像が取得できていない場合であっても、適切な画像を取得することが可能となる。これにより、ロボット装置100は、画像に関する質問に対して適切な応答を行うことが可能になる。 As described above, the robot apparatus 100 realizes the adjustment of the camera viewpoint through the input of an image, the estimation of the surrounding space, the determination of the viewpoint policy, and an operation based on the determination. In this way, by adjusting the camera viewpoint according to the input image, the robot apparatus 100 can acquire an appropriate image even when an appropriate image has not yet been acquired. As a result, the robot apparatus 100 can make an appropriate response to a question regarding an image.
 ここで、図19を用いて、ロボット装置100が入力を受け付けてから、カメラ位置を調整するまでの処理の流れを説明する。図19は、本開示のカメラ視点の調整に係る入力から出力までの構成の一例を示すブロック図である。 Here, the flow of processing from when the robot device 100 receives an input to when the camera position is adjusted will be described with reference to FIG. FIG. 19 is a block diagram showing an example of the configuration from the input to the output related to the adjustment of the camera viewpoint of the present disclosure.
 まず、ロボット装置100は、クエリのうち、カメラにより検知した画像の特徴量(画像特徴量)を抽出する。また、ロボット装置100は、クエリのマイクにより入力された質問の音声の特徴量(質問特徴量)を抽出する。そして、ロボット装置100は、抽出した画像特徴量に基づいて、周辺空間を推定する。これにより、ロボット装置100は、抽出した画像特徴量に基づいて、推定用画像を生成する。例えば、ロボット装置100は、抽出した画像特徴量に基づいて、下方向の周辺空間を推定し、下方向の推定用画像を生成する。そして、ロボット装置100は、推定用画像に関する特徴量及び質問特徴量を共通空間に射影する。 First, the robot apparatus 100 extracts the feature amount (image feature amount) of the image detected by the camera from the query. Further, the robot apparatus 100 extracts the feature amount (question feature amount) of the voice of the question input by the query microphone. Then, the robot apparatus 100 estimates the peripheral space based on the extracted image feature amount. Thereby, the robot apparatus 100 generates an estimation image based on the extracted image feature amount. For example, the robot apparatus 100 estimates the peripheral space in the downward direction based on the extracted image feature amount, and generates the estimation image in the downward direction. Then, the robot apparatus 100 projects the feature amount and the question feature amount regarding the estimation image in the common space.
 そして、ロボット装置100は、エピソードを生成する。ロボット装置100は、サポート情報を追加することにより、エピソードを生成する。ロボット装置100は、サポート情報記憶部141に記憶されたサポート情報を用いてエピソードを生成する。 Then, the robot device 100 generates an episode. The robot apparatus 100 generates an episode by adding the support information. The robot apparatus 100 uses the support information stored in the support information storage unit 141 to generate an episode.
 そして、ロボット装置100は、エピソードに基づいて識別コンフィデンスの検討を行う。例えば、ロボット装置100は、検討した方向のうち最もスコアが大きい方向を移動方向の候補に決定する。そして、ロボット装置100は、未検討の方向が有る場合、未検討の方向について周辺空間の推定を行い、未検討の方向が無くなるまで、識別コンフィデンスの検討を繰り返す。例えば、ロボット装置100は、上方向や右方向や左方向についても推定した下方向と同様に、その方向の周辺空間を推定し、識別コンフィデンスの検討を繰り返す。 Then, the robot apparatus 100 examines the identification confidence based on the episode. For example, the robot apparatus 100 determines the direction having the largest score among the examined directions as a candidate for the moving direction. Then, if there is an unexamined direction, the robot apparatus 100 estimates the surrounding space for the unexamined direction, and repeats the examination of the identification confidence until the unexamined direction disappears. For example, the robot apparatus 100 estimates the peripheral space in the upward direction, the rightward direction, and the leftward direction in the same manner as the estimated downward direction, and repeats the examination of the identification confidence.
 そして、ロボット装置100は、検討した全方向のうち最もスコアが大きい方向に基づいてカメラ移動方針を決定する。例えば、ロボット装置100は、全方向の検討後に、移動方向の候補となっている方向を移動方向に決定する。ロボット装置100は、検討した全方向のうち最もスコアが大きい方向を移動方向に決定する。 Then, the robot apparatus 100 determines the camera movement policy based on the direction having the highest score among all the examined directions. For example, the robot apparatus 100 determines a moving direction candidate direction as a moving direction after examining all directions. The robot apparatus 100 determines, as the moving direction, the direction having the highest score among all the examined directions.
 そして、ロボット装置100は、アクチュエータを駆動し、決定した移動方向にカメラ視点を調整する。 Then, the robot apparatus 100 drives the actuator to adjust the camera viewpoint in the determined moving direction.
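Putting the pieces together, the loop of FIG. 19 (estimate each direction, keep the best score, then drive the actuator) might look like the following; the robot methods used here are hypothetical wrappers around the sketches above.

    def adjust_camera_viewpoint(robot, image, question_feature,
                                directions=("up", "down", "left", "right")):
        # Examine every candidate direction, remember the one with the highest
        # identification confidence, and move the camera only if it clears the threshold.
        best_direction, best_score = None, float("-inf")
        for direction in directions:
            est_image = robot.estimate_periphery(image, direction)
            score = robot.identification_confidence(est_image, question_feature)
            if score > best_score:
                best_direction, best_score = direction, score
        if best_direction is not None and best_score >= robot.move_threshold:
            robot.drive_actuator(best_direction)  # e.g. tilt the head downward
        return best_direction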
 次に、図20を用いて、カメラ視点の調整処理の流れについて説明する。図20は、本開示のカメラ視点の調整処理の手順を示すフローチャートである。 Next, the flow of adjustment processing of the camera viewpoint will be described with reference to FIG. FIG. 20 is a flowchart showing the procedure of the camera viewpoint adjustment process of the present disclosure.
 図20に示すように、ロボット装置100は、入力を受け付ける(ステップS501)。例えば、ロボット装置100は、画像や質問の入力を受け付ける。 As shown in FIG. 20, the robot device 100 receives an input (step S501). For example, the robot device 100 receives an input of an image or a question.
 そして、ロボット装置100は、特定方向の周辺空間を推定する(ステップS502)。例えば、ロボット装置100は、推定対象となる方向のうち、未検討の方向の周辺空間を推定する。そして、ロボット装置100は、特定方向の識別コンフィデンスを検討する(ステップS503)。例えば、ロボット装置100は、特定方向のスコアが最大かを検討し、検討済みの方向のうち、特定方向のスコアが最大である場合、その特定方向を移動方向の候補に決定する。 Then, the robot apparatus 100 estimates the peripheral space in the specific direction (step S502). For example, the robot apparatus 100 estimates a peripheral space in an unexamined direction among the directions to be estimated. Then, the robot apparatus 100 examines the identification confidence in the specific direction (step S503). For example, the robot apparatus 100 considers whether the score in the specific direction is the maximum, and if the score in the specific direction is the maximum among the examined directions, the specific direction is determined as a candidate for the moving direction.
 そして、ロボット装置100は、全ての方向を検討したかどうかを判定する(ステップS504)。ロボット装置100は、全ての方向を検討していない場合(ステップS504;No)、ステップS502に戻って、未検討の方向が無くなるまで処理を繰り返す。 Then, the robot apparatus 100 determines whether or not all directions have been considered (step S504). If all directions have not been considered (step S504; No), the robot apparatus 100 returns to step S502 and repeats the process until there are no unexamined directions.
 一方、ロボット装置100は、全ての方向を検討した場合(ステップS504;Yes)、ステップS505の処理を行う。この場合、ロボット装置100は、全方向の検討後に、移動方向の候補となっている方向を移動方向に決定する。例えば、ロボット装置100は、全ての方向のスコアが所定の閾値未満でる場合、カメラ視点を調整しないと決定してもよい。 On the other hand, when the robot device 100 considers all the directions (step S504; Yes), the robot device 100 performs the process of step S505. In this case, the robot apparatus 100 determines a moving direction candidate as a moving direction after examining all the directions. For example, the robot apparatus 100 may determine not to adjust the camera viewpoint when the scores in all directions are less than the predetermined threshold.
 ロボット装置100は、動作するかどうかを判定する(ステップS505)。ロボット装置100は、動作しないと判定した場合(ステップS505;No)、カメラ視点を変更せずに、識別を行う(ステップS508)。例えば、ロボット装置100は、全ての方向のスコアが所定の閾値未満である場合、カメラ視点を変更せずに、識別を行う。 The robot apparatus 100 determines whether it operates (step S505). When it is determined that the robot device 100 does not operate (step S505; No), the robot device 100 performs identification without changing the camera viewpoint (step S508). For example, the robot apparatus 100 performs the identification without changing the camera viewpoint when the scores in all directions are less than the predetermined threshold value.
 一方、ロボット装置100は、動作すると判定した場合(ステップS505;Yes)、動作方針を決定する(ステップS506)。例えば、ロボット装置100は、移動方向の候補を移動方向に決定する。ロボット装置100は、スコアが最大である方向にカメラ視点を向けると決定する。 On the other hand, when it is determined that the robot device 100 operates (step S505; Yes), the robot device 100 determines an operation policy (step S506). For example, the robot apparatus 100 determines a moving direction candidate as the moving direction. The robot apparatus 100 determines to point the camera viewpoint in the direction having the maximum score.
 ロボット装置100は、決定に応じて動作する(ステップS507)。例えば、ロボット装置100は、アクチュエータを駆動し、決定した移動方向にカメラ視点を調整する。 The robot apparatus 100 operates according to the determination (step S507). For example, the robot apparatus 100 drives the actuator and adjusts the camera viewpoint in the determined moving direction.
 そして、ロボット装置100は、識別を行う(ステップS508)。例えば、ロボット装置100は、画像及び質問を含むクエリとサポート情報とに基づいて、応答に用いるサポート情報を決定する。 Then, the robot apparatus 100 performs identification (step S508). For example, the robot apparatus 100 determines the support information used for the response based on the query including the image and the question and the support information.
 そして、ロボット装置100は、識別結果に基づく出力を行う(ステップS509)。例えば、ロボット装置100は、画像及び質問を含むクエリに基づいて、決定したサポート情報の正解を出力する。 Then, the robot apparatus 100 outputs based on the identification result (step S509). For example, the robot apparatus 100 outputs the correct answer of the determined support information based on the query including the image and the question.
 また、上記各実施形態において説明した各処理のうち、自動的に行われるものとして説明した処理の全部または一部を手動的に行うこともでき、あるいは、手動的に行われるものとして説明した処理の全部または一部を公知の方法で自動的に行うこともできる。この他、上記文書中や図面中で示した処理手順、具体的名称、各種のデータやパラメータを含む情報については、特記する場合を除いて任意に変更することができる。例えば、各図に示した各種情報は、図示した情報に限られない。 Further, among the processes described in each of the above embodiments, all or part of the processes described as being performed automatically can also be performed manually, or all or part of the processes described as being performed manually can also be performed automatically by a known method. In addition, the processing procedures, specific names, and information including various data and parameters shown in the above documents and drawings can be arbitrarily changed unless otherwise specified. For example, the various kinds of information shown in each drawing are not limited to the illustrated information.
 また、図示した各装置の各構成要素は機能概念的なものであり、必ずしも物理的に図示の如く構成されていることを要しない。すなわち、各装置の分散・統合の具体的形態は図示のものに限られず、その全部または一部を、各種の負荷や使用状況などに応じて、任意の単位で機能的または物理的に分散・統合して構成することができる。 Further, each component of each illustrated device is functionally conceptual and does not necessarily have to be physically configured as illustrated. That is, the specific form of distribution and integration of each device is not limited to the illustrated form, and all or part of each device can be configured by being functionally or physically distributed or integrated in arbitrary units according to various loads, usage conditions, and the like.
 また、上述してきた各実施形態及び変形例は、処理内容を矛盾させない範囲で適宜組み合わせることが可能である。 Also, the above-described respective embodiments and modified examples can be appropriately combined within a range in which the processing content is not inconsistent.
 また、本明細書に記載された効果はあくまで例示であって限定されるものでは無く、他の効果があってもよい。 Also, the effects described in this specification are merely examples and are not limited, and there may be other effects.
(4.ハードウェア構成)
 上述してきた各実施形態に係るロボット装置100、100Aや情報処理装置100B等の情報機器は、例えば図21に示すような構成のコンピュータ1000によって実現される。図21は、ロボット装置100、100Aや情報処理装置100B等の情報処理装置の機能を実現するコンピュータ1000の一例を示すハードウェア構成図である。以下、第1の実施形態に係るロボット装置100を例に挙げて説明する。コンピュータ1000は、CPU1100、RAM1200、ROM(Read Only Memory)1300、HDD(Hard Disk Drive)1400、通信インターフェイス1500、及び入出力インターフェイス1600を有する。コンピュータ1000の各部は、バス1050によって接続される。
(4. Hardware configuration)
The information devices such as the robot devices 100 and 100A and the information processing device 100B according to the above-described embodiments are realized by, for example, a computer 1000 configured as shown in FIG. FIG. 21 is a hardware configuration diagram showing an example of a computer 1000 that realizes the functions of an information processing apparatus such as the robot apparatuses 100 and 100A and the information processing apparatus 100B. Hereinafter, the robot device 100 according to the first embodiment will be described as an example. The computer 1000 has a CPU 1100, a RAM 1200, a ROM (Read Only Memory) 1300, an HDD (Hard Disk Drive) 1400, a communication interface 1500, and an input / output interface 1600. Each unit of the computer 1000 is connected by a bus 1050.
 CPU1100は、ROM1300又はHDD1400に格納されたプログラムに基づいて動作し、各部の制御を行う。例えば、CPU1100は、ROM1300又はHDD1400に格納されたプログラムをRAM1200に展開し、各種プログラムに対応した処理を実行する。 The CPU 1100 operates based on a program stored in the ROM 1300 or the HDD 1400, and controls each part. For example, the CPU 1100 expands a program stored in the ROM 1300 or the HDD 1400 into the RAM 1200 and executes processing corresponding to various programs.
 ROM1300は、コンピュータ1000の起動時にCPU1100によって実行されるBIOS(Basic Input Output System)等のブートプログラムや、コンピュータ1000のハードウェアに依存するプログラム等を格納する。 The ROM 1300 stores a boot program such as a BIOS (Basic Input Output System) executed by the CPU 1100 when the computer 1000 starts up, a program dependent on the hardware of the computer 1000, and the like.
 HDD1400は、CPU1100によって実行されるプログラム、及び、かかるプログラムによって使用されるデータ等を非一時的に記録する、コンピュータが読み取り可能な記録媒体である。具体的には、HDD1400は、プログラムデータ1450の一例である本開示に係る情報処理プログラムを記録する記録媒体である。 The HDD 1400 is a computer-readable recording medium that non-temporarily records a program executed by the CPU 1100, data used by the program, and the like. Specifically, the HDD 1400 is a recording medium that records the information processing program according to the present disclosure, which is an example of the program data 1450.
 通信インターフェイス1500は、コンピュータ1000が外部ネットワーク1550(例えばインターネット)と接続するためのインターフェイスである。例えば、CPU1100は、通信インターフェイス1500を介して、他の機器からデータを受信したり、CPU1100が生成したデータを他の機器へ送信したりする。 The communication interface 1500 is an interface for connecting the computer 1000 to an external network 1550 (for example, the Internet). For example, the CPU 1100 receives data from another device or transmits the data generated by the CPU 1100 to another device via the communication interface 1500.
 入出力インターフェイス1600は、入出力デバイス1650とコンピュータ1000とを接続するためのインターフェイスである。例えば、CPU1100は、入出力インターフェイス1600を介して、キーボードやマウス等の入力デバイスからデータを受信する。また、CPU1100は、入出力インターフェイス1600を介して、ディスプレイやスピーカーやプリンタ等の出力デバイスにデータを送信する。また、入出力インターフェイス1600は、所定の記録媒体(メディア)に記録されたプログラム等を読み取るメディアインターフェイスとして機能してもよい。メディアとは、例えばDVD(Digital Versatile Disc)、PD(Phase change rewritable Disk)等の光学記録媒体、MO(Magneto-Optical disk)等の光磁気記録媒体、テープ媒体、磁気記録媒体、または半導体メモリ等である。例えば、コンピュータ1000が実施形態に係る情報処理装置100として機能する場合、コンピュータ1000のCPU1100は、RAM1200上にロードされた情報処理プログラムを実行することにより、制御部15等の機能を実現する。また、HDD1400には、本開示に係る情報処理プログラムや、記憶部14内のデータが格納される。なお、CPU1100は、プログラムデータ1450をHDD1400から読み取って実行するが、他の例として、外部ネットワーク1550を介して、他の装置からこれらのプログラムを取得してもよい。 The input/output interface 1600 is an interface for connecting the input/output device 1650 and the computer 1000. For example, the CPU 1100 receives data from an input device such as a keyboard or a mouse via the input/output interface 1600. The CPU 1100 also transmits data to an output device such as a display, a speaker, or a printer via the input/output interface 1600. Further, the input/output interface 1600 may function as a media interface for reading a program or the like recorded on a predetermined recording medium (media). Examples of the media include optical recording media such as a DVD (Digital Versatile Disc) and a PD (Phase change rewritable Disk), magneto-optical recording media such as an MO (Magneto-Optical disk), tape media, magnetic recording media, and semiconductor memories. For example, when the computer 1000 functions as the information processing apparatus 100 according to the embodiment, the CPU 1100 of the computer 1000 realizes the functions of the control unit 15 and the like by executing the information processing program loaded on the RAM 1200. Further, the HDD 1400 stores the information processing program according to the present disclosure and the data in the storage unit 14. Note that the CPU 1100 reads the program data 1450 from the HDD 1400 and executes it, but as another example, these programs may be acquired from another device via the external network 1550.
 なお、本技術は以下のような構成も取ることができる。
(1)
 画像、前記画像に関連する質問、及び前記質問に対応する正解を取得する取得部と、
 前記取得部により取得された前記画像、前記質問、及び前記正解の組合せを、一の画像及び前記一の画像に関連する一の質問を含むクエリ情報への応答の決定に用いるサポート情報として登録する登録部と、
 を備える情報処理装置。
(2)
 前記取得部は、
 前記画像、前記画像に対応する概念に関する前記質問、及び前記概念に関する前記正解を取得し、
 前記登録部は、
 前記画像、前記概念に関する前記質問、及び前記概念に関する前記正解の前記組合せを、前記サポート情報として登録する
 前記(1)に記載の情報処理装置。
(3)
 前記取得部は、
 前記画像、前記画像に含まれる対象物に関する前記質問、及び前記対象物に関する前記正解を取得し、
 前記登録部は、
 前記画像、前記対象物に関する前記質問、及び前記対象物に関する前記正解の前記組合せを、前記サポート情報として登録する
 前記(2)に記載の情報処理装置。
(4)
 前記取得部は、
 前記画像、前記画像に含まれる対象物の性質または状態に関する前記質問、及び前記性質または前記状態に関する前記正解を取得し、
 前記登録部は、
 前記画像、前記性質または前記状態に関する前記質問、及び前記性質または前記状態に関する前記正解の前記組合せを、前記サポート情報として登録する
 前記(3)に記載の情報処理装置。
(5)
 前記取得部は、
 前記画像、前記画像に対するユーザの印象に関する前記質問、及び前記印象に関する前記正解を取得し、
 前記登録部は、
 前記画像、前記印象に関する前記質問、及び前記印象に関する前記正解の前記組合せを、前記サポート情報として登録する
 前記(4)に記載の情報処理装置。
(6)
 前記取得部は、
 前記画像、前記画像に含まれる対象物の量、色、温度または硬さに関する前記質問、及び前記量、前記色、前記温度または前記硬さに関する前記正解を取得し、
 前記登録部は、
 前記画像、及び前記量、前記色、前記温度または前記硬さに関する前記質問、及び前記質問、及び前記量、前記色、前記温度または前記硬さに関する前記正解の前記組合せを、前記サポート情報として登録する
 前記(5)に記載の情報処理装置。
(7)
 ユーザによる入力を受け付ける入力部、
 を備え、
 前記取得部は、
 前記ユーザにより入力された前記質問を取得する
 前記(1)~(6)のいずれかに記載の情報処理装置。
(8)
 前記質問に対応する正解候補を出力する出力部、
 を備え、
 前記入力部は、
 前記ユーザによる前記正解候補に対する反応を受け付け、
 前記登録部は、
 前記正解候補に対する前記反応に応じて決定される前記正解を含む前記組合せを、前記サポート情報として登録する
 前記(7)に記載の情報処理装置。
(9)
 前記登録部は、
 前記正解候補に対する前記反応が肯定的である場合、前記正解候補を前記正解として含む前記組合せを、前記サポート情報として登録する
 前記(8)に記載の情報処理装置。
(10)
 前記入力部は、
 前記正解候補に対する前記反応が否定的である場合、前記ユーザによる前記正解候補とは別の正解候補を受け付け、
 前記登録部は、
 前記ユーザにより入力された前記別の正解候補を前記正解として含む前記組合せを、前記サポート情報として登録する
 前記(8)に記載の情報処理装置。
(11)
 前記質問を出力する出力部、
 を備え、
 前記取得部は、
 前記出力部により出力された前記質問を取得する
 前記(1)~(6)のいずれかに記載の情報処理装置。
(12)
 ユーザによる前記質問に対応する前記正解の入力を受け付ける入力部、
 を備え、
 前記登録部は、
 前記ユーザにより入力された前記正解を含む前記組合せを、前記サポート情報として登録する
 前記(11)に記載の情報処理装置。
(13)
 ユーザの有無を検知する検知部、
 を備え、
 前記出力部は、
 前記検知部によりユーザが検知された場合、前記質問を出力する
 前記(11)または(12)に記載の情報処理装置。
(14)
 前記画像を撮像する撮像部、
 を備え、
 前記取得部は、
 前記撮像部により検知された前記画像を取得する
 前記(1)~(13)のいずれかに記載の情報処理装置。
(15)
 前記クエリ情報と、前記サポート情報とに基づいて、前記クエリ情報の前記一の質問に対応する一の正解を決定する決定部、
 を備える前記(1)~(14)のいずれかに記載の情報処理装置。
(16)
 前記決定部は、
 前記クエリ情報に含まれる前記一の画像及び前記一の質問と、前記サポート情報に含まれる前記画像及び前記質問とに基づいて、前記一の正解を決定する
 前記(15)に記載の情報処理装置。
(17)
 前記決定部は、
 前記クエリ情報に含まれる前記一の画像及び前記一の質問と、前記サポート情報に含まれる前記画像及び前記質問との比較に基づいて、前記一の正解を決定する
 前記(16)に記載の情報処理装置。
(18)
 前記取得部は、
 前記クエリ情報を取得し、
 前記決定部は、
 前記取得部により取得された前記クエリ情報と、複数のサポート情報とに基づいて、前記複数のサポート情報のうち、前記一の正解に用いる一のサポート情報を決定する
 前記(15)~(17)のいずれかに記載の情報処理装置。
(19)
 画像、前記画像に関連する質問、及び前記質問に対応する正解を取得し、
 取得した前記画像、前記質問、及び前記正解の組合せを、一の画像及び前記一の画像に関連する一の質問を含むクエリ情報への応答の決定に用いるサポート情報として登録する
 処理を実行する情報処理方法。
(20)
 画像、前記画像に関連する質問、及び前記質問に対応する正解を取得し、
 取得した前記画像、前記質問、及び前記正解の組合せを、一の画像及び前記一の画像に関連する一の質問を含むクエリ情報への応答の決定に用いるサポート情報として登録する
 処理を実行させる情報処理プログラム。
Note that the present technology may also be configured as below.
(1)
An image, a question related to the image, and an acquisition unit that acquires a correct answer corresponding to the question;
The combination of the image, the question, and the correct answer acquired by the acquisition unit is registered as support information used to determine a response to query information including one image and one question related to the one image. Registration department,
An information processing apparatus including.
(2)
The acquisition unit is
Obtaining the image, the question about the concept corresponding to the image, and the correct answer about the concept,
The registration unit is
The information processing apparatus according to (1), wherein the combination of the image, the question regarding the concept, and the correct answer regarding the concept is registered as the support information.
(3)
The acquisition unit is
Acquiring the image, the question about the object contained in the image, and the correct answer about the object,
The registration unit is
The information processing apparatus according to (2), wherein the combination of the image, the question regarding the object, and the correct answer regarding the object is registered as the support information.
(4)
The acquisition unit is
Acquiring the image, the question regarding the property or state of an object included in the image, and the correct answer regarding the property or state,
The registration unit is
The information processing apparatus according to (3), wherein the combination of the image, the question regarding the property or the state, and the correct answer regarding the property or the state is registered as the support information.
(5)
The acquisition unit is
Obtaining the image, the question regarding the user's impression of the image, and the correct answer regarding the impression,
The registration unit is
The information processing apparatus according to (4), wherein the combination of the image, the question regarding the impression, and the correct answer regarding the impression is registered as the support information.
(6)
The acquisition unit is
Acquiring the image, the amount of the object contained in the image, the color, the question about the temperature or hardness, and the amount, the color, the temperature or the correct answer about the hardness,
The registration unit is
The image and the question regarding the amount, the color, the temperature, or the hardness, and the question and the combination of the correct answers regarding the amount, the color, the temperature, or the hardness are registered as the support information. The information processing apparatus according to (5) above.
(7)
An input unit that accepts user input,
Equipped with
The acquisition unit is
The information processing apparatus according to any one of (1) to (6), which acquires the question input by the user.
(8)
An output unit that outputs a correct answer candidate corresponding to the question,
Equipped with
The input unit is
Accepting a response to the correct candidate by the user,
The registration unit is
The information processing device according to (7), wherein the combination including the correct answer determined according to the reaction to the correct answer candidate is registered as the support information.
(9)
The registration unit is
The information processing device according to (8), wherein when the reaction to the correct answer candidate is affirmative, the combination including the correct answer candidate as the correct answer is registered as the support information.
(10)
The input unit is
When the reaction to the correct answer candidate is negative, a correct answer candidate different from the correct answer candidate by the user is accepted,
The registration unit is
The information processing apparatus according to (8), wherein the combination including the another correct answer candidate input by the user as the correct answer is registered as the support information.
(11)
The information processing apparatus according to any one of (1) to (6), further comprising:
an output unit that outputs the question,
wherein the acquisition unit acquires the question output by the output unit.
(12)
The information processing apparatus according to (11), further comprising:
an input unit that accepts input by a user of the correct answer corresponding to the question,
wherein the registration unit registers the combination including the correct answer input by the user as the support information.
(13)
The information processing apparatus according to (11) or (12), further comprising:
a detection unit that detects the presence or absence of a user,
wherein the output unit outputs the question when a user is detected by the detection unit.
(14)
The information processing apparatus according to any one of (1) to (13), further comprising:
an imaging unit that captures the image,
wherein the acquisition unit acquires the image detected by the imaging unit.
(15)
The information processing apparatus according to any one of (1) to (14), further comprising:
a determination unit that determines one correct answer corresponding to the one question of the query information based on the query information and the support information.
(16)
The information processing apparatus according to (15), wherein the determination unit determines the one correct answer based on the one image and the one question included in the query information and the image and the question included in the support information.
(17)
The information processing apparatus according to (16), wherein the determination unit determines the one correct answer based on a comparison between the one image and the one question included in the query information and the image and the question included in the support information.
(18)
The information processing apparatus according to any one of (15) to (17), wherein
the acquisition unit acquires the query information, and
the determination unit determines, based on the query information acquired by the acquisition unit and a plurality of pieces of support information, one piece of support information to be used for the one correct answer from among the plurality of pieces of support information.
(19)
An information processing method comprising executing processing of:
acquiring an image, a question related to the image, and a correct answer corresponding to the question; and
registering the acquired combination of the image, the question, and the correct answer as support information used to determine a response to query information including one image and one question related to the one image.
(20)
An information processing program for causing a computer to execute processing of:
acquiring an image, a question related to the image, and a correct answer corresponding to the question; and
registering the acquired combination of the image, the question, and the correct answer as support information used to determine a response to query information including one image and one question related to the one image.
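For orientation only, the following Python sketch illustrates one way the registration and determination described in (1) and (15) to (18) above could be realized: combinations of an image, a question, and a correct answer are stored as support information, and a query is answered by comparing its image and question embeddings with the stored combinations and reusing the closest entry's answer. The embedding functions and class names are assumptions made for this sketch and are not part of the disclosure.

```python
from dataclasses import dataclass
from typing import Callable, List

import numpy as np

# Hypothetical embedding functions; any image/text encoders could stand in here.
EmbedImage = Callable[[np.ndarray], np.ndarray]
EmbedText = Callable[[str], np.ndarray]


@dataclass
class SupportEntry:
    """One registered combination of an image, a question, and its correct answer."""
    image_vec: np.ndarray
    question_vec: np.ndarray
    answer: str


class SupportStore:
    """Registers support information and decides an answer for query information."""

    def __init__(self, embed_image: EmbedImage, embed_text: EmbedText) -> None:
        self.embed_image = embed_image
        self.embed_text = embed_text
        self.entries: List[SupportEntry] = []

    def register(self, image: np.ndarray, question: str, correct_answer: str) -> None:
        # Registration unit: store the combination as support information.
        self.entries.append(
            SupportEntry(self.embed_image(image), self.embed_text(question), correct_answer)
        )

    def decide(self, query_image: np.ndarray, query_question: str) -> str:
        # Determination unit: compare the query image and question with each
        # registered image and question, and reuse the closest entry's answer.
        if not self.entries:
            raise ValueError("no support information registered")
        qi = self.embed_image(query_image)
        qq = self.embed_text(query_question)
        scores = [
            _cosine(qi, e.image_vec) + _cosine(qq, e.question_vec) for e in self.entries
        ]
        return self.entries[int(np.argmax(scores))].answer


def _cosine(a: np.ndarray, b: np.ndarray) -> float:
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))
```

A metric-learning or matching-network style encoder could replace the simple cosine comparison; the point of the sketch is only that the answer is drawn from the registered support information rather than from a fixed set of answer classes.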
100, 100A  Robot apparatus
100B  Information processing apparatus
11, 11B  Communication unit
12  Input unit (microphone)
13  Output unit (speaker)
14, 14A, 14B  Storage unit
141  Support information storage unit
142  Model information storage unit
143, 143A  Mode information storage unit
144  Question information storage unit
15, 15A  Control unit
151  Acquisition unit
152  Determination unit
153, 153A  Generation unit
154  Registration unit
155  Learning unit
156  Decision unit
16, 16A  Sensor unit (imaging unit, detection unit, camera)
17  Drive unit (actuator)

Claims (20)

1. An information processing apparatus comprising:
an acquisition unit that acquires an image, a question related to the image, and a correct answer corresponding to the question; and
a registration unit that registers the combination of the image, the question, and the correct answer acquired by the acquisition unit as support information used to determine a response to query information including one image and one question related to the one image.
2. The information processing apparatus according to claim 1, wherein
the acquisition unit acquires the image, the question regarding a concept corresponding to the image, and the correct answer regarding the concept, and
the registration unit registers the combination of the image, the question regarding the concept, and the correct answer regarding the concept as the support information.
3. The information processing apparatus according to claim 2, wherein
the acquisition unit acquires the image, the question regarding an object included in the image, and the correct answer regarding the object, and
the registration unit registers the combination of the image, the question regarding the object, and the correct answer regarding the object as the support information.
4. The information processing apparatus according to claim 3, wherein
the acquisition unit acquires the image, the question regarding a property or a state of an object included in the image, and the correct answer regarding the property or the state, and
the registration unit registers the combination of the image, the question regarding the property or the state, and the correct answer regarding the property or the state as the support information.
5. The information processing apparatus according to claim 4, wherein
the acquisition unit acquires the image, the question regarding a user's impression of the image, and the correct answer regarding the impression, and
the registration unit registers the combination of the image, the question regarding the impression, and the correct answer regarding the impression as the support information.
6. The information processing apparatus according to claim 5, wherein
the acquisition unit acquires the image, the question regarding an amount, a color, a temperature, or a hardness of an object included in the image, and the correct answer regarding the amount, the color, the temperature, or the hardness, and
the registration unit registers the combination of the image, the question regarding the amount, the color, the temperature, or the hardness, and the correct answer regarding the amount, the color, the temperature, or the hardness as the support information.
7. The information processing apparatus according to claim 1, further comprising:
an input unit that accepts input by a user,
wherein the acquisition unit acquires the question input by the user.
8. The information processing apparatus according to claim 7, further comprising:
an output unit that outputs a correct answer candidate corresponding to the question,
wherein the input unit accepts a reaction by the user to the correct answer candidate, and
the registration unit registers, as the support information, the combination including the correct answer determined according to the reaction to the correct answer candidate.
9. The information processing apparatus according to claim 8, wherein, when the reaction to the correct answer candidate is affirmative, the registration unit registers, as the support information, the combination including the correct answer candidate as the correct answer.
10. The information processing apparatus according to claim 8, wherein,
when the reaction to the correct answer candidate is negative, the input unit accepts another correct answer candidate input by the user, and
the registration unit registers, as the support information, the combination including the other correct answer candidate input by the user as the correct answer.
11. The information processing apparatus according to claim 1, further comprising:
an output unit that outputs the question,
wherein the acquisition unit acquires the question output by the output unit.
12. The information processing apparatus according to claim 11, further comprising:
an input unit that accepts input by a user of the correct answer corresponding to the question,
wherein the registration unit registers the combination including the correct answer input by the user as the support information.
13. The information processing apparatus according to claim 11, further comprising:
a detection unit that detects the presence or absence of a user,
wherein the output unit outputs the question when a user is detected by the detection unit.
14. The information processing apparatus according to claim 1, further comprising:
an imaging unit that captures the image,
wherein the acquisition unit acquires the image detected by the imaging unit.
15. The information processing apparatus according to claim 1, further comprising:
a determination unit that determines one correct answer corresponding to the one question of the query information based on the query information and the support information.
16. The information processing apparatus according to claim 15, wherein the determination unit determines the one correct answer based on the one image and the one question included in the query information and the image and the question included in the support information.
17. The information processing apparatus according to claim 16, wherein the determination unit determines the one correct answer based on a comparison between the one image and the one question included in the query information and the image and the question included in the support information.
18. The information processing apparatus according to claim 15, wherein
the acquisition unit acquires the query information, and
the determination unit determines, based on the query information acquired by the acquisition unit and a plurality of pieces of support information, one piece of support information to be used for the one correct answer from among the plurality of pieces of support information.
19. An information processing method comprising executing processing of:
acquiring an image, a question related to the image, and a correct answer corresponding to the question; and
registering the acquired combination of the image, the question, and the correct answer as support information used to determine a response to query information including one image and one question related to the one image.
20. An information processing program for causing a computer to execute processing of:
acquiring an image, a question related to the image, and a correct answer corresponding to the question; and
registering the acquired combination of the image, the question, and the correct answer as support information used to determine a response to query information including one image and one question related to the one image.
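As a companion to the sketch above, the following minimal loop illustrates the interactive registration described in claims 7 to 13: a correct answer candidate is output for a question about an image, the user's reaction is accepted, and either the confirmed candidate or the user's correction is registered as support information. The helper callables (propose_candidate, ask_user) and the reuse of SupportStore.register are assumptions for illustration, not part of the claims.

```python
def interactive_register(store, image, question, propose_candidate, ask_user):
    """Sketch of the claim 8-10 flow: confirm or correct a candidate, then register it.

    propose_candidate(image, question) -> str  # hypothetical answer generator
    ask_user(prompt) -> str                    # hypothetical user I/O (speaker/microphone)
    """
    candidate = propose_candidate(image, question)
    reaction = ask_user(f"Is the answer to '{question}' '{candidate}'? (yes/no)")
    if reaction.strip().lower().startswith("y"):
        # Affirmative reaction: register the candidate itself as the correct answer (claim 9).
        store.register(image, question, candidate)
        return candidate
    # Negative reaction: accept a different correct answer from the user (claim 10).
    corrected = ask_user(f"What is the correct answer to '{question}'?")
    store.register(image, question, corrected)
    return corrected
```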
PCT/JP2019/041218 2018-11-14 2019-10-18 Information processing device, information processing method, and information processing program WO2020100532A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
JP2018214174 2018-11-14
JP2018-214174 2018-11-14

Publications (1)

Publication Number Publication Date
WO2020100532A1 true WO2020100532A1 (en) 2020-05-22

Family

ID=70730922

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/JP2019/041218 WO2020100532A1 (en) 2018-11-14 2019-10-18 Information processing device, information processing method, and information processing program

Country Status (1)

Country Link
WO (1) WO2020100532A1 (en)

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2004013344A (en) * 2002-06-04 2004-01-15 Hitachi Information Systems Ltd Question/answer support system for screen operation
JP2005302000A (en) * 2004-03-18 2005-10-27 Yafoo Japan Corp Device, method and program for retrieving knowledge
JP2008117324A (en) * 2006-11-08 2008-05-22 Nec Corp Verification system, method therefor, data processing terminal using the same, and program
JP2017182646A * 2016-03-31 2017-10-05 Dai Nippon Printing Co., Ltd. Information processing device, program and information processing method

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP7481995B2 2020-10-28 2024-05-13 Toshiba Corporation State determination device, method, and program

Similar Documents

Publication Publication Date Title
US11241789B2 (en) Data processing method for care-giving robot and apparatus
CN109176535B (en) Interaction method and system based on intelligent robot
CN107097234B (en) Robot control system
CN110069707A (en) Artificial intelligence self-adaptation interactive teaching system
JP2019056970A (en) Information processing device, artificial intelligence selection method and artificial intelligence selection program
CN108781273A (en) Action based on automatic participant mark
WO2016136104A1 (en) Information processing device, information processing method, and program
CN109770918A (en) Mood analytical equipment, method and the machine readable storage medium for recording this method program
WO2020100532A1 (en) Information processing device, information processing method, and information processing program
US11238846B2 (en) Information processing device and information processing method
WO2019207875A1 (en) Information processing device, information processing method, and program
US20200234187A1 (en) Information processing apparatus, information processing method, and program
JP7371770B2 (en) Avatar control program, avatar control method, and information processing device
CN114428879A (en) Multimode English teaching system based on multi-scene interaction
WO2017149848A1 (en) Information processing device, information processing method and program
US11935449B2 (en) Information processing apparatus and information processing method
JP7087804B2 (en) Communication support device, communication support system and communication method
Vidrin Partnering as rhetoric
WO2019146199A1 (en) Information processing device and information processing method
JP6866731B2 (en) Speech recognition device, speech recognition method, and program
JP4685712B2 (en) Speaker face image determination method, apparatus and program
US20220261201A1 (en) Computer-readable recording medium storing display control program, display control device, and display control method
JP2020187714A (en) Information processor, information processing system, learning device, and information processing program
CN112585662A (en) Method and system for automatically sharing process knowledge
KR102439446B1 (en) Learning management system based on artificial intelligence

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 19885156

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 19885156

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: JP