WO2015104883A1 - Information processing device, information processing method, and program - Google Patents

Information processing device, information processing method, and program

Info

Publication number
WO2015104883A1
WO2015104883A1 (PCT/JP2014/078111)
Authority
WO
WIPO (PCT)
Prior art keywords
information
user
expected value
output
attention
Prior art date
Application number
PCT/JP2014/078111
Other languages
French (fr)
Japanese (ja)
Inventor
淳己 大村
道成 河野
麗子 桐原
智 朝川
伊藤 洋子
Original Assignee
Sony Corporation
Priority date
Filing date
Publication date
Application filed by Sony Corporation
Publication of WO2015104883A1

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/01Input arrangements or combined input and output arrangements for interaction between user and computer
    • G06F3/011Arrangements for interaction with the human body, e.g. for user immersion in virtual reality
    • G06F3/012Head tracking input arrangements
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/01Input arrangements or combined input and output arrangements for interaction between user and computer
    • G06F3/011Arrangements for interaction with the human body, e.g. for user immersion in virtual reality
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/48Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use
    • G10L25/51Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination
    • G10L25/63Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination for estimating an emotional state
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F2203/00Indexing scheme relating to G06F3/00 - G06F3/048
    • G06F2203/01Indexing scheme relating to G06F3/01
    • G06F2203/011Emotion or mood input determined on the basis of sensed human body parameters such as pulse, heart rate or beat, temperature of skin, facial expressions, iris, voice pitch, brain activity patterns

Definitions

  • This disclosure relates to an information processing apparatus, an information processing method, and a program.
  • In Patent Document 1, morphological analysis is performed on each of a predetermined number of generated utterance content candidates, and independent words are extracted from each candidate. When the generated candidates include a candidate indicating silence or a candidate that contains no independent word, the input utterance is ignored, and the display device and the speaker are controlled so as not to output a response. As a result, a more appropriate response can be made when the input voice is rejected.
  • Patent Document 2 discloses a method and apparatus for quickly and accurately managing a dialogue between a human and an agent using voice information, facial expression information, and delay time information, and a voice dialogue system using the method and apparatus.
  • In this voice dialogue system, first dialogue order information is generated using dialogue information analyzed from the voice uttered by the user, together with facial expression information analyzed from an image of the user's face.
  • In these techniques, the presence or absence of a system response is controlled based on the content of the user's utterance, so the presentation of information to the user by voice can be optimized to some extent.
  • However, the user's utterances and facial expressions are only fragmentary material for estimating whether the information presented by the system at that time is appropriate for the user, so there was still room for improvement in this optimization.
  • Accordingly, the present disclosure proposes a new and improved information processing apparatus, information processing method, and program capable of responding more flexibly to user actions.
  • According to the present disclosure, there is provided an information processing apparatus including a processor configured to acquire data indicating a user's action, calculate, based on the acquired data, an expected value of attention directed to information output to the user, and provide the expected value for output control of the information.
  • According to the present disclosure, there is also provided an information processing method including acquiring data indicating a user's action, calculating, based on the acquired data, an expected value of attention directed to information output to the user, and providing the expected value for output control of the information.
  • According to the present disclosure, there is further provided a program for causing a computer to realize functions of acquiring data indicating a user's action, calculating, based on the acquired data, an expected value of attention directed to information output to the user, and providing the expected value for output control of the information. A minimal sketch of this pipeline is shown below.
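  • The following is a minimal sketch of the acquire-calculate-provide pipeline just described. It is not taken from the publication; all names (ActionData, AttentionEstimator, control_output) and the threshold value are illustrative assumptions.

```python
# Minimal sketch (not from the publication) of the acquire -> calculate -> provide
# pipeline described above. All names and values are illustrative assumptions.
from dataclasses import dataclass
from typing import Callable


@dataclass
class ActionData:
    source: str    # e.g. "camera", "microphone", "action_recognition"
    payload: dict  # raw observation, e.g. {"gaze_on_terminal": True}


class AttentionEstimator:
    def __init__(self, calc_rule: Callable[[ActionData], float]):
        self.calc_rule = calc_rule  # stands in for the calculation rule DB 131

    def expected_attention(self, data: ActionData) -> float:
        # Calculate an expected value of attention from the acquired data.
        return self.calc_rule(data)


def control_output(expected: float, message: str, threshold: float = 5.0) -> None:
    # Provide the expected value for output control: output or suppress.
    if expected >= threshold:
        print(message)  # output the information
    # else: suppress (the information could be stocked instead)


estimator = AttentionEstimator(lambda d: 8.0 if d.payload.get("gaze_on_terminal") else 2.0)
data = ActionData("camera", {"gaze_on_terminal": True})
control_output(estimator.expected_attention(data), "Lunch at 12:00 today.")
```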
  • FIG. 1 is a block diagram illustrating a schematic functional configuration of a system according to an embodiment of the present disclosure.
  • FIG. 2 is a diagram illustrating a device configuration example of the system according to the embodiment.
  • FIG. 3 is a diagram illustrating an example of a calculation rule DB according to the embodiment.
  • FIG. 4 is a flowchart illustrating an example of selection of a display method according to the embodiment.
  • FIG. 5 is a flowchart illustrating an example of selection of an output method according to the embodiment.
  • FIG. 6 is a diagram showing the example of FIG. 5 more concretely.
  • FIG. 7 is a diagram illustrating an example of display color selection according to the embodiment.
  • FIG. 8 is a diagram for describing a first example of information stock according to the embodiment.
  • FIG. 9 is a diagram for describing a second example of information stock according to the embodiment.
  • FIG. 10 is a flowchart showing processing for the stock of information according to the embodiment.
  • FIG. 11 is a block diagram illustrating a hardware configuration example of an information processing apparatus according to the embodiment.
  • FIG. 1 is a block diagram illustrating a schematic functional configuration of a system according to an embodiment of the present disclosure.
  • Referring to FIG. 1, the system 10 includes a camera 101, a sensor 103, a microphone 105, an action data acquisition unit 107, an action DB 109, an action recognition server 111, an attention expectation value calculation unit 113, and an output control unit 115.
  • System 10 is used to present information to the user.
  • the system 10 may be for continuously presenting information to the same user via a terminal device worn or carried by the user.
  • Alternatively, the system 10 may present information via a stationary terminal device to unspecified users (some of whom may be identified) present in its vicinity.
  • The camera 101 photographs the user of the system 10. For example, the camera 101 can acquire image data indicating user actions.
  • The sensor 103 comprises various sensors directed at the user. For example, the sensor 103 includes an acceleration sensor, a gyro sensor, a geomagnetic sensor, a GPS receiver, and the like mounted on a terminal device worn or carried by the user. The sensor 103 may also include an ultrasonic sensor or an infrared sensor. For example, the sensor 103 can acquire sensor data indicating user actions.
  • The microphone 105 picks up sound generated in the vicinity of the user. For example, the microphone 105 acquires audio data indicating user actions. The microphone 105 may constitute a microphone array so that the direction of a sound source can be specified based on the audio data. Data acquired by some or all of the camera 101, the sensor 103, and the microphone 105 is provided to the action data acquisition unit 107.
  • The action data acquisition unit 107 may also acquire user action recognition information from the action recognition server 111. The action recognition server 111 may be included in the system 10, or may be a service outside the system 10.
  • The action recognition server 111 recognizes the user's behavior based on, for example, various known action recognition technologies. Data acquired by the sensor 103, the camera 101, and the microphone 105 of the system 10 can be used for this action recognition; in this case, the data is transmitted separately from the sensor 103, the camera 101, and the microphone 105 to the action recognition server 111 (paths not shown). The action recognition server 111 may also recognize the user's behavior based on sensor data acquired by a terminal device or the like outside the system 10.
  • The action data acquisition unit 107 is a data acquisition and management function realized by, for example, a processor of an information processing apparatus, and acquires and manages various data indicating user actions. As described above, data indicating user actions can be provided from the camera 101, the sensor 103, the microphone 105, and/or the action recognition server 111. Furthermore, the action data acquisition unit 107 may acquire information on operations performed by the user on a terminal device included in the system 10 or on other terminal devices; in this case, for example, keywords of information searches using a web browser, information on content used by the user, and the like can be acquired.
  • The action data acquisition unit 107 stores the provided data (hereinafter also referred to as action data) in the action DB 109 as necessary, and then provides the data to the attention expectation value calculation unit 113. The action data acquisition unit 107 may also provide action data to the feedback analysis unit 129.
  • Next, the user actions indicated by the action data will be described. For example, a user action may include the user's movement or facial expression. In this case, the attention expectation value calculation unit 113 can calculate the expected value of the user's attention directed to information to be output, based on the latest (or recent) action data.
  • A user action may also include a reaction to information output from the display 123 or the speaker 125. Such a reaction is used by the attention expectation value calculation unit 113 to calculate the expected value of the user's attention when information is next output, or by the feedback analysis unit 129 to verify the validity of the calculated expected value.
  • The action DB 109 is a database realized by, for example, a memory or storage of an information processing apparatus, and stores the action data acquired by the action data acquisition unit 107 temporarily or continuously.
  • The action data acquired by the action data acquisition unit 107 may be provided to the attention expectation value calculation unit 113 after being temporarily stored in the action DB 109, or without being stored in the action DB 109. In this case, the attention expectation value calculation unit 113 calculates the expected value based on the user's latest action.
  • Alternatively, the action data acquired by the action data acquisition unit 107 may be stored continuously in the action DB 109. In this case, the action data acquisition unit 107 reads data for a necessary period from the action data stored in the action DB 109 and provides it to the attention expectation value calculation unit 113, which calculates the expected value based on the history of the user's actions.
  • The attention expectation value calculation unit 113 is an arithmetic function realized by, for example, the processor of an information processing apparatus, and calculates, based on the action data acquired by the action data acquisition unit 107, an expected value of the attention directed to information output to the user. Here, attention means the degree of the user's attention directed to the output information. The attention expectation value calculation unit 113 provides the calculated expected value of attention to the output control unit 115 for output control of information.
  • The attention directed to the output information can vary depending on the user's situation. For example, when the user turns his or her gaze toward a terminal device including the display 123, or calls out to the terminal device, much attention is expected to be paid to the output information. On the other hand, when the user is in conversation with another user or is riding a train, there is a high possibility that little attention will be paid to information output by voice (it will not be heard).
  • In such cases, the attention expectation value calculation unit 113 estimates the user's situation based on the action data provided from the camera 101, the sensor 103, the microphone 105, the action recognition server 111, and the like, and can calculate an expected value of attention according to the situation. More specifically, the attention expectation value calculation unit 113 may calculate the expected value by referring to data in the calculation rule DB 131 that associates user actions with expected values of attention.
  • The attention directed to the output information may also vary depending on the content of the output information. For example, when a keyword in which the user is interested is displayed on the display 123 or output from the speaker 125, the user is expected to pay much attention to the output information.
  • In this case, the attention expectation value calculation unit 113 acquires the content of the information to be output from the information generation unit 117 and compares it with the user's interests estimated from the action data, for example keywords of information searches executed by the user in the past; if they match or have something in common, the expected value of attention can be raised. More specifically, the attention expectation value calculation unit 113 may calculate the expected value by referring to data in the calculation rule DB 131 in which actions related to information content are associated with expected values of attention.
  • When calculating the expected value of the user's attention based on the action data as described above, the attention expectation value calculation unit 113 may use only action data indicating the user's latest action, or may use action data indicating a history of the user's actions.
  • When the expected value is calculated based on the latest action, the amount of computation can be reduced, and the expected value can be calculated quickly with a low load. When the expected value is calculated based on the action history, the expected value can be calculated taking the context of the user's actions into account, which improves its accuracy. For example, when the expected value is calculated based on the action history, the expected value calculated by the attention expectation value calculation unit 113 may differ even if the user's latest action is the same.
  • Furthermore, the attention expectation value calculation unit 113 may correct the expected value of attention based on the accuracy of estimating the user's action from the action data. As described above, the action data includes, for example, image data of the user provided by the camera 101, sensing data of the user provided by the sensor 103, and audio data of the user's vicinity provided by the microphone 105. The attention expectation value calculation unit 113 estimates the user's action based on these data, but the accuracy of the estimation may vary from moment to moment.
  • The output control unit 115 is an arithmetic function realized by, for example, the processor of an information processing apparatus, and controls the output of information to the user based on the expected value of the user's attention calculated by the attention expectation value calculation unit 113. More specifically, the output control unit 115 may determine whether or not to output information based on the expected value of the user's attention. For example, for certain information generated by the information generation unit 117, the output control unit 115 refers to the expected value calculated by the attention expectation value calculation unit 113, suppresses output of the information when the expected value falls below a threshold value, and executes output of the information otherwise. The output control unit 115 may also select an information output method based on the expected value of the user's attention.
  • For example, with reference to the expected value calculated by the attention expectation value calculation unit 113, the output control unit 115 may output information as an image when the expected value is below a threshold value, and output it as sound when it is not.
  • The output control unit 115 may also output information that was not output because the expected value of the user's attention was low, once the expected value becomes high. As described above, the output control unit 115 suppresses the output of information when, for example, the expected value of the user's attention calculated by the attention expectation value calculation unit 113 falls below a threshold value. At such a time, the user may simply be temporarily busy and may wish to be provided with the suppressed information a little later. In such a case, the output control unit 115 may acquire the information that the information generation unit 117 temporarily stored in the information cache DB 119 as a result of the earlier suppression, and output it to the user.
  • The information generation unit 117 is an arithmetic function realized by, for example, the processor of an information processing apparatus, and generates information to be output to the user via the output control unit 115, for example based on information provided from the information server 121.
  • The information server 121 may be included in the system 10, or may be a service external to the system 10. For example, the information server 121 cooperates with the action recognition server 111 to push information for supporting the user's behavior (such as spot information near the user's current location, the user's schedule information, or traffic information) to the information generation unit 117. Also, the information server 121 may push notifications, such as an incoming message to the user or delivery of new information, to the information generation unit 117 in cooperation with other services provided on the terminal device. Alternatively, the information server 121 may transmit such information and notifications in response to a request transmitted automatically by the information generation unit 117 (regardless of a user operation).
  • The information generated by the information generation unit 117 is output to the user from the display 123, the speaker 125, and/or the other output device 127 under the control of the output control unit 115. Accordingly, the information generation unit 117 generates image data to be displayed on the display 123, audio data to be output by the speaker 125, and/or a control signal for operating the other output device 127. The information generation unit 117 may receive such data or signals from the information server 121 and output them as they are, or may generate them based on information received from the information server 121.
  • When an information output method is selected by the output control unit 115, the information generation unit 117 can generate information according to the selected output method. For example, when the output control unit 115 determines to output information via the display 123, the information generation unit 117 generates image data; when the output control unit 115 determines to output information via the speaker 125, the information generation unit 117 generates audio data. When output is suppressed, the information generation unit 117 temporarily stores the generated information in the information cache DB 119.
  • The information cache DB 119 is a database realized by, for example, a memory or storage of an information processing apparatus, and temporarily stores information generated by the information generation unit 117. In the information cache DB 119, information that has not been output as described above is stored, and output information may also be stored for a predetermined period, for example in case re-output is requested by the user.
  • As described above, the calculation of the expected value of the user's attention by the attention expectation value calculation unit 113 may be performed based on the content of the information to be output. In this case, the generated information, or information indicating its content, may be provided to the attention expectation value calculation unit 113 prior to output.
  • The display 123 displays images for the user of the system 10, and the speaker 125 outputs sound toward the user. The other output device 127 can include, for example, a light (illumination) and a vibrator, as described later. The output control unit 115 controls the output of information via these output devices based on the expected value of the user's attention calculated by the attention expectation value calculation unit 113.
  • The camera 101, the sensor 103, and/or the microphone 105 (hereinafter also collectively referred to as the input device) and the display 123, the speaker 125, and/or the other output device 127 (hereinafter also collectively referred to as the output device) are provided in, for example, the same terminal device, or in terminal devices whose positional relationship is fixed.
  • The feedback analysis unit 129 is an arithmetic function realized by, for example, the processor of an information processing apparatus, and analyzes user actions as feedback on the information output control performed by the output control unit 115. For example, when information is output under the control of the output control unit 115 based on the expected value calculated by the attention expectation value calculation unit 113, whether the output control was appropriate can be inferred from data indicating the user's reaction to the information. For example, when the user completely ignores the output information, it is estimated that the user's actual attention was lower than the calculated expected value.
  • The feedback analysis unit 129 may modify the calculation rules stored in the calculation rule DB 131 based on the analysis result. The feedback analysis unit 129 may also correct parameters and the like used in the calculation processing of the attention expectation value calculation unit 113 based on the analysis result.
  • The calculation rule DB 131 is a database realized by, for example, a memory or storage of an information processing apparatus, and stores data that associates user actions with expected values of attention. The data in the calculation rule DB 131 may be prepared in advance, and may also be corrected based on the results of analysis by the feedback analysis unit 129. When such corrections are repeated, it can be said that the data in the calculation rule DB 131 is formed by learning based on the user's reactions to output information.
  • For example, the attention expectation value calculation unit 113 refers to the calculation rule DB 131 based on the user action indicated by the action data acquired by the action data acquisition unit 107, and acquires a score indicating the expected value of attention. When a plurality of actions are detected, the attention expectation value calculation unit 113 may refer to the calculation rule DB 131 for each action and calculate the expected value of attention by weighting and adding the scores obtained. For example, a plurality of actions, such as "riding a train" and "in conversation with another user", may be detected redundantly based on different input data. A simple sketch of such a weighted combination is shown below.
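  • The following is a hypothetical sketch of the weighted addition just described, assuming the calculation rule DB can be reduced to a score table. The scores and weights are illustrative, not values from the publication.

```python
# Hypothetical sketch: when several actions are detected at once, each
# contributes its attention score with a weight (e.g. detection confidence).
RULE_DB = {          # action -> attention score (0..10), cf. calculation rule DB 131
    "calling": 9.0,
    "conversation": 0.5,
    "riding_train": 1.0,
}

def expected_attention(detected: dict[str, float]) -> float:
    """detected maps an action name to a weight (e.g. detection confidence)."""
    total_weight = sum(detected.values())
    if total_weight == 0:
        return 5.0  # neutral average when nothing is detected
    weighted = sum(RULE_DB[action] * w for action, w in detected.items())
    return weighted / total_weight

# "riding a train" and "in conversation" detected redundantly from different inputs:
print(expected_attention({"riding_train": 0.7, "conversation": 0.9}))  # low value
```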
  • FIG. 2 is a diagram illustrating a device configuration example of a system according to an embodiment of the present disclosure.
  • As illustrated, the system 10 may include a terminal device 151 and a server 152. The server 152 may include a plurality of servers, like the servers 152a and 152b in the illustrated example.
  • The terminal device 151 has, for example, a function of outputting information to the user, a function of acquiring data indicating the user's actions, and a function of exchanging information and data with the server 152. The terminal device 151 can be, for example, a smartphone, a wearable terminal, a tablet terminal, a personal computer, a television, a game machine, or the like. The terminal device 151 may be carried by a specific user, or may be a stationary type used by unspecified users. The terminal device 151 is realized by, for example, the hardware configuration of an information processing apparatus described later.
  • The server 152 has, for example, a function of processing data received from the terminal device 151 and a function of transmitting information to be output by the terminal device 151. The server 152 is realized by, for example, one or more server devices on a network, each of which is realized by the hardware configuration of an information processing apparatus described later.
  • In the system 10, for example, the camera 101, the sensor 103, the microphone 105, the display 123, the speaker 125, and the other output device 127 are realized in the terminal device 151, and the other functional configurations are realized in the server 152.
  • When the server 152 includes a plurality of servers, for example, the action data acquisition unit 107 may be realized in the server 152b, and the attention expectation value calculation unit 113 and the output control unit 115 may be realized in the server 152a.
  • Alternatively, only the action recognition server 111 and the information server 121 may be realized in the server 152, and the other functional configurations may be realized in the terminal device 151. In this case, the information processing apparatus according to the present embodiment, for example the information processing apparatus implementing the attention expectation value calculation unit 113 and the output control unit 115, can be the terminal device 151.
  • The server 152 may include a plurality of servers (not limited to two as in the illustrated example; there may be three or more). Alternatively, the entire functional configuration of the system 10 shown in FIG. 1 (excluding the action recognition server 111 and the information server 121) may be realized in the terminal device 151. In this case, the system 10 need not include the server 152.
  • FIG. 3 is a diagram illustrating an example of a calculation rule DB according to an embodiment of the present disclosure.
  • FIG. 3 shows records 131a to 131e, which associate an action, an attention score, a source, and a condition, as examples of the data stored in the calculation rule DB.
  • The action is the user action specified when the action data acquired by the action data acquisition unit 107 satisfies a predetermined condition. The attention score is a score corresponding to the expected value of attention directed to information output to the user when each action is specified. The source is the provider of the action data used to specify each action, and the condition is the condition that the action data provided by the source should satisfy in order for each action to be specified.
  • For example, the record 131a defines that the user action "turning the gaze" is specified when the camera 101 detects that the user has turned his or her gaze toward the terminal device, and that an attention score of 8.0 is given in that case. In the illustrated example, the attention score is defined in the range of 0 to 10, so 8.0 represents a relatively high expected value of attention. Various known techniques can be used for the image processing executed in the attention expectation value calculation unit 113 to detect the user's gaze, so a detailed description is omitted.
  • Similarly, the record 131b defines that the user action "calling" is specified when the user's utterance is detected by the microphone 105 and no utterance of another user is detected, and that an attention score of 9.0 is given. Various known techniques can likewise be used for the audio processing executed in the attention expectation value calculation unit 113 to distinguish the user's utterance from the utterances of other users, so a detailed description is omitted.
  • The attention expectation value calculation unit 113 may also detect the content of the user's utterance using various known techniques. In this case, for example, the condition for specifying the user action "calling" may be defined as the user uttering a predetermined calling phrase, for example "Hey".
  • Similarly, the record 131c, the record 131d, the record 131e, and other records each associate a user action, an attention score, and the source and condition for specifying the action.
  • For example, the record 131c defines that when the utterance of another user is detected in addition to the user's own utterance, the different action "conversation (with another user)" is specified, with a much lower attention score (0.5 for "conversation" compared to 9.0 for "calling").
  • The record 131d defines that the action "riding a train" is specified when the user's action of riding a train is recognized based on action recognition data provided from the action recognition server 111.
  • The record 131e defines that the user action "searched in the past" is specified when the content of the information scheduled to be output by the information generation unit 117 includes a search keyword indicated by the search history acquired by the action data acquisition unit 107.
  • A record such as the record 131e, which associates an action related to the information content with an expected value of attention, may be stored in the calculation rule DB 131 in a common format with the data that associates other actions with expected values of attention. Alternatively, the data that associates information content with expected values of attention may be stored in the calculation rule DB 131 in a format different from the data that associates actions with expected values of attention. A data-structure sketch of such records is shown below.
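  • The following sketch renders records like 131a to 131e as a data structure, mirroring the fields of FIG. 3 (action, attention score, source, condition). Modeling the condition as a predicate over action data is our assumption, not the publication's format.

```python
# Hypothetical rendering of calculation rule DB records (cf. FIG. 3).
from dataclasses import dataclass
from typing import Callable

@dataclass
class CalcRule:
    action: str
    score: float                      # attention score, 0..10
    source: str                       # device or service providing the action data
    condition: Callable[[dict], bool] # predicate over the action data (assumption)

RULES = [
    CalcRule("turning the gaze", 8.0, "camera",
             lambda d: d.get("gaze_on_terminal", False)),
    CalcRule("calling", 9.0, "microphone",
             lambda d: d.get("user_speech", False) and not d.get("other_speech", False)),
    CalcRule("conversation", 0.5, "microphone",
             lambda d: d.get("user_speech", False) and d.get("other_speech", False)),
]

def specify_actions(action_data: dict) -> list[CalcRule]:
    # Return every rule whose condition the action data satisfies.
    return [r for r in RULES if r.condition(action_data)]

for rule in specify_actions({"user_speech": True, "other_speech": True}):
    print(rule.action, rule.score)  # -> conversation 0.5
```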
  • The attention expectation value calculation unit 113 calculates the expected value of attention directed to information output to the user with reference to a calculation rule DB 131 like the one in the illustrated example. The attention expectation value calculation unit 113 may use the attention score in the illustrated example as the expected value of attention as it is, or, when a plurality of actions are detected redundantly, may calculate the expected value of attention by weighting and adding the attention scores.
  • Furthermore, the attention expectation value calculation unit 113 may estimate the user's action based on the action data and adjust the expected value of attention based on the accuracy of the estimation. More specifically, the expected value of attention may be adjusted to approach the average value when the estimation accuracy is low. For example, for an action whose attention score is higher than the average value (5.0), such as "turning the gaze" or "calling", the attention score may be temporarily lowered when the accuracy of the estimation based on the action data is determined to be low (for example, when the image analysis indicates that the probability that the user is looking at the terminal device is dominant but not very high).
  • Conversely, for an action whose attention score is lower than the average value, the attention score may be temporarily raised when the accuracy of the estimation based on the action data is determined to be low. This processing treats the specified action as less reliable when the estimation accuracy is low, and brings the calculated expected value of attention closer to the value used when no action is specified. In another example, the attention score may simply be lowered uniformly when the estimation accuracy is low. A sketch of this adjustment appears below.
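  • The following sketch illustrates one plausible form of this adjustment, pulling the score toward the average (5.0) in proportion to the estimation confidence. The linear interpolation is an assumption; the publication gives no formula.

```python
# Sketch of the accuracy-based adjustment: low estimation confidence pulls the
# attention score toward the average, i.e. toward "no action specified".
AVERAGE = 5.0

def adjust_score(score: float, confidence: float) -> float:
    """confidence in [0, 1]: 1.0 keeps the score, 0.0 yields the average."""
    return AVERAGE + confidence * (score - AVERAGE)

print(adjust_score(9.0, 0.9))  # "calling" detected confidently -> 8.6
print(adjust_score(9.0, 0.3))  # detected with low confidence   -> 6.2
print(adjust_score(0.5, 0.3))  # low-score action, low confidence -> 3.65
```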
  • Next, specific examples of calculating the expected value of the user's attention in the present embodiment will be described. The following examples may be realized, for example, by the processing logic of the attention expectation value calculation unit 113, or by the data stored in the calculation rule DB 131.
  • For example, the attention expectation value calculation unit 113 may raise the calculated expected value of attention when the user's utterance (call) includes a specific word or phrase. More specifically, the calculated expected value of attention is raised when utterances such as "Hurry", "Hey", or "Answer me" are combined with another action (for example, turning the gaze). The expected value of attention can also be raised by a specific command (such as one including the name of the system) or by the action of pointing at the terminal device. Using this, the user can control the system so that it responds readily to calls.
  • The attention expectation value calculation unit 113 may also calculate the expected value of attention based on the surrounding environment estimated as part of the user's action. More specifically, for example, when it is estimated that the user is alone, the attention expectation value calculation unit 113 may raise the calculated expected value of attention. Since a user is unlikely to talk to himself or herself except when on the phone, when the user's utterance is detected while alone, it is estimated to be more likely a call to the system than when another user is present. On the other hand, the attention expectation value calculation unit 113 may lower the calculated expected value when the user's utterance is detected in a noisy environment, for example with television sound or train noise. Alternatively, the calculated expected value need not be lowered even in a noisy environment.
  • The attention expectation value calculation unit 113 may also raise the calculated expected value of attention when the user repeatedly utters the same content. In this case, it is highly likely that the user is requesting some kind of response from the system, so regardless of whether there was an audio output from the system beforehand, the expected value of the user's attention directed to output information is estimated to be high.
  • The attention expectation value calculation unit 113 may also raise the expected value of attention calculated based on subsequent user actions when one or more dialogues have already taken place between the system and the user. This is because user actions following an interaction with the system are presumed likely to be related to the information output from the system. However, when the user leaves the terminal device that provides information after the dialogue, for example, the expected value of attention can be low.
  • The attention expectation value calculation unit 113 may also raise the calculated expected value of attention when the user engages in a characteristic dialogue with the system. More specifically, when the user uses a dialect, speaks loudly, or adds a particular keyword at the beginning or end of an utterance, it is presumed that the user spoke in a manner directed at the system, and the expected value of attention can be high.
  • The attention expectation value calculation unit 113 may also calculate the expected value of attention for each output method according to the user's state. For example, when it is specified from the action recognition result that the user is working, in a meeting, or moving on a train, the attention expectation value calculation unit 113 may lower the expected value of attention calculated for audio output. On the other hand, in this case, the attention expectation value calculation unit 113 may raise the expected value of attention calculated for the user's gestures or simple actions (for example, tapping or shaking the terminal device). When the user is sleeping, it is presumed that there is no intentional action from the user, so the expected value of attention calculated when any action is detected may be lowered or uniformly set to 0. However, this does not apply when, for example, the user's sleep state, such as sleep phase, pulse, or depth of sleep, is being detected.
  • The attention expectation value calculation unit 113 may also correct the calculated expected value of attention according to words included in the user's utterance. For example, when a specific person (for example, a family member, a friend, or the user's boss) or specific content (for example, an anniversary, the return date of a borrowed book, or the submission date of official documents) is included in the user's utterance, the attention expectation value calculation unit 113 may determine that the user is having a conversation of high importance and raise the calculated expected value of attention. Thereby, for example, important information that the user should not forget can be brought to the user's attention again.
  • FIG. 4 is a flowchart illustrating an example of selection of a display method according to an embodiment of the present disclosure.
  • In the example of FIG. 4, the output control unit 115 first determines whether or not the expected value of attention calculated by the attention expectation value calculation unit 113 exceeds a first threshold th1 (S101).
  • When the expected value exceeds the first threshold th1, the output control unit 115 causes the display 123 to display information in a window shown in the forefront (S103). This is the processing when the user's attention to the output information is estimated to be highest (the user pays close attention to the output information). If information is displayed in the foreground window, the user can obtain a large amount of information immediately.
  • When the expected value does not exceed the first threshold th1, the output control unit 115 next determines whether or not the expected value exceeds a second threshold th2 (S105). Here, the second threshold th2 is smaller than the first threshold th1.
  • When the expected value exceeds the second threshold th2, the output control unit 115 causes the display 123 to display information in a pop-up window (S107). This is the processing when the user's attention to the output information is estimated to be moderate (the user may or may not pay attention to the output information). If information is displayed in a pop-up window, it does not get in the way even if the user does not need the information.
  • When the expected value does not exceed the second threshold th2 either, the output control unit 115 ends the process without outputting information; that is, the output control unit 115 suppresses the output of information. This is the processing when the user's attention to the output information is estimated to be low (the user pays little attention to the output information, which may even be a nuisance). Information that is not output in this case may be stored in the information cache DB 119 and output later. A code sketch covering this selection together with that of FIG. 5 follows the description of FIG. 5.
  • FIG. 5 is a flowchart illustrating an example of selecting an output method according to an embodiment of the present disclosure.
  • In the example of FIG. 5, the output control unit 115 first determines whether or not the expected value of attention calculated by the attention expectation value calculation unit 113 exceeds a first threshold th1 (S151).
  • When the expected value exceeds the first threshold th1, the output control unit 115 causes both the display 123 and the speaker 125 to output information (S153). This is the processing when the user's attention to the output information is estimated to be highest (the user pays close attention to the output information). If information is output using both the display 123 and the speaker 125, the user can obtain a large amount of information in a short time.
  • When the expected value does not exceed the first threshold th1, the output control unit 115 next determines whether or not the expected value exceeds a second threshold th2 (S155). Here, the second threshold th2 is smaller than the first threshold th1. When the expected value exceeds the second threshold th2, the output control unit 115 outputs information using only the display 123 (S157). This is the processing when the user's attention to the output information is estimated to be moderate (the user may or may not pay attention to the output information). If information is output using only the display 123, it does not get in the way even if the user does not need the information.
  • When the expected value does not exceed the second threshold th2 either, the output control unit 115 ends the process without outputting information; that is, the output control unit 115 suppresses the output of information. This is the processing when the user's attention to the output information is estimated to be low (the user pays little attention to the output information, which may even be a nuisance). As in the example of FIG. 4, information that was not output may be stored in the information cache DB 119 and output later. A sketch of this two-threshold selection is shown below.
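  • The following sketch covers the two-threshold selection of both FIG. 4 and FIG. 5. The threshold values and return labels are assumptions; the publication defines only the ordering th2 < th1 and the three outcomes.

```python
# Sketch of the two-threshold output selection in FIGS. 4 and 5.
TH1, TH2 = 7.0, 4.0  # assumed values; th2 < th1

def select_display_method(expected: float) -> str:  # FIG. 4
    if expected > TH1:
        return "foreground window"  # S103: highest attention expected
    if expected > TH2:
        return "pop-up window"      # S107: moderate attention expected
    return "suppress"               # no output; information may be stocked

def select_output_method(expected: float) -> str:   # FIG. 5
    if expected > TH1:
        return "display + speaker"  # S153
    if expected > TH2:
        return "display only"       # S157
    return "suppress"

print(select_display_method(8.5))  # foreground window
print(select_output_method(5.0))   # display only
```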
  • FIG. 6 is a diagram showing the example of FIG. 5 more specifically.
  • In the example of FIG. 6, the user is first in conversation with another user. The attention expectation value calculation unit 113 specifies the user's action based on, for example, audio data acquired by the microphone 105, and calculates a relatively low expected value by referring to data in the calculation rule DB 131 such as that illustrated in FIG. 3. In the illustrated example, this expected value falls between the first threshold th1 and the second threshold th2. Therefore, the processing of S157 in the flowchart of FIG. 5 is executed, and information is output using only the display 123.
  • Next, the attention expectation value calculation unit 113 specifies a user action based on, for example, audio data acquired by the microphone 105, and similarly calculates a relatively high expected value by referring to the data in the calculation rule DB 131. In the illustrated example, this expected value exceeds the first threshold th1. Therefore, the processing of S153 in the flowchart of FIG. 5 is executed, and information is output using both the display on the display 123 and the sound 125v output from the speaker 125, as shown in (c).
  • In the illustrated example, information corresponding to the content of the user's conversation is output. Such information is generated by, for example, the information generation unit 117 specifying the content of the user's utterance based on the audio data acquired by the microphone 105 and acquiring information related to that content from the information server 121. The attention expectation value calculation unit 113 may raise the calculated expected value of attention when the content of the information to be output is included in the content of the user's utterance.
  • FIG. 7 is a diagram illustrating an example of display color selection according to an embodiment of the present disclosure.
  • In the example of FIG. 7, the user is walking in the city wearing a bracelet-type wearable terminal device, and passes near a certain store (SHOP). This store is related to a search keyword ("Italian") used in an information search the user executed earlier (it is an Italian restaurant).
  • In this case, the information generation unit 117 estimates, from the user's position information specified by the GPS receiver included in the sensor 103 and the user's information search history previously acquired by the action data acquisition unit 107, that the store may be an object of interest to the user, and generates information for notifying the user that the store (SHOP) is nearby.
  • In another example, the relationship between the store (SHOP) and the user may be estimated using user profile information held by an external service. For example, a restaurant information service holds the user's bookmarks of store information, a search history of store information, store information registered by other users with similar attributes, and the like; a social media service holds evaluations and the like that the user has expressed on social media about services and stores. The information generation unit 117 may estimate the relationship between the store (SHOP) and the user based on such information, and generate information for notifying the user that the store (SHOP) is nearby based further on the user's position information.
  • In the illustrated example, the notification information to the user is output by a light (illumination) included in the other output device 127. Here, the output control unit 115 may change the display color of the light, as shown in (a) to (c) of FIG. 7, according to the expected value of attention calculated by the attention expectation value calculation unit 113.
  • For example, the output control unit 115 may light the illumination in a conspicuous color when the expected value of the user's attention is high, and light it in a plain color, or not light it at all, when the expected value is low. Conversely, the output control unit 115 may light the illumination in a plain color when the expected value of the user's attention is high (the user is likely to notice it anyway), and in a conspicuous color when the expected value is low (the user is likely not to have noticed yet). A sketch of such a color mapping is shown below.
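  • The following sketch illustrates such a color mapping. The concrete colors and the threshold are assumptions, since the publication distinguishes only "conspicuous" and "plain" colors.

```python
# Sketch of the display-color selection of FIG. 7, covering both strategies:
# conspicuous when attention is high, or (conversely) conspicuous when the
# user has likely not noticed yet.
def notification_color(expected: float, threshold: float = 5.0,
                       alert_when_unnoticed: bool = False) -> str:
    high = expected >= threshold
    if alert_when_unnoticed:
        # Conversely: conspicuous when the user has likely NOT noticed yet.
        return "plain gray" if high else "conspicuous red"
    return "conspicuous red" if high else "plain gray"

print(notification_color(8.0))                             # conspicuous red
print(notification_color(2.0, alert_when_unnoticed=True))  # conspicuous red
```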
  • In FIGS. 8 and 9, the state of communication with the user as detected by the system is represented by an indicator, shown at the lower right of each figure. The indicator may actually be displayed, for example by a light provided as the other output device 127 of the terminal device, or it may be interpreted as being shown in FIGS. 8 and 9 for explanation only (not actually displayed).
  • FIG. 8 is a diagram for describing a first example of information stock according to an embodiment of the present disclosure.
  • In the example of FIG. 8, the system first detects that the user is speaking. However, the system could not correctly detect the content of the user's utterance, and the expected value of the user's attention to output information was calculated to be low, so the system did not output the information and stocked it instead.
  • Next, the user notices that there is no response from the system and calls out to it ("Hey!"). The system, correctly detecting this call from the user, estimates that the expected value of the user's attention is high, and outputs the stocked information. More specifically, the system provides the information to the user by means of the sound 125v output from the speaker 125: "I'm sorry. Today is lunch in Osaki."
  • FIG. 9 is a diagram for describing a second example of information stock according to an embodiment of the present disclosure.
  • In the example of FIG. 9 as well, the system first detects that the user is speaking. Here, the system estimated that the user was in conversation with another user and that the expected value of the user's attention to output information was low, so the information generated by the information generation unit 117 was not output and was stocked.
  • In this case, the user did not actually need information from the system (the user was merely asking about today's schedule in conversation with another user), so the system's decision to stock the information was correct.
  • FIG. 10 is a flowchart showing processing for the stock of information according to an embodiment of the present disclosure.
  • In the example of FIG. 10, the output control unit 115 first determines whether or not the expected value of attention calculated by the attention expectation value calculation unit 113 exceeds a first threshold th1 (S201).
  • When the expected value exceeds the first threshold th1, the output control unit 115 further determines whether or not there is stocked information in the information cache DB 119 (S203). When stocked information exists, the output control unit 115 outputs the stocked information (S205); this is, for example, the processing shown in (c) of the example of FIG. 8. When no stocked information exists, the output control unit 115 outputs the information generated by the information generation unit 117 (S207).
  • When the expected value does not exceed the first threshold th1, the output control unit 115 determines whether or not the expected value exceeds a second threshold th2 (S209). Here, the second threshold th2 is smaller than the first threshold th1.
  • When the expected value exceeds the second threshold th2, the output control unit 115 outputs the information generated by the information generation unit 117 (S207). That is, in the illustrated example, when the expected value of attention is between the first threshold th1 and the second threshold th2, the stocked information is not output, but newly generated information is output.
  • When the expected value does not exceed the second threshold th2 either, the output control unit 115 stocks the information (S211). A sketch of this flow is shown below.
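  • The following sketch traces the flow of FIG. 10 (S201 to S211), with a simple list standing in for the information cache DB 119. The threshold values are assumptions.

```python
# Sketch of the information-stock flow of FIG. 10.
TH1, TH2 = 7.0, 4.0
stock: list[str] = []  # stands in for the information cache DB 119

def handle(expected: float, new_info: str) -> None:
    if expected > TH1:                        # S201: yes
        if stock:                             # S203: stocked information exists
            print("output:", stock.pop(0))    # S205 (handling of new_info here
        else:                                 #  is not specified in FIG. 10)
            print("output:", new_info)        # S207
    elif expected > TH2:                      # S209: yes
        print("output:", new_info)            # S207 (stock is kept, not output)
    else:
        stock.append(new_info)                # S211: stock the information

handle(2.0, "Today is lunch in Osaki.")  # stocked (user in conversation)
handle(9.0, "New message arrived.")      # outputs the stocked lunch information
```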
  • The information stock examples described above can be modified in various ways.
  • Possible actions by which the user retrieves stocked information from the system include, for example, calling out, saying the same thing again, staring at (looking at) the terminal device, turning the face toward it, operating a button on the terminal device, clapping hands, and falling silent (waiting for a response from the system). Some or all of these actions may be registered as actions for retrieving stocked information.
  • When outputting stocked information, the system may add an apology message for not having responded correctly (for example, the system response in (c) of the example of FIG. 8). Alternatively, the system may confirm the recognized user action, for example by asking "Weren't you talking to another user?", and output the stocked information when the user still requests the information.
  • Alternatively, the stocked information may be displayed as a list on the display 123 so that the user can select which information to output.
  • The stocked information can be discarded when a predetermined time has elapsed, as in the example of FIG. 9 described above, and that time can be set arbitrarily. For example, depending on the content of the information, a time from several minutes to several hours or several days may be set as the time until the stocked information is discarded.
  • FIG. 11 is a block diagram illustrating a hardware configuration example of the information processing apparatus according to the embodiment of the present disclosure.
  • The illustrated information processing apparatus 900 can realize, for example, the terminal device or the server in the embodiment described above.
  • The information processing apparatus 900 includes a CPU (Central Processing Unit) 901, a ROM (Read Only Memory) 903, and a RAM (Random Access Memory) 905. The information processing apparatus 900 may also include a host bus 907, a bridge 909, an external bus 911, an interface 913, an input device 915, an output device 917, a storage device 919, a drive 921, a connection port 923, and a communication device 925. Furthermore, the information processing apparatus 900 may include an imaging device 933 and a sensor 935 as necessary. The information processing apparatus 900 may include a processing circuit such as a DSP (Digital Signal Processor) or an ASIC (Application Specific Integrated Circuit) instead of, or in addition to, the CPU 901.
  • The CPU 901 functions as an arithmetic processing device and a control device, and controls all or part of the operations in the information processing apparatus 900 according to various programs recorded in the ROM 903, the RAM 905, the storage device 919, or a removable recording medium 927. The ROM 903 stores programs, calculation parameters, and the like used by the CPU 901. The RAM 905 temporarily stores programs used in the execution of the CPU 901, parameters that change as appropriate during that execution, and the like. The CPU 901, the ROM 903, and the RAM 905 are connected to one another by a host bus 907 configured by an internal bus such as a CPU bus. The host bus 907 is further connected to an external bus 911 such as a PCI (Peripheral Component Interconnect/Interface) bus via a bridge 909.
  • The input device 915 is a device operated by the user, such as a mouse, a keyboard, a touch panel, buttons, switches, and levers. The input device 915 may be, for example, a remote control device using infrared rays or other radio waves, or an externally connected device 929 such as a mobile phone that supports the operation of the information processing apparatus 900. The input device 915 includes an input control circuit that generates an input signal based on information input by the user and outputs the input signal to the CPU 901. By operating the input device 915, the user inputs various data to the information processing apparatus 900 and instructs it to perform processing operations.
  • The output device 917 is a device that can notify the user of acquired information visually or audibly. The output device 917 can be, for example, a display device such as an LCD (Liquid Crystal Display), a PDP (Plasma Display Panel), or an organic EL (Electro-Luminescence) display, an audio output device such as a speaker or headphones, or a printer device. The output device 917 outputs results obtained by the processing of the information processing apparatus 900 as video such as text or images, or as audio such as voice or other sounds.
  • The storage device 919 is a data storage device configured as an example of a storage unit of the information processing apparatus 900. The storage device 919 is, for example, a magnetic storage device such as an HDD (Hard Disk Drive), a semiconductor storage device, an optical storage device, or a magneto-optical storage device. The storage device 919 stores programs executed by the CPU 901, various data, various data acquired from outside, and the like.
  • The drive 921 is a reader/writer for a removable recording medium 927 such as a magnetic disk, an optical disc, a magneto-optical disc, or a semiconductor memory, and is built into or externally attached to the information processing apparatus 900. The drive 921 reads information recorded on the attached removable recording medium 927 and outputs it to the RAM 905. The drive 921 also writes records to the attached removable recording medium 927.
  • The connection port 923 is a port for connecting a device directly to the information processing apparatus 900. The connection port 923 can be, for example, a USB (Universal Serial Bus) port, an IEEE 1394 port, or a SCSI (Small Computer System Interface) port. The connection port 923 may also be an RS-232C port, an optical audio terminal, an HDMI (registered trademark) (High-Definition Multimedia Interface) port, or the like.
  • The communication device 925 is a communication interface configured by, for example, a communication device for connecting to a communication network 931. The communication device 925 can be, for example, a communication card for a wired or wireless LAN (Local Area Network), Bluetooth (registered trademark), or WUSB (Wireless USB). The communication device 925 may also be a router for optical communication, a router for ADSL (Asymmetric Digital Subscriber Line), or a modem for various kinds of communication. The communication device 925 transmits and receives signals and the like to and from the Internet and other communication devices using a predetermined protocol such as TCP/IP. The communication network 931 connected to the communication device 925 is a network connected by wire or wirelessly, and is, for example, the Internet, a home LAN, infrared communication, radio wave communication, or satellite communication.
  • The imaging device 933 is a device that captures images of real space using various members, such as an imaging element, for example a CCD (Charge Coupled Device) or CMOS (Complementary Metal Oxide Semiconductor) sensor, and a lens for controlling the formation of a subject image on the imaging element, and generates captured images. The imaging device 933 may capture still images or moving images.
  • The sensor 935 comprises various sensors, such as an acceleration sensor, a gyro sensor, a geomagnetic sensor, an optical sensor, and a sound sensor. The sensor 935 acquires information about the state of the information processing apparatus 900 itself, such as the attitude of the information processing apparatus 900, and information about the surrounding environment of the information processing apparatus 900, such as the brightness and noise around it. The sensor 935 may also include a GPS sensor that receives GPS (Global Positioning System) signals and measures the latitude, longitude, and altitude of the apparatus.
  • Each component described above may be configured using a general-purpose member, or may be configured by hardware specialized for the function of each component. Such a configuration can be appropriately changed according to the technical level at the time of implementation.
  • Embodiments of the present disclosure include, for example, an information processing device (terminal device or server) as described above, a system, an information processing method executed by the information processing device or system, a program for causing the information processing device to function, and a non-transitory tangible medium on which the program is recorded.
(1) An information processing apparatus including a processor configured to acquire data indicating a user's action, calculate, based on the acquired data, an expected value of attention directed to information output to the user, and provide the expected value for output control of the information.
(2) The information processing apparatus according to (1), wherein the processor calculates the expected value based on a latest action of the user.
(3) The information processing apparatus according to (1) or (2), wherein the processor calculates the expected value based on a history of actions of the user.
The information processing apparatus, wherein the processor is further configured to correct the expected value calculation rule based on data indicating the reaction.
(7) The information processing apparatus according to any one of (1) to (6), wherein the expected value is provided to determine whether or not to output the information.
(8) The information processing apparatus, wherein the processor further executes output control of the information and, when the expected value is high, outputs information that was not output because the expected value was low.
(9) The information processing apparatus according to any one of (1) to (8), wherein the expected value is provided to select an output method of the information.
(10) The information processing apparatus according to any one of (1) to (9), wherein the processor estimates the action of the user based on the acquired data and adjusts the expected value based on the accuracy of the estimation.
(11) The information processing apparatus according to (10), wherein the processor brings the expected value closer to an average value when the estimation accuracy is low.
(12) The information processing apparatus according to any one of (1) to (11), wherein the processor increases the expected value when an utterance of a specific phrase is included in the user's action.
(13) The information processing apparatus according to any one of (1) to (12), wherein the processor calculates the expected value based on the user's surrounding environment estimated as the user's action.
(14) An information processing method including, by a processor: acquiring data indicating a user's action; calculating, based on the acquired data, an expected value of attention directed to information output to the user; and providing the expected value for output control of the information.
(15) A program for causing a computer to realize functions of acquiring data indicating a user's action, calculating, based on the acquired data, an expected value of attention directed to information output to the user, and providing the expected value for output control of the information.

Abstract

In order to implement more flexible responses to user actions, this invention provides an information processing device containing a processor that is configured so as to: acquire data that represents a user action; compute, on the basis of the acquired data, an expected value for the level of attention that will be paid to information outputted to the user; and provide said expected value for the purposes of controlling the output of said information.

Description

Information processing apparatus, information processing method, and program
The present disclosure relates to an information processing apparatus, an information processing method, and a program.
Various technologies for optimizing the presentation of information from a system to a user have been proposed. For example, Patent Literature 1 describes a technique in which morphological analysis is performed on each of a predetermined number of generated utterance content candidates to extract independent words, and when the generated candidates include a candidate indicating silence or a candidate containing no independent word, the input utterance voice is ignored and the display device and speaker are controlled so as not to output a response. This makes it possible to respond more appropriately when rejecting input speech.
Patent Literature 2 describes a method and apparatus for quickly and accurately managing a dialogue between a human and an agent using voice information, facial expression information, and delay time information, and a voice dialogue system using them. More specifically, in the voice dialogue system, the following steps are executed: generating first dialogue order information using dialogue information analyzed from the voice uttered by the user; generating second dialogue order information using facial expression information analyzed from an image of the user's face; and determining the final dialogue order using the first dialogue order information, the second dialogue order information, system state information, the presence or absence of voice input by the user, and the user's non-response time.
Patent Literature 1: JP 2010-151941 A
Patent Literature 2: JP 2004-206704 A
In the technique described in Patent Literature 1 above, whether the system responds is controlled based on the content of the user's utterance. In the technique described in Patent Literature 2, whether the system speaks is determined according to the user's utterance, facial expression, and delay time. Such techniques can, for example, optimize the presentation of information to the user by voice to some extent. However, a user's utterances and facial expressions are only fragmentary material for estimating whether the information presented by the system at that moment is appropriate for the user, so there was still room for improvement in optimizing information presentation with such techniques.
Therefore, the present disclosure proposes a new and improved information processing apparatus, information processing method, and program capable of realizing a more flexible response to user actions.
According to the present disclosure, there is provided an information processing apparatus including a processor configured to acquire data indicating a user's action, calculate, based on the acquired data, an expected value of attention directed to information output to the user, and provide the expected value for output control of the information.
Further, according to the present disclosure, there is provided an information processing method including, by a processor: acquiring data indicating a user's action; calculating, based on the acquired data, an expected value of attention directed to information output to the user; and providing the expected value for output control of the information.
Further, according to the present disclosure, there is provided a program for causing a computer to realize functions of acquiring data indicating a user's action, calculating, based on the acquired data, an expected value of attention directed to information output to the user, and providing the expected value for output control of the information.
As described above, according to the present disclosure, a more flexible response to a user's action can be realized.
Note that the above effects are not necessarily limiting; together with or in place of the above effects, any of the effects shown in this specification, or other effects that can be grasped from this specification, may be achieved.
FIG. 1 is a block diagram illustrating a schematic functional configuration of a system according to an embodiment of the present disclosure.
FIG. 2 is a diagram illustrating a device configuration example of the system according to an embodiment of the present disclosure.
FIG. 3 is a diagram illustrating an example of a calculation rule DB according to an embodiment of the present disclosure.
FIG. 4 is a flowchart illustrating an example of selection of a display method according to an embodiment of the present disclosure.
FIG. 5 is a flowchart illustrating an example of selection of an output method according to an embodiment of the present disclosure.
FIG. 6 is a diagram illustrating the example of FIG. 5 more specifically.
FIG. 7 is a diagram illustrating an example of display color selection according to an embodiment of the present disclosure.
FIG. 8 is a diagram for describing a first example of stocking information according to an embodiment of the present disclosure.
FIG. 9 is a diagram for describing a second example of stocking information according to an embodiment of the present disclosure.
FIG. 10 is a flowchart illustrating processing for stocking information according to an embodiment of the present disclosure.
FIG. 11 is a block diagram illustrating a hardware configuration example of an information processing apparatus according to an embodiment of the present disclosure.
Hereinafter, preferred embodiments of the present disclosure will be described in detail with reference to the accompanying drawings. Note that, in this specification and the drawings, components having substantially the same functional configuration are denoted by the same reference numerals, and redundant description is omitted.
The description will be given in the following order.
1. System configuration
2. Examples of calculation rules
3. Examples of output control
 3-1. Example of selecting the display method
 3-2. Example of selecting the output method
 3-3. Other examples
4. Example of stocking information
5. Hardware configuration
6. Supplement
(1. System configuration)
FIG. 1 is a block diagram illustrating a schematic functional configuration of a system according to an embodiment of the present disclosure. Referring to FIG. 1, a system 10 includes a camera 101, a sensor 103, a microphone 105, an action data acquisition unit 107, an action DB 109, an action recognition server 111, an attention expectation value calculation unit 113, an output control unit 115, an information generation unit 117, an information cache DB 119, an information server 121, a display 123, a speaker 125, another output device 127, a feedback analysis unit 129, and a calculation rule DB 131.
The system 10 is used to present information to a user. For example, the system 10 may continuously present information to the same user via a terminal device worn or carried by that user. Alternatively, the system 10 may present information, via a stationary terminal device, to unspecified users who happen to be in its vicinity (some of whom may be identifiable).
The camera 101 can photograph a user of the system 10 and can acquire image data indicating the user's action. The sensor 103 comprises various sensors that target the user for sensing. For example, the sensor 103 includes an acceleration sensor, a gyro sensor, a geomagnetic sensor, a GPS receiver, and the like mounted on a terminal device worn or carried by the user, and may also include an ultrasonic sensor, an infrared sensor, or the like. The sensor 103 can acquire sensor data indicating the user's action. The microphone 105 can pick up sound generated in the vicinity of the user and acquires audio data indicating the user's action. The microphone 105 may be configured as a microphone array so that the direction of a sound source can be specified based on the audio data. Data acquired by some or all of the camera 101, the sensor 103, and the microphone 105 is provided to the action data acquisition unit 107.
Furthermore, the action data acquisition unit 107 may acquire the user's action recognition information from the action recognition server 111. The action recognition server 111 may be included in the system 10 or may be a service outside the system 10. The action recognition server 111 recognizes the user's behavior based on, for example, various behavior recognition technologies. For the behavior recognition, data acquired by, for example, the sensor 103, the camera 101, and the microphone 105 of the system 10 can be used; in this case, the data is separately transmitted from the sensor 103, the camera 101, and the microphone 105 to the action recognition server 111 (not shown). Alternatively, the action recognition server 111 may recognize the user's behavior based on sensor data acquired by a terminal device or the like outside the system 10.
The action data acquisition unit 107 is a data acquisition/management function realized by, for example, a processor of an information processing apparatus, and acquires and manages various data indicating the user's actions. As described above, data indicating the user's actions can be provided from the camera 101, the sensor 103, the microphone 105, and/or the action recognition server 111. Furthermore, the action data acquisition unit 107 may acquire information on operations performed by the user on a terminal device included in the system 10 or on another terminal device; in this case, for example, keywords of information searches using a web browser, information on content used by the user, and the like can be acquired. The action data acquisition unit 107 stores the provided data (hereinafter also referred to as action data) in the action DB 109 as necessary, and then provides it to the attention expectation value calculation unit 113. The action data acquisition unit 107 may also provide the action data to the feedback analysis unit 129.
Here, the user's actions indicated by the action data will be described. For example, a user's action may include the user's motion or facial expression. In this case, for example, the attention expectation value calculation unit 113 may calculate, based on the latest or most recent action data, the expected value of the user's attention directed to information that is about to be output. A user's action may also include a reaction to information output from the display 123 or the speaker 125. In this case, based on the action data indicating the reaction, the attention expectation value calculation unit 113 calculates the expected value of the user's attention for the next time information is output, or the feedback analysis unit 129 verifies the validity of the expected value calculation.
The action DB 109 is a database realized by, for example, a memory or storage of an information processing apparatus, in which the action data acquired by the action data acquisition unit 107 is stored temporarily or continuously. For example, the action data acquired by the action data acquisition unit 107 may be provided to the attention expectation value calculation unit 113 after being temporarily stored in the action DB 109, or without being stored in the action DB 109; in this case, the attention expectation value calculation unit 113 calculates the expected value based on the user's latest action. Alternatively, the action data acquired by the action data acquisition unit 107 may be continuously accumulated in the action DB 109; in this case, the action data acquisition unit 107 reads out data for a necessary period from the action data accumulated in the action DB 109 and provides it to the attention expectation value calculation unit 113, which then calculates the expected value based on the history of the user's actions.
The attention expectation value calculation unit 113 is an arithmetic function realized by, for example, a processor of an information processing apparatus, and calculates, based on the action data acquired by the action data acquisition unit 107, an expected value of the attention directed to information output to the user. Here, attention means the degree of the user's attention directed to the output information. The attention expectation value calculation unit 113 provides the calculated expected value of attention to the output control unit 115 for output control of the information.
Here, the attention directed to output information can vary depending on the user's situation. For example, when the user directs his or her gaze toward a terminal device including the display 123 or calls out to the terminal device, much attention is expected to be paid to the output information. On the other hand, when the user is in conversation with another user or riding a train, there is a high possibility that little attention will be paid, particularly to information output by voice (it may not even be heard). The attention expectation value calculation unit 113 can, for example, estimate the user's situation based on action data provided from the camera 101, the sensor 103, the microphone 105, the action recognition server 111, and the like, and calculate an expected value of attention according to the situation. The attention expectation value calculation unit 113 may calculate the expected value by referring to data in the calculation rule DB 131 that associates user actions with expected values of attention.
The attention directed to output information can also vary depending on the content of the output information. For example, if a keyword in which the user is interested is displayed on the display 123 or output from the speaker 125, the user is expected to pay much attention to the output information. The attention expectation value calculation unit 113 can acquire the content of the information scheduled to be output from the information generation unit 117, compare that content with content in which the user is estimated to be interested based on action data (for example, keywords of information searches executed by the user in the past), and, if there is a match or a common point, raise the expected value of attention. The attention expectation value calculation unit 113 may calculate the expected value by referring to data in the calculation rule DB 131 that associates actions related to information content with expected values of attention.
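As a rough sketch of this content-based adjustment (the function name, the boost amount, and the score ceiling are illustrative assumptions; the disclosure does not specify a concrete implementation):

def boost_for_content(base_score, info_keywords, past_search_keywords):
    # If the information to be output shares a keyword with the user's
    # past search keywords, raise the expected attention value.
    # The boost amount (+2.0) and the ceiling (10.0, matching the 0-10
    # score range of the calculation rule DB example) are assumptions.
    if set(info_keywords) & set(past_search_keywords):
        return min(base_score + 2.0, 10.0)
    return base_score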
Note that, in calculating the expected value of the user's attention based on the action data as described above, the attention expectation value calculation unit 113 may use only action data indicating the user's latest action, or may use action data indicating the history of the user's actions. When the expected value is calculated based on the latest action, the amount of computation is reduced and the expected value can be calculated quickly with a low load. On the other hand, when the expected value is calculated based on the action history, the expected value can be calculated in light of the context of the user's actions, which improves the accuracy of the expected value. For example, when the expected value is calculated based on the action history, the expected values of attention calculated by the attention expectation value calculation unit 113 may differ even if the user's latest action is the same.
Furthermore, in calculating the expected value of the user's attention based on the action data, the attention expectation value calculation unit 113 may correct the expected value based on the accuracy of the estimation of the user's action from the action data. The action data includes, for example, image data of the user provided by the camera 101, sensing data of the user provided by the sensor 103, and audio data from the vicinity of the user provided by the microphone 105. The attention expectation value calculation unit 113 estimates the user's action based on these data, but the accuracy of the estimation may vary from time to time.
The output control unit 115 is an arithmetic function realized by, for example, a processor of an information processing apparatus, and controls the output of information to the user based on the expected value of the user's attention calculated by the attention expectation value calculation unit 113. More specifically, the output control unit 115 may determine whether or not to output information based on the expected value of the user's attention. For example, for some piece of information generated by the information generation unit 117, the output control unit 115 may refer to the expected value calculated by the attention expectation value calculation unit 113, suppress the output of the information when the expected value falls below a threshold, and execute the output otherwise. The output control unit 115 may also select an information output method based on the expected value of the user's attention. For example, when information generated by the information generation unit 117 could be output either as an image via the display 123 or as audio via the speaker 125, the output control unit 115 may refer to the expected value calculated by the attention expectation value calculation unit 113, output the information as an image when the expected value falls below a threshold, and output the information as audio otherwise.
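The two decisions described above could be sketched as follows (the concrete threshold values and return labels are assumptions; the disclosure states only that output is suppressed below a threshold and that image or audio output is chosen relative to a threshold):

def control_output(expected_value, output_threshold=3.0, audio_threshold=6.0):
    # Decision 1: suppress output entirely when the expected value is
    # below the output threshold (the information may be cached instead).
    if expected_value < output_threshold:
        return None
    # Decision 2: below the audio threshold, output as an image on the
    # display 123; otherwise output as audio via the speaker 125.
    return "image" if expected_value < audio_threshold else "audio"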
Furthermore, the output control unit 115 may output, when the expected value is high, information that was not output because the expected value of the user's attention was low. As described above, the output control unit 115 suppresses the output of information when, for example, the expected value of the user's attention calculated by the attention expectation value calculation unit 113 falls below a threshold. At that moment the user may, for example, be temporarily busy, and may still want the suppressed information to be provided a little later. In such a case, the output control unit 115 may acquire the information that the information generation unit 117 temporarily stored in the information cache DB 119 as a result of the earlier suppression, and output it to the user.
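A minimal sketch of this stock-and-release behavior of the information cache DB 119 (the class name and the release threshold are assumptions):

from collections import deque

class InformationCache:
    # Suppressed information is stocked and released later, once the
    # expected attention value becomes high.
    def __init__(self):
        self._stock = deque()

    def suppress(self, information):
        # Called when output is suppressed because the expected value is low.
        self._stock.append(information)

    def flush_if_attentive(self, expected_value, threshold=7.0):
        # Release all stocked items once the expected value reaches the
        # (assumed) threshold; otherwise keep holding them.
        if expected_value < threshold:
            return []
        released = list(self._stock)
        self._stock.clear()
        return released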
The information generation unit 117 is an arithmetic function realized by, for example, a processor of an information processing apparatus, and generates information to be output to the user via the output control unit 115 based on, for example, information provided from the information server 121. The information server 121 may be included in the system 10 or may be a service outside the system 10. For example, the information server 121 may cooperate with the action recognition server 111 to push information for supporting the user's behavior (such as information on spots near the user's current location, the user's schedule information, or traffic information) to the information generation unit 117. Also, for example, the information server 121 may cooperate with other services provided on the terminal device to push notifications, such as incoming messages for the user or delivery of new information, to the information generation unit 117. Alternatively, the information server 121 may transmit such information and notifications in response to requests that the information generation unit 117 transmits automatically (without user operation).
As described above, the information generated by the information generation unit 117 is output to the user from the display 123, the speaker 125, and/or the other output device 127 under the control of the output control unit 115. The information generation unit 117 therefore generates image data to be displayed by the display 123, audio data to be output by the speaker 125, and/or control signals for operating the other output device 127. Note that the information generation unit 117 may receive such data or signals from the information server 121 and output them as they are, or may generate such data or signals based on information received from the information server 121. When the output control unit 115 controls the information output method based on the expected value of the user's attention, the information generation unit 117 can generate information according to the selected output method: for example, when the output control unit 115 decides to output information via the display 123, the information generation unit 117 can generate image data, and when the output control unit 115 decides to output information via the speaker 125, it can generate audio data.
In addition, when the output control unit 115 decides to suppress the output of information based on the expected value of the user's attention, the information generation unit 117 temporarily stores the generated information in the information cache DB 119. The information cache DB 119 is a database realized by, for example, a memory or storage of an information processing apparatus, and temporarily stores information generated by the information generation unit 117. Besides information that was not output as described above, output information may also be stored in the information cache DB 119 for a predetermined period, for example in case the user requests re-output. Also, as described above, since the calculation of the expected value of the user's attention by the attention expectation value calculation unit 113 may be performed based on the content of the information scheduled to be output, the information generation unit 117 may provide the generated information, or information indicating its content, to the attention expectation value calculation unit 113 prior to output.
The display 123 displays images for the user of the system 10. The speaker 125 outputs audio toward the user. The other output device 127 can include, for example, illumination, a vibrator, and the like, as described later. As described above, the output control unit 115 controls the output of information via these output devices based on the expected value of the user's attention calculated by the attention expectation value calculation unit 113. Here, the camera 101, the sensor 103, and/or the microphone 105 (hereinafter also collectively referred to as input devices) acquire data indicating the user's actions in order to calculate the expected value of the user's attention directed to information output from the display 123, the speaker 125, and/or the other output device 127 (hereinafter also collectively referred to as output devices). It is therefore desirable that the input devices and the output devices be provided in, for example, the same terminal device, or in terminal devices whose positional relationship with each other is fixed.
The feedback analysis unit 129 is an arithmetic function realized by, for example, a processor of an information processing apparatus, and analyzes the user's actions as feedback on the control of information output by the output control unit 115. For example, when information is output under the control of the output control unit 115 based on the expected value calculated by the attention expectation value calculation unit 113, whether the output control was appropriate can be inferred from data indicating the user's reaction to that information. For example, if the user completely ignored the output information, it can be inferred that the user's actual attention was lower than the calculated expected value. The feedback analysis unit 129 may modify the calculation rules stored in the calculation rule DB 131 based on the result of the analysis, and may also modify parameters and the like used in the calculation processing of the attention expectation value calculation unit 113.
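One plausible form of this rule correction, shown only as an illustration (the disclosure does not specify an update rule; the moving-average form and the learning rate here are assumptions):

def update_rule_score(stored_score, observed_attention, learning_rate=0.1):
    # Nudge the attention score stored in the calculation rule DB toward
    # the attention actually inferred from the user's reaction.
    # An ignored output would yield a low observed_attention, pulling the
    # stored score down over repeated corrections.
    return stored_score + learning_rate * (observed_attention - stored_score)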
The calculation rule DB 131 is a database realized by, for example, a memory or storage of an information processing apparatus, and stores data that associates user actions with expected values of attention. The data in the calculation rule DB 131 may, for example, be prepared in advance, and may further be modified based on the results of analysis by the feedback analysis unit 129. When such modifications are repeated, the data in the calculation rule DB 131 can be said to be formed by learning based on the user's reactions to output information. For example, the attention expectation value calculation unit 113 refers to the calculation rule DB 131 based on the user's action indicated by the action data acquired by the action data acquisition unit 107, and acquires a score indicating the expected value of attention. At this time, the attention expectation value calculation unit 113 may refer to the calculation rule DB 131 based on a plurality of user actions and calculate the expected value of attention by weighting and adding the plurality of acquired scores. This is because a plurality of actions may be detected in an overlapping manner based on, for example, different input data, such as "riding a train" and "in conversation with another user".
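A minimal sketch of this weighted combination (normalizing by the weight sum, which keeps the result on the 0-10 score scale, is an assumption; the disclosure says only that the scores may be weighted and added):

def expected_attention(detected_actions, rule_scores, weights=None):
    # rule_scores maps an action name to its attention score (0-10), as
    # in the calculation rule DB 131; weights maps an action name to its
    # relative weight. Unknown actions contribute a score of 0.
    weights = weights or {}
    total = 0.0
    weight_sum = 0.0
    for action in detected_actions:
        w = weights.get(action, 1.0)
        total += w * rule_scores.get(action, 0.0)
        weight_sum += w
    return total / weight_sum if weight_sum else 0.0

# e.g. expected_attention(["riding a train", "conversation"],
#                         {"riding a train": 2.0, "conversation": 0.5})
# (the score for "riding a train" is an assumed value)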
FIG. 2 is a diagram illustrating a device configuration example of a system according to an embodiment of the present disclosure. Referring to FIG. 2, the system 10 can include a terminal device 151 and a server 152. The server 152 may include a plurality of servers, like the servers 152a and 152b in the illustrated example.
The terminal device 151 has, for example, a function of outputting information to the user, a function of acquiring data indicating the user's actions, and a function of exchanging information and data with the server 152. The terminal device 151 can be, for example, a smartphone, a wearable terminal, a tablet terminal, a personal computer, a television, a game machine, or the like. The terminal device 151 may be carried by a specific user, or may be a stationary device used by unspecified users. The terminal device 151 is realized by, for example, the hardware configuration of an information processing apparatus described later.
The server 152 has, for example, a function of processing data received from the terminal device 151 and a function of transmitting information to be output by the terminal device 151. The server 152 is realized by, for example, one or more server devices on a network, each of which is realized by the hardware configuration of an information processing apparatus described later.
For example, among the functional configurations shown in FIG. 1, the camera 101, the sensor 103, the microphone 105, the display 123, the speaker 125, and the other output device 127 may be realized in the terminal device 151, and the remaining functional configurations in the server 152. In this case, functions may be distributed among a plurality of servers; for example, the action data acquisition unit 107 may be realized in the server 152b, while the attention expectation value calculation unit 113 and the output control unit 115 are realized in the server 152a.
In another example, among the functional configurations shown in FIG. 1, the action recognition server 111 and the information server 121 may be realized in the server 152, and the remaining functional configurations in the terminal device 151. Thus, in the system 10, the information processing apparatus according to the present embodiment, for example the information processing apparatus realizing the attention expectation value calculation unit 113 and the output control unit 115, may be the terminal device 151 or the server 152. As described above, the server 152 may include a plurality of servers (not limited to two as in the illustrated example; there may be three or more).
Furthermore, for example, when the action data acquisition unit 107 does not acquire data from the action recognition server 111 and the information generation unit 117 does not acquire data from the information server 121, the entire functional configuration of the system 10 shown in FIG. 1 (excluding those two servers) may be realized in the terminal device 151. In this case, the system 10 need not include the server 152.
(2. Examples of calculation rules)
FIG. 3 is a diagram illustrating an example of a calculation rule DB according to an embodiment of the present disclosure. FIG. 3 shows records 131a to 131e, each associating an action, an attention score, a source, and a condition, as an example of the data stored in the calculation rule DB.
An action is a user action that is specified when the action data acquired by the action data acquisition unit 107 satisfies a predetermined condition. An attention score is a score corresponding to the expected value of attention directed to information output to the user when the corresponding action is specified. A source provides the action data for specifying the corresponding action, and a condition is the condition that the action data provided by the source should satisfy in order for the action to be specified. For example, the record 131a defines that the user action "directing the gaze" is specified when the camera 101 detects that the user has directed his or her gaze toward the terminal device, and that an attention score of 8.0 is given. In the illustrated example, attention scores are defined in the range of 0 to 10, so 8.0 means a relatively high expected value of attention. Detailed description of the image processing for detecting the user's gaze, executed for example in the attention expectation value calculation unit 113, is omitted because various known techniques can be used.
Also, for example, the record 131b defines that the user action "calling out" is specified when the microphone 105 detects the user's utterance and no utterance of another user is detected, and that an attention score of 9.0 is given. Detailed description of the audio processing for distinguishing the user's uttered voice from the uttered voices of other users, executed for example in the attention expectation value calculation unit 113, is omitted because various known techniques can be used. The attention expectation value calculation unit 113 may likewise detect the content of the user's utterance using various known techniques; in this case, for example, the condition for specifying the user action "calling out" may define that the user utters a predetermined calling phrase, such as "hey" or "hello".
In the same way, the records 131c, 131d, 131e, and other records not shown associate a user action with an attention score and with the source and condition for specifying it. For example, the record 131c defines that, even when the user's utterance is detected by the microphone 105 as in the "calling out" case of the record 131b, if another user's utterance is also detected at the same time, a different action, "conversation (with another user)", is specified, and the attention score is lower ("conversation" is 0.5 versus 9.0 for "calling out"). The record 131d defines that the action "riding a train" is specified when the user's behavior "riding a train" is recognized based on behavior recognition data provided from the action recognition server 111.
The record 131e defines that the user action "searched in the past" is specified when the content of the information scheduled to be output by the information generation unit 117 includes a search keyword indicated by the search history acquired by the action data acquisition unit 107. In the illustrated example, records that associate actions related to information content with expected values of attention, such as the record 131e, are stored in the calculation rule DB 131 in a format common with the data associating other actions with expected values of attention; in another example, data associating information content with expected values of attention may be stored in the calculation rule DB 131 in a format different from the data associating actions with expected values of attention.
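For illustration, the records of FIG. 3 might be represented as data like the following (the scores of 8.0, 9.0, and 0.5 are given in the text; the scores for records 131d and 131e are not stated in the disclosure and are placeholders):

CALCULATION_RULES = [
    {"record": "131a", "action": "directing the gaze", "score": 8.0,
     "source": "camera 101",
     "condition": "gaze toward the terminal device detected"},
    {"record": "131b", "action": "calling out", "score": 9.0,
     "source": "microphone 105",
     "condition": "user's utterance detected without another user's utterance"},
    {"record": "131c", "action": "conversation", "score": 0.5,
     "source": "microphone 105",
     "condition": "user's and another user's utterances detected together"},
    {"record": "131d", "action": "riding a train", "score": 2.0,  # assumed
     "source": "action recognition server 111",
     "condition": "behavior 'riding a train' recognized"},
    {"record": "131e", "action": "searched in the past", "score": 7.0,  # assumed
     "source": "search history (action data acquisition unit 107)",
     "condition": "output information contains a past search keyword"},
]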
As described above, the attention expectation value calculation unit 113 refers to a calculation rule DB 131 like the illustrated example and calculates the expected value of attention directed to information output to the user. At this time, the attention expectation value calculation unit 113 may, for example, use the attention score in the illustrated example as the expected value of attention as it is, or, when a plurality of actions are detected in an overlapping manner, may calculate the expected value of attention by weighting and adding the attention scores.
Furthermore, the attention expectation value calculation unit 113 may estimate the user's action based on the action data and adjust the expected value of attention based on the accuracy of the estimation. In this case, the expected value of attention may be adjusted toward the average value when the estimation accuracy is low. For example, for actions whose attention score is higher than the average value (assumed here to be 5.0), such as "directing the gaze" or "calling out", the attention score may be temporarily lowered when the accuracy of the estimation based on the action data is judged to be low (for example, when the image analysis result indicates that the probability that the user is directing his or her gaze toward the terminal device is dominant but not very high). Conversely, for actions whose attention score is lower than the average value, such as "conversation" or "riding a train", the attention score may be temporarily raised when the accuracy of the estimation based on the action data is judged to be low. This processing treats the reliability of the specified action as low when the estimation accuracy is low, and brings the calculated expected value of attention closer to the value that would be calculated if no action had been specified. In another example, when the estimation accuracy is low, the attention score may be uniformly and temporarily lowered.
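A minimal sketch of this adjustment toward the average (treating the estimation accuracy as a confidence value in [0, 1] and blending linearly is an assumption; the average of 5.0 follows the text's example):

def adjust_for_estimation_accuracy(score, confidence, average=5.0):
    # confidence = 1.0 leaves the score unchanged; confidence = 0.0
    # returns the average, as if no action had been specified.
    return average + confidence * (score - average)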
Specific examples of the calculation of the expected value of the user's attention in the present embodiment will now be described further. The following specific examples may be realized, for example, by the processing logic of the attention expectation value calculation unit 113, or by data stored in the calculation rule DB 131.
For example, the attention expectation value calculation unit 113 may raise the calculated expected value of attention when a specific phrase is included in the user's utterance (call). More specifically, the calculated expected value of attention is raised by combining utterances such as "hurry", "hey", or "answer me" with other actions (for example, directing the gaze). In addition, the expected value of attention can be raised by a specific command (such as a command including the name of the system) or by the action of pointing at the terminal device. Using this, the user can control the system so that it responds to calls more readily.
Also, for example, the attention expectation value calculation unit 113 may calculate the expected value of attention based on the surrounding environment estimated as the user's action. More specifically, for example, the attention expectation value calculation unit 113 may raise the calculated expected value of attention when it is estimated that the user is alone. When a user is alone, the possibility of talking to oneself is low, except, for example, when on the phone, so if the user's utterance is detected, it is more likely to be a call to the system than when other users are present. On the other hand, in an environment with much noise, such as television sound or train noise, the attention expectation value calculation unit 113 may lower the calculated expected value even when the user's utterance is detected, because even if information is provided by voice via the speaker 125, the user's attention is unlikely to be directed to it. However, when beamforming, noise canceling, and the like are sufficiently possible in the audio output from the speaker 125, the calculated expected value need not be lowered even in a noisy environment.
Also, for example, the attention expectation value calculation unit 113 may raise the calculated expected value of attention when the user utters the same content repeatedly. In this case, the user is highly likely to be requesting some response from the system, so whether or not there was audio output from the system immediately before, the expected value of the attention the user will direct to output information is estimated to be high.
Also, for example, when one or more dialogues have already occurred between the system and the user, the attention expectation value calculation unit 113 may raise the expected value of attention calculated based on the user's subsequent actions. This is because a user's actions after a dialogue with the system are estimated to be highly likely to relate to the information output from the system. However, for example, when the user moves away from the terminal device providing the information after the dialogue, the expected value of attention can become low.
Also, for example, the attention expectation value calculation unit 113 may raise the calculated expected value of attention when the user interacts with the system in a characteristic way. More specifically, when the user uses a dialect, speaks loudly, or attaches some keyword to the beginning or end of an utterance, it is estimated that the user made the utterance with a feature meant to get through to the system, so the expected value of attention can be high.
Also, for example, the attention expectation value calculation unit 113 may calculate an expected value of attention for each output method according to the user's state. For example, when it is specified from behavior recognition results or the like that the user is working, in a meeting, or traveling by train, the attention expectation value calculation unit 113 may lower the expected value of attention calculated for audio output. On the other hand, in this case, the attention expectation value calculation unit 113 may raise the expected value of attention calculated for the user's gestures and simple actions (for example, tapping or shaking the terminal device). If the user is sleeping, it is presumed that there is no intentional action from the user, so the expected value of attention calculated when some action is detected may be lowered, or uniformly set to 0. However, this does not apply when a log of the user's sleep, for example sleeping posture, sleep talking, pulse, or sleep level, is being detected.
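As an illustration only (all state labels and numeric values here are placeholders reflecting the tendencies described above, not values from the disclosure):

def per_method_attention(user_state):
    # Per-output-method expected attention values by user state.
    if user_state == "sleeping":
        # No intentional action is presumed during sleep.
        return {"audio": 0.0, "display": 0.0, "gesture": 0.0}
    if user_state in ("working", "in_meeting", "riding_train"):
        # Audio output scored low; gestures and simple actions scored high.
        return {"audio": 1.0, "display": 5.0, "gesture": 8.0}
    return {"audio": 5.0, "display": 5.0, "gesture": 5.0}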
 Also, for example, the expected attention value calculation unit 113 may correct the calculated expected value of attention according to the words included in the user's utterance. For example, when the user's utterance includes words relating to a specific person (for example, a family member, a friend, or a boss at work) or specific content (for example, an anniversary, the return date of a borrowed book, or the submission date of official documents), the unit may determine that the user is having a highly important conversation and raise the calculated expected value of attention. This makes it possible, for example, to remind the user of important information that must not be forgotten.
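 Taken together, the adjustments in this section amount to a small rule set applied to a base expected value. The sketch below illustrates one possible shape of such a rule set in Python; the context fields, the weights, and the clamping to [0, 1] are illustrative assumptions and do not correspond to the actual calculation rule DB 131.

```python
from dataclasses import dataclass

@dataclass
class Context:
    users_present: int = 1            # estimated number of people around the user
    noisy: bool = False               # e.g. TV sound or train noise detected
    noise_robust_audio: bool = False  # beamforming / noise canceling available
    repeated_utterance: bool = False  # the same content was spoken again
    prior_dialogue: bool = False      # one or more exchanges already occurred
    characteristic_speech: bool = False  # dialect, raised voice, keyword affix
    important_phrases: bool = False   # e.g. family member, anniversary, due date
    sleeping: bool = False            # no sleep-log use case assumed here

def expected_attention(base: float, ctx: Context) -> float:
    """Apply the adjustments described above to a base value in [0, 1]."""
    if ctx.sleeping:
        return 0.0                    # intentional actions are unlikely during sleep
    v = base
    if ctx.users_present == 1:
        v += 0.2                      # an utterance is likely addressed to the system
    if ctx.noisy and not ctx.noise_robust_audio:
        v -= 0.2                      # audio output is unlikely to reach the user
    if ctx.repeated_utterance:
        v += 0.3                      # the user is probably waiting for a response
    if ctx.prior_dialogue:
        v += 0.1                      # follow-up actions likely relate to the output
    if ctx.characteristic_speech:
        v += 0.2                      # the utterance was deliberately marked for the system
    if ctx.important_phrases:
        v += 0.1                      # high-importance conversation, worth a reminder
    return min(max(v, 0.0), 1.0)
```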
 (3. Example of output control)
 (3-1. Example of display method selection)
 FIG. 4 is a flowchart illustrating an example of display method selection according to an embodiment of the present disclosure. Referring to FIG. 4, the output control unit 115 first determines whether the expected value of attention calculated by the expected attention value calculation unit 113 exceeds a first threshold th1 (S101). If the expected value exceeds the first threshold th1, the output control unit 115 causes the display 123 to display the information in the frontmost window (S103). This is the processing for the case where the user's attention to the output information is estimated to be highest (the user is paying close attention to the output information). Displaying the information in the frontmost window lets the user obtain a large amount of information immediately.
 On the other hand, when the expected value of attention does not exceed the first threshold th1 in S101, the output control unit 115 further determines whether the expected value exceeds a second threshold th2 (S105). The second threshold th2 is smaller than the first threshold th1. If the expected value exceeds the second threshold th2, the output control unit 115 causes the display 123 to display the information in a pop-up window (S107). This is the processing for the case where the user's attention to the output information is estimated to be moderate (the user may or may not pay attention to the output information). Displaying the information in a pop-up window keeps it from getting in the way even when the user does not need it.
 On the other hand, when the expected value of attention does not exceed the second threshold th2 in S105, the output control unit 115 ends the process without outputting the information. That is, the output control unit 115 suppresses the output of the information. This is the processing for the case where the user's attention to the output information is estimated to be low (the user would pay little attention to the output information, which might instead be a nuisance). Information that is not output in this case may be stored in the information cache DB 119 and output later.
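 The two-threshold flow of FIG. 4 can be summarized in a few lines. The sketch below assumes illustrative threshold values th1 = 0.7 and th2 = 0.4 and hypothetical labels for the display methods; the disclosure itself does not fix these values.

```python
# Sketch of the FIG. 4 flow; threshold values and labels are assumptions.
TH1, TH2 = 0.7, 0.4   # first and second thresholds, with th1 > th2

def select_display_method(expected_value: float) -> str:
    if expected_value > TH1:
        return "foreground_window"  # S103: show in the frontmost window
    if expected_value > TH2:
        return "popup_window"       # S107: unobtrusive pop-up
    return "suppress"               # withhold output (may be cached for later)
```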
 (3-2. Example of output method selection)
 FIG. 5 is a flowchart illustrating an example of output method selection according to an embodiment of the present disclosure. Referring to FIG. 5, the output control unit 115 first determines whether the expected value of attention calculated by the expected attention value calculation unit 113 exceeds the first threshold th1 (S151). If the expected value exceeds the first threshold th1, the output control unit 115 causes both the display 123 and the speaker 125 to output the information (S153). This is the processing for the case where the user's attention to the output information is estimated to be highest (the user is paying close attention to the output information). Outputting the information through both the display 123 and the speaker 125 lets the user obtain a large amount of information in a short time.
 On the other hand, when the expected value of attention does not exceed the first threshold th1 in S151, the output control unit 115 further determines whether the expected value exceeds the second threshold th2 (S155). The second threshold th2 is smaller than the first threshold th1. If the expected value exceeds the second threshold th2, the output control unit 115 outputs the information using only the display 123 (S157). This is the processing for the case where the user's attention to the output information is estimated to be moderate (the user may or may not pay attention to the output information). Outputting the information through the display 123 alone keeps it from getting in the way even when the user does not need it.
 On the other hand, when the expected value of attention does not exceed the second threshold th2 in S155, the output control unit 115 ends the process without outputting the information. That is, the output control unit 115 suppresses the output of the information. This is the processing for the case where the user's attention to the output information is estimated to be low (the user would pay little attention to the output information, which might instead be a nuisance). As in the example of FIG. 4, information that was not output may be stored in the information cache DB 119 and output later.
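 FIG. 5 follows the same two-threshold pattern, selecting output channels instead of window styles. A minimal sketch, reusing the illustrative thresholds from the previous example:

```python
# Sketch of the FIG. 5 flow; the channel names are hypothetical labels.
def select_output_method(expected_value: float) -> list[str]:
    if expected_value > TH1:
        return ["display", "speaker"]  # S153: both channels, maximum bandwidth
    if expected_value > TH2:
        return ["display"]             # S157: visual only, less intrusive
    return []                          # suppress; information may be stocked
```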
 FIG. 6 illustrates the example of FIG. 5 more concretely. As shown in (a), the user is having a conversation with another user. In this case, the expected attention value calculation unit 113 identifies the user's action based on, for example, voice data acquired by the microphone 105 and, referring to the data of the calculation rule DB 131 such as that shown in FIG. 3, calculates a relatively low expected value. In the illustrated example, this expected value falls between the first threshold th1 and the second threshold th2. Therefore, the process of S157 in the flowchart of FIG. 5 is executed, and the information is output using only the display 123, as shown in (b).
 Here, the user takes an interest in the information displayed on the display 123 and, as shown in (c), calls out to the terminal device, "Show me more of that!" The expected attention value calculation unit 113 identifies the user's action based on, for example, voice data acquired by the microphone 105 and, likewise referring to the data of the calculation rule DB 131, calculates a relatively high expected value. In the illustrated example, this expected value exceeds the first threshold th1. Therefore, the process of S153 in the flowchart of FIG. 5 is executed, and the information is output using both the display on the display 123 and the sound 125v output from the speaker 125, as shown in (c).
 In the example of FIG. 6, information corresponding to the content of the user's conversation (about Italian restaurants) is output. Such information is generated, for example, by the information generation unit 117 identifying the content of the user's utterance based on the voice data acquired by the microphone 105 and acquiring information related to that content from the information server 121. Various known techniques can be used for the speech processing that identifies the utterance content, so a detailed description is omitted. In this case, the expected attention value calculation unit 113 may also raise the calculated expected value of attention when the content of the information scheduled to be output is included in the content of the user's utterance.
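 One simple way to realize the boost just described is to check for term overlap between the user's utterance and the information scheduled for output. The sketch below is an illustrative assumption; the whitespace tokenization and the boost amount are not specified in the disclosure.

```python
# Hypothetical sketch: raise the expected value when the pending output
# shares terms with the user's utterance (e.g. "italian" in both).
def boost_for_content_overlap(expected_value: float,
                              utterance: str,
                              pending_info: str,
                              boost: float = 0.2) -> float:
    utterance_terms = set(utterance.lower().split())
    info_terms = set(pending_info.lower().split())
    if utterance_terms & info_terms:
        expected_value = min(expected_value + boost, 1.0)
    return expected_value
```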
 (3-3. Other examples)
 FIG. 7 is a diagram illustrating an example of display color selection according to an embodiment of the present disclosure. Referring to FIG. 7, the user is walking through town wearing a bracelet-type wearable terminal device. Here, the user passes near a certain store (SHOP). This store is a store (an Italian restaurant) related to a search keyword ("Italian") from an information search the user executed earlier. In this case, for example, the information generation unit 117 generates information notifying the user that the store (SHOP) is nearby, based on the user's position information specified by the GPS receiver included in the sensor 103 and on the relationship between the store (SHOP) and the user (the store may be an object of the user's interest) estimated from the user's information search history previously acquired by the action data acquisition unit 107. Alternatively, the relationship between the store (SHOP) and the user may be estimated using the user's profile information held by an external service. For example, a restaurant information service holds bookmarks of store information, a search history of store information, store information registered by other users with similar attributes, and the like. A social media service holds information such as evaluations that the user has expressed on social media about services and stores. The information generation unit 117 may estimate the relationship between the store (SHOP) and the user based on such information, for example, and generate information notifying the user that the store (SHOP) is nearby based further on the user's position information.
 In the illustrated example, the notification information for the user is output by an illumination included in the other output device 127. The output control unit 115 may change the display color of the illumination, as shown in (a) to (c) of FIG. 7, according to the expected value of attention calculated by the expected attention value calculation unit 113. For example, the output control unit 115 may light the illumination in a conspicuous color when the expected value of the user's attention is high, and light it in a subdued color, or not at all, when the expected value is low. Alternatively, the output control unit 115 may light the illumination in a subdued color when the expected value of the user's attention is high (since the user has probably already noticed) and in a conspicuous color when the expected value is low (since the user probably has not noticed yet).
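 As a rough sketch, this color choice can be expressed as a mapping from the expected value, with a flag selecting between the two strategies above; the color values and the reuse of the illustrative threshold TH2 are assumptions.

```python
# Hypothetical sketch of the FIG. 7 display-color selection.
def illumination_color(expected_value: float, assume_noticed: bool = False) -> str:
    conspicuous, plain = "#FF3B30", "#8E8E93"  # vivid red vs. muted gray
    if assume_noticed:
        # High expectation means the user probably noticed already: stay plain.
        return plain if expected_value > TH2 else conspicuous
    return conspicuous if expected_value > TH2 else plain
```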
 (4. Example of stocking information)
 Next, an example of stocking information according to an embodiment of the present disclosure will be described with reference to FIGS. 8 and 9. In FIGS. 8 and 9 below, the state of communication with the user as detected by the system is represented by an indicator shown at the lower right of each figure. The indicator may actually be displayed, for example by an illumination provided in the terminal device as the other output device 127, or it may be interpreted as an explanatory notation for FIGS. 8 and 9 (not actually displayed).
 FIG. 8 is a diagram for describing a first example of stocking information according to an embodiment of the present disclosure. In the example of FIG. 8, as shown in (a), while the user is saying "What's my schedule today?", the system detects that the user is speaking. Here, as shown in (b), the system failed to detect the content of the user's utterance correctly and calculated a low expected value for the user's attention to the output information, so it stocked the information generated by the information generation unit 117 instead of outputting it. The user notices that there is no response from the system and calls out, "Hey!" As shown in (c), the system correctly detects the user's call, estimates that the expected value of the user's attention is high, and outputs the stocked information. More specifically, the system provides the information to the user by the voice 125v output from the speaker 125: "My apologies. Today you have lunch in Osaki."
 FIG. 9 is a diagram for describing a second example of stocking information according to an embodiment of the present disclosure. In the example of FIG. 9, as shown in (a), while the user is saying "What's my schedule today?", the system detects that the user is speaking. Then, as shown in (b), another user conversing with the user replies "I'm free", so the system infers from the fact that the user is in conversation with another user that the expected value of attention to the output information is low, and stocks the information generated by the information generation unit 117 without outputting it. Unlike the example of FIG. 8, the user did not actually need information from the system (the user was merely asking about today's schedule in a conversation with another user), so the system's decision to stock the information was correct. Thereafter, when a predetermined time has elapsed, the system discards the stocked information as no longer needed and returns to the steady state, as shown in (c). The reason the information is stocked rather than discarded at the point of (b) is to keep the system able to output the information in case the detection of a conversation with another user was erroneous and it turns out the user actually needed the information (for example, if the user calls out as in (b) of FIG. 8).
 FIG. 10 is a flowchart showing a process for stocking information according to an embodiment of the present disclosure. Referring to FIG. 10, the output control unit 115 first determines whether the expected value of attention calculated by the expected attention value calculation unit 113 exceeds the first threshold th1 (S201). If the expected value exceeds the first threshold th1, the output control unit 115 further determines whether there is information stocked in the information cache DB 119 (S203). If there is stocked information, the output control unit 115 outputs the stocked information (S205). This is, for example, the process shown in (c) of the example of FIG. 8. Subsequently, if there is other information generated by the information generation unit 117 (which may be newer), the output control unit 115 outputs that information as well (S207).
 On the other hand, when the expected value of attention does not exceed the first threshold th1 in S201, the output control unit 115 further determines whether the expected value exceeds the second threshold th2 (S209). The second threshold th2 is smaller than the first threshold th1. If the expected value exceeds the second threshold th2, the output control unit 115 outputs the information generated by the information generation unit 117 (S207). That is, in the illustrated example, when the expected value of attention is between the first threshold th1 and the second threshold th2, the stocked information is not output, but, for example, information newly generated by the information generation unit 117 is output. On the other hand, when the expected value of attention does not exceed the second threshold th2 in S209, the output control unit 115 stocks the information (S211).
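 The FIG. 10 flow, together with the expiry behavior described further below, might be sketched as follows. The plain in-memory list standing in for the information cache DB 119, the default retention time, and the reuse of the illustrative thresholds are all assumptions, not the disclosed implementation.

```python
import time

stock: list[tuple[float, str]] = []   # (stocked_at, information)

def control_output(expected_value: float, new_info: str | None,
                   ttl_seconds: float = 3600.0) -> list[str]:
    now = time.time()
    # Discard stocked items whose retention period has elapsed (FIG. 9 (c)).
    stock[:] = [(t, i) for t, i in stock if now - t < ttl_seconds]
    out: list[str] = []
    if expected_value > TH1:            # S201: high attention expected
        out += [i for _, i in stock]    # S205: flush stocked information first
        stock.clear()
        if new_info:
            out.append(new_info)        # S207: then any newly generated info
    elif expected_value > TH2:          # S209: moderate attention
        if new_info:
            out.append(new_info)        # S207: new info only, stock stays put
    elif new_info:
        stock.append((now, new_info))   # S211: stock the information for later
    return out
```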
 Various modifications of the information stocking examples above are possible. For example, when information has been stocked (for example, by mistake), the actions by which the user can pull the stocked information out of the system may include calling out, saying the same thing again, gazing at the device (directing one's line of sight toward it), turning one's face toward it, operating a button of the terminal device, clapping, or falling silent (waiting for the system's response). These actions may be registered in the system as actions for retrieving stocked information. In this case, when outputting the stocked information, the system may add a message apologizing for having failed to respond correctly (for example, the system's response in (c) of the example of FIG. 8). Also, as a step before displaying the stocked information, the system may display the user action it has recognized (for example, "Weren't you in a conversation with another user?") and output the stocked information if the user nevertheless requests it.
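 A registry of such retrieval actions could be as simple as a set lookup; the action names below are hypothetical labels for the behaviors listed above.

```python
# Hypothetical registry of actions that pull stocked information back out.
RETRIEVAL_ACTIONS = {
    "call_out", "repeat_utterance", "gaze", "turn_face",
    "press_button", "clap", "wait_in_silence",
}

def should_flush_stock(detected_action: str) -> bool:
    return detected_action in RETRIEVAL_ACTIONS
```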
 There may also be more than one piece of stocked information. In that case, for example, the stocked information may be listed on the display 123 so that the user can select which information to output. As in the example of FIG. 9 above, stocked information can be discarded once a predetermined time has elapsed, and that time can be set arbitrarily. For example, depending on the content of the information, a period from several minutes to several hours or several days may be set as the time until the stocked information is discarded.
 (5. Hardware configuration)
 Next, the hardware configuration of an information processing apparatus according to an embodiment of the present disclosure will be described with reference to FIG. 11. FIG. 11 is a block diagram illustrating an example hardware configuration of the information processing apparatus according to an embodiment of the present disclosure. The illustrated information processing apparatus 900 can realize, for example, the terminal device or the server in the embodiments described above.
 The information processing apparatus 900 includes a CPU (Central Processing Unit) 901, a ROM (Read Only Memory) 903, and a RAM (Random Access Memory) 905. The information processing apparatus 900 may also include a host bus 907, a bridge 909, an external bus 911, an interface 913, an input device 915, an output device 917, a storage device 919, a drive 921, a connection port 923, and a communication device 925. Furthermore, the information processing apparatus 900 may include an imaging device 933 and a sensor 935 as necessary. The information processing apparatus 900 may have, instead of or together with the CPU 901, a processing circuit such as a DSP (Digital Signal Processor) or an ASIC (Application Specific Integrated Circuit).
 The CPU 901 functions as an arithmetic processing device and a control device, and controls all or part of the operation of the information processing apparatus 900 according to various programs recorded in the ROM 903, the RAM 905, the storage device 919, or a removable recording medium 927. The ROM 903 stores programs, calculation parameters, and the like used by the CPU 901. The RAM 905 temporarily stores programs used in the execution of the CPU 901 and parameters that change as appropriate during that execution. The CPU 901, the ROM 903, and the RAM 905 are connected to one another by a host bus 907 formed by an internal bus such as a CPU bus. The host bus 907 is further connected via a bridge 909 to an external bus 911 such as a PCI (Peripheral Component Interconnect/Interface) bus.
 The input device 915 is a device operated by the user, such as a mouse, a keyboard, a touch panel, buttons, switches, and levers. The input device 915 may be, for example, a remote control device using infrared or other radio waves, or an externally connected device 929 such as a mobile phone that supports the operation of the information processing apparatus 900. The input device 915 includes an input control circuit that generates an input signal based on the information input by the user and outputs it to the CPU 901. By operating the input device 915, the user inputs various data to the information processing apparatus 900 and instructs it to perform processing operations.
 The output device 917 is a device capable of notifying the user of acquired information visually or audibly. The output device 917 can be, for example, a display device such as an LCD (Liquid Crystal Display), a PDP (Plasma Display Panel), or an organic EL (Electro-Luminescence) display, an audio output device such as a speaker or headphones, or a printer device. The output device 917 outputs the results obtained by the processing of the information processing apparatus 900 as video such as text or images, or as sound such as voice or audio.
 The storage device 919 is a data storage device configured as an example of the storage unit of the information processing apparatus 900. The storage device 919 is formed by, for example, a magnetic storage device such as an HDD (Hard Disk Drive), a semiconductor storage device, an optical storage device, or a magneto-optical storage device. The storage device 919 stores programs executed by the CPU 901, various data, various data acquired from the outside, and the like.
 The drive 921 is a reader/writer for a removable recording medium 927 such as a magnetic disk, an optical disc, a magneto-optical disc, or a semiconductor memory, and is built into or externally attached to the information processing apparatus 900. The drive 921 reads information recorded on the mounted removable recording medium 927 and outputs it to the RAM 905. The drive 921 also writes records to the mounted removable recording medium 927.
 The connection port 923 is a port for connecting a device directly to the information processing apparatus 900. The connection port 923 can be, for example, a USB (Universal Serial Bus) port, an IEEE 1394 port, or a SCSI (Small Computer System Interface) port. The connection port 923 may also be an RS-232C port, an optical audio terminal, an HDMI (registered trademark) (High-Definition Multimedia Interface) port, or the like. By connecting the externally connected device 929 to the connection port 923, various data can be exchanged between the information processing apparatus 900 and the externally connected device 929.
 The communication device 925 is a communication interface formed by, for example, a communication device for connecting to a communication network 931. The communication device 925 can be, for example, a communication card for wired or wireless LAN (Local Area Network), Bluetooth (registered trademark), or WUSB (Wireless USB). The communication device 925 may also be a router for optical communication, a router for ADSL (Asymmetric Digital Subscriber Line), or a modem for various kinds of communication. The communication device 925 transmits and receives signals and the like to and from, for example, the Internet and other communication devices using a predetermined protocol such as TCP/IP. The communication network 931 connected to the communication device 925 is a network connected by wire or wirelessly, such as the Internet, a home LAN, infrared communication, radio-wave communication, or satellite communication.
 The imaging device 933 is a device that images real space and generates a captured image using various members, such as an imaging element, for example a CCD (Charge Coupled Device) or CMOS (Complementary Metal Oxide Semiconductor) sensor, and a lens for controlling the formation of a subject image on the imaging element. The imaging device 933 may capture still images or moving images.
 The sensor 935 comprises various sensors such as an acceleration sensor, a gyro sensor, a geomagnetic sensor, an optical sensor, and a sound sensor. The sensor 935 acquires information about the state of the information processing apparatus 900 itself, such as the attitude of its housing, and information about the surrounding environment of the information processing apparatus 900, such as the brightness and noise around it. The sensor 935 may also include a GPS sensor that receives GPS (Global Positioning System) signals and measures the latitude, longitude, and altitude of the apparatus.
 An example of the hardware configuration of the information processing apparatus 900 has been shown above. Each of the components described above may be configured using general-purpose members or by hardware specialized for the function of that component. Such a configuration can be changed as appropriate according to the technical level at the time of implementation.
 (6. Supplement)
 Embodiments of the present disclosure may include, for example, an information processing apparatus (a terminal device or a server) as described above, a system, an information processing method executed by the information processing apparatus or the system, a program for causing the information processing apparatus to function, and a non-transitory tangible medium on which the program is recorded.
 The preferred embodiments of the present disclosure have been described in detail above with reference to the accompanying drawings, but the technical scope of the present disclosure is not limited to these examples. It is evident that a person having ordinary knowledge in the technical field of the present disclosure can conceive of various changes and modifications within the scope of the technical ideas described in the claims, and it is understood that these naturally belong to the technical scope of the present disclosure.
 The effects described in this specification are merely explanatory or illustrative and are not limiting. That is, the technology according to the present disclosure can exhibit, together with or instead of the above effects, other effects that are apparent to those skilled in the art from the description of this specification.
 The following configurations also belong to the technical scope of the present disclosure.
 (1) An information processing apparatus including a processor configured to: acquire data indicating an action of a user; calculate, based on the acquired data, an expected value of attention directed to information output to the user; and provide the expected value for output control of the information.
 (2) The information processing apparatus according to (1), wherein the processor calculates the expected value based on the latest action of the user.
 (3) The information processing apparatus according to (1) or (2), wherein the processor calculates the expected value based on a history of actions of the user.
 (4) The information processing apparatus according to any one of (1) to (3), wherein the action of the user includes a motion or facial expression of the user.
 (5) The information processing apparatus according to (4), wherein the action of the user includes a reaction to the information already output.
 (6) The information processing apparatus according to (5), wherein the processor is further configured to correct a calculation rule for the expected value based on data indicating the reaction.
 (7) The information processing apparatus according to any one of (1) to (6), wherein the expected value is provided in order to decide whether to output the information.
 (8) The information processing apparatus according to (7), wherein the processor further executes output control of the information and outputs, when the expected value is high, the information that was not output because the expected value was low.
 (9) The information processing apparatus according to any one of (1) to (8), wherein the expected value is provided in order to select an output method for the information.
 (10) The information processing apparatus according to any one of (1) to (9), wherein the processor infers the action of the user based on the acquired data and adjusts the expected value based on the accuracy of the inference.
 (11) The information processing apparatus according to (10), wherein the processor brings the expected value closer to an average value when the accuracy of the inference is low.
 (12) The information processing apparatus according to any one of (1) to (11), wherein the processor raises the expected value when the action of the user includes the utterance of a specific phrase.
 (13) The information processing apparatus according to any one of (1) to (12), wherein the processor calculates the expected value based on the user's surrounding environment inferred as the action of the user.
 (14) An information processing method including, by a processor: acquiring data indicating an action of a user; calculating, based on the acquired data, an expected value of attention directed to information output to the user; and providing the expected value for output control of the information.
 (15) A program for causing a computer to realize the functions of: acquiring data indicating an action of a user; calculating, based on the acquired data, an expected value of attention directed to information output to the user; and providing the expected value for output control of the information.
 10  System
 101  Camera
 103  Sensor
 105  Microphone
 107  Action data acquisition unit
 109  Action DB
 113  Expected attention value calculation unit
 115  Output control unit
 117  Information generation unit
 119  Information cache DB
 123  Display
 125  Speaker
 127  Other output device
 129  Feedback analysis unit
 131  Calculation rule DB

Claims (15)

  1.  An information processing apparatus comprising a processor configured to:
      acquire data indicating an action of a user;
      calculate, based on the acquired data, an expected value of attention directed to information output to the user; and
      provide the expected value for output control of the information.
  2.  The information processing apparatus according to claim 1, wherein the processor calculates the expected value based on the latest action of the user.
  3.  The information processing apparatus according to claim 1, wherein the processor calculates the expected value based on a history of actions of the user.
  4.  The information processing apparatus according to claim 1, wherein the action of the user includes a motion or facial expression of the user.
  5.  The information processing apparatus according to claim 4, wherein the action of the user includes a reaction to the information already output.
  6.  The information processing apparatus according to claim 5, wherein the processor is further configured to correct a calculation rule for the expected value based on data indicating the reaction.
  7.  The information processing apparatus according to claim 1, wherein the expected value is provided in order to decide whether to output the information.
  8.  The information processing apparatus according to claim 7, wherein the processor further executes output control of the information and outputs, when the expected value is high, the information that was not output because the expected value was low.
  9.  The information processing apparatus according to claim 1, wherein the expected value is provided in order to select an output method for the information.
  10.  The information processing apparatus according to claim 1, wherein the processor infers the action of the user based on the acquired data and adjusts the expected value based on the accuracy of the inference.
  11.  The information processing apparatus according to claim 10, wherein the processor brings the expected value closer to an average value when the accuracy of the inference is low.
  12.  The information processing apparatus according to claim 1, wherein the processor raises the expected value when the action of the user includes the utterance of a specific phrase.
  13.  The information processing apparatus according to claim 1, wherein the processor calculates the expected value based on the user's surrounding environment inferred as the action of the user.
  14.  An information processing method comprising, by a processor:
      acquiring data indicating an action of a user;
      calculating, based on the acquired data, an expected value of attention directed to information output to the user; and
      providing the expected value for output control of the information.
  15.  A program for causing a computer to realize the functions of:
      acquiring data indicating an action of a user;
      calculating, based on the acquired data, an expected value of attention directed to information output to the user; and
      providing the expected value for output control of the information.
PCT/JP2014/078111 2014-01-09 2014-10-22 Information processing device, information processing method, and program WO2015104883A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
JP2014002536A JP2015132878A (en) 2014-01-09 2014-01-09 Information processing device, information processing method and program
JP2014-002536 2014-01-09

Publications (1)

Publication Number Publication Date
WO2015104883A1 true WO2015104883A1 (en) 2015-07-16

Family

ID=53523725

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/JP2014/078111 WO2015104883A1 (en) 2014-01-09 2014-10-22 Information processing device, information processing method, and program

Country Status (2)

Country Link
JP (1) JP2015132878A (en)
WO (1) WO2015104883A1 (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11183167B2 (en) 2017-03-24 2021-11-23 Sony Corporation Determining an output position of a subject in a notification based on attention acquisition difficulty

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2020004137A (en) * 2018-06-28 2020-01-09 エヌ・ティ・ティ・コミュニケーションズ株式会社 Evaluation device, evaluation method, and evaluation program
JP7120060B2 (en) 2019-02-06 2022-08-17 トヨタ自動車株式会社 VOICE DIALOGUE DEVICE, CONTROL DEVICE AND CONTROL PROGRAM FOR VOICE DIALOGUE DEVICE
WO2021039191A1 (en) 2019-08-27 2021-03-04 ソニー株式会社 Information processing device, method for controlling same, and program
WO2021039190A1 (en) 2019-08-27 2021-03-04 ソニー株式会社 Information processing device, method for controlling same, and program

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2009223187A (en) * 2008-03-18 2009-10-01 Pioneer Electronic Corp Display content controller, display content control method and display content control method program
JP2011135419A (en) * 2009-12-25 2011-07-07 Fujitsu Ten Ltd Data communication system, on-vehicle machine, communication terminal, server device, program, and data communication method
JP2011239247A (en) * 2010-05-12 2011-11-24 Nippon Hoso Kyokai <Nhk> Digital broadcast receiver and related information presentation program

Also Published As

Publication number Publication date
JP2015132878A (en) 2015-07-23

Similar Documents

Publication Publication Date Title
AU2018241137B2 (en) Dynamic thresholds for always listening speech trigger
AU2019200295B2 (en) Far-field extension for digital assistant services
US10373617B2 (en) Reducing the need for manual start/end-pointing and trigger phrases
WO2015104883A1 (en) Information processing device, information processing method, and program
US20230074406A1 (en) Using large language model(s) in generating automated assistant response(s
JPWO2019098038A1 (en) Information processing device and information processing method
US11244682B2 (en) Information processing device and information processing method
KR102356623B1 (en) Virtual assistant electronic device and control method thereof
US20200327890A1 (en) Information processing device and information processing method
US11250873B2 (en) Information processing device and information processing method
WO2023038654A1 (en) Using large language model(s) in generating automated assistant response(s)
WO2018139036A1 (en) Information processing device, information processing method, and program
WO2017175442A1 (en) Information processing device and information processing method
WO2019146187A1 (en) Information processing device and information processing method
US20200234187A1 (en) Information processing apparatus, information processing method, and program
JP2021113835A (en) Voice processing device and voice processing method
US11430429B2 (en) Information processing apparatus and information processing method
US20200342870A1 (en) Information processing device and information processing method
WO2018139050A1 (en) Information processing device, information processing method, and program
US20230367960A1 (en) Summarization based on timing data
US11803352B2 (en) Information processing apparatus and information processing method
US11386870B2 (en) Information processing apparatus and information processing method
US20230306968A1 (en) Digital assistant for providing real-time social intelligence
WO2019054009A1 (en) Information processing device, information processing method and program
JP2021119642A (en) Information processing device, information processing method, and recording medium

Legal Events

Date Code Title Description
121 Ep: the EPO has been informed by WIPO that EP was designated in this application (Ref document number: 14878120; Country of ref document: EP; Kind code of ref document: A1)
NENP Non-entry into the national phase (Ref country code: DE)
122 Ep: PCT application non-entry in European phase (Ref document number: 14878120; Country of ref document: EP; Kind code of ref document: A1)